

# Analog In-Memory Compute using SST SuperFlash

### Mark Reiten





Artificial intelligence based on advanced machine learning models has gained tremendous momentum in many industries. Many digital processing solutions optimized for machine learning have been introduced in the last 5 years, but none can match the power and performance advantages of analog memory-based computing devices. SST's memBrain cost/performance can be 10x better, and the power can be 100x lower, than a comparably performing digital solution. This is accomplished by storing multiple levels (up to 256) per cell to represent a "weight" or "synapse" in a neural model. Multiplication is done through cell operation, and addition is done by summing the output lines. Vector-matrix multiplication is accomplished through design techniques, so any existing SuperFlash process can support this new optimized compute paradigm.







- Neural Systems and Uses
- Deep Neural Networks: The Problem
- Analog vs Digital compute
- What is memBrain
- memBrain<sup>tm</sup> Analog Inference Designware
- Many Model Types can be supported
- Why it works
- Software Flow





### **Neural Systems and Uses**

#### Applications

- Image processing: object detection and classification
- Speech recognition
- Natural language processing
- Text processing: search, similarity/relevance analysis, recommendations/ads
- Data mining: business pattern extraction, bioinformatics
- Hybrid applications: visual scene understanding, personal assistants (speech), game playing, robotics (image + speech)

#### Training

- Massive input data sets required
- Requires floating point math
- Many training iterations needed
- GPUs or FPGAs or TPUs or ... best
- Weights are adjusted

#### Inference

- Only simple math needed
- High parallelism needed for low latency
- Memory with embedded processing is best: ESF1 or ESF3 is the best solution!
- Weights are fixed

[Figure: training, where a trainer compares the model's actual output with the correct output, versus inference, where a model labels inputs such as cat, face, and car.]

Flash Memory Summit 2019, Santa Clara, CA



### Deep Neural Networks: The Problem

#### DNNs require vast numbers of Multiply-Accumulate operations (MACs)
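To make the scale concrete, the sketch below counts the MACs in a small fully connected network, where each weight contributes one multiply-accumulate per inference. The layer sizes are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def dense_layer_macs(inputs, outputs):
    """A fully connected layer performs one MAC per weight."""
    return inputs * outputs


def network_macs(layer_sizes):
    """Sum the MACs of every pair of adjacent layers."""
    return sum(dense_layer_macs(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical network: 784 inputs, two hidden layers of 512, 10 outputs.
layer_sizes = [784, 512, 512, 10]
total_macs = network_macs(layer_sizes)
print(f"{total_macs:,} MACs per inference")
```

Even this toy network needs several hundred thousand MACs for a single inference, and the count grows with the product of adjacent layer widths.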









### Analog vs Digital Compute



Analog advantages:

- Data and compute compression
- No data thrashing!

In neural networks, which involve simple operations on large data sets, the time needed to read and write computation registers dwarfs the actual computation time.



Bus traffic can occupy >90% of the time required to process the calculations.





### What is memBrain?

memBrain solves the compute problem by storing the weights in eFlash and using analog cell operation to perform the MAC operations inside the storage array.



- Each cell stores up to 8 bits in 1.5 transistors, compared to 48 SRAM transistors for 8 bits!
- Multiply happens through cell operation
- The output line functions as a neuron: summation happens along the output line
- Energy is ~0.3 pJ per MAC!
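The multiply-in-cell, sum-on-output-line idea can be sketched numerically. This is a simplified software model, not SST's circuit: it assumes weights normalized to the range 0.0 to 1.0, ideal noiseless cells, and 256 evenly spaced levels. The function names (`program_cell`, `cell_output`, `output_line`, `vector_matrix_multiply`) are hypothetical.

```python
LEVELS = 256  # up to 8 bits per cell, per the slide


def program_cell(weight, levels=LEVELS):
    """Quantize a weight in [0.0, 1.0] to an integer level (assumed ideal programming)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0.0, 1.0]")
    return round(weight * (levels - 1))


def cell_output(input_value, level, levels=LEVELS):
    """Multiply: one cell scales its input by the stored level."""
    return input_value * level / (levels - 1)


def output_line(inputs, column_levels, levels=LEVELS):
    """Accumulate: the shared output line sums every cell in its column."""
    return sum(cell_output(x, lv, levels)
               for x, lv in zip(inputs, column_levels))


def vector_matrix_multiply(inputs, weights, levels=LEVELS):
    """weights[row][col] in [0.0, 1.0]; returns one summed value per column."""
    stored = [[program_cell(w, levels) for w in row] for row in weights]
    return [output_line(inputs, column, levels) for column in zip(*stored)]


inputs = [1.0, 0.5]
weights = [[0.0, 1.0],
           [1.0, 0.0]]
outputs = vector_matrix_multiply(inputs, weights)
print(outputs)
```

Every column is evaluated from the same input vector, which is why a whole vector-matrix product can complete in a single array operation rather than one MAC at a time.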





### memBrain<sup>tm</sup> Analog Inference Designware




- 512x512 tile full-frame cycle time: 10-30 µs
- Depends on D/A and A/D power
- Energy is 0.3 pJ per MAC with D/A + A/D at 30 µs frame cycle time
- Area with D/A input and A/D output blocks: 0.48 mm² on 40 nm for a 512x512 tile
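The tile figures above can be combined into rough throughput and power estimates. The sketch below is back-of-the-envelope arithmetic only; it assumes every one of the 512x512 cells performs exactly one MAC per frame and uses the 30 µs cycle time at which the 0.3 pJ figure is quoted. The variable names are illustrative.

```python
ROWS = 512
COLS = 512
ENERGY_PER_MAC_J = 0.3e-12   # 0.3 pJ per MAC, including D/A and A/D
FRAME_TIME_S = 30e-6         # 30 us frame cycle time
TILE_AREA_MM2 = 0.48         # 40 nm, including D/A and A/D blocks

# Assumption: one MAC per cell per frame.
macs_per_frame = ROWS * COLS
macs_per_second = macs_per_frame / FRAME_TIME_S
energy_per_frame_j = macs_per_frame * ENERGY_PER_MAC_J
average_power_w = energy_per_frame_j / FRAME_TIME_S
macs_per_second_per_mm2 = macs_per_second / TILE_AREA_MM2

print(f"MACs per frame:       {macs_per_frame:,}")
print(f"Throughput:           {macs_per_second / 1e9:.2f} GMAC/s")
print(f"Energy per frame:     {energy_per_frame_j * 1e9:.1f} nJ")
print(f"Average power:        {average_power_w * 1e3:.2f} mW")
print(f"Throughput per area:  {macs_per_second_per_mm2 / 1e9:.1f} GMAC/s/mm^2")
```

Under these assumptions a single tile delivers on the order of 8.7 GMAC/s at roughly 2.6 mW; a faster frame time would raise throughput, but the slides quote the energy figure only at 30 µs.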

- memBrain uses the SuperFlash cell as a multi-level analog device storing 4-8 bits per cell, depending on the application.
- memBrain is organized as "Tiles" with wide I/O configurations supporting massively parallel multiply/accumulate operations.
- Performance per silicon area and power are orders of magnitude better than optimized digital solutions.
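One way to reason about tiles is to ask how many a given weight matrix would occupy. The slides do not describe how layers are mapped onto tiles, so the sketch below assumes the simplest possible scheme: the matrix is cut into 512x512 blocks, and partially filled blocks still consume a whole tile. The function `tiles_needed` is hypothetical.

```python
import math


def tiles_needed(rows, cols, tile_rows=512, tile_cols=512):
    """Count tiles for a rows x cols weight matrix (assumed block mapping)."""
    if rows <= 0 or cols <= 0:
        raise ValueError("matrix dimensions must be positive")
    return math.ceil(rows / tile_rows) * math.ceil(cols / tile_cols)


# Hypothetical layer shapes, for illustration only.
for shape in [(512, 512), (784, 512), (1024, 1000)]:
    print(shape, "->", tiles_needed(*shape), "tile(s)")
```

A real mapping might pack several small layers into one tile or split work differently; this sketch only gives an upper-level sizing intuition.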





### Many Model Types can be supported








### Why it Works

ESF3 cell analog operation varies the floating-gate potential with repeatable, consistent separation between steps across the array and across temperature.





- ESF3 cell data retention is excellent
- Noise is low
- Noise and temperature can be compensated for





### **Software Flow**







### **Thank You**

