



# Persistent Memory for Artificial Intelligence

Bill Gervasi
Principal Systems Architect
bilge@Nantero.com





# **Demand Outpacing Capacity**



**In-Memory Computing** 

**Artificial Intelligence** 

**Machine Learning** 

**Deep Learning** 





# **Driving New Capacity Models**



Non-volatile memories

The industry is successfully snuggling large memories up to the processors...

...but we can do oh! so much more





# My Three Talks at FMS



**NVDIMM Analysis** 



**Memory Class Storage** 



**Artificial Intelligence** 





# **History of Architectures**







# Historical Trends in Computing



Computing

Central Processing



Distributed Processing

Edge Computing







Power Failure
Data Loss





# Some Moments in History



Central Processing

**Shared processor, dumb terminals** 



Peer-to-peer networks



**Distributed Processing** 

Processor per user





# Some Moments in History



Central Processing

"Native Signal Processing"
Main CPU drivers
Cheap analog I/O





Tightly-coupled coprocessing



**Distributed Processing** 

Hercules graphics
Sound Blaster audio
Rockwell modem
Ethernet DSP





### The Lone Survivor...



**Graphics add-in cards** 



**Integrated graphics** 

...survived the NSP war





# Some Moments in History

Central Processing

Phone providers controlled all data processing





**Edge computing** reduces latency



**Distributed Processing** 

Phone apps provide local services





# When the Playing Field Changes

The speed of networking directly impacts the pendulum swing from centralized to distributed







A faster network favors distributed computing





#### Winners and Losers

Often, the maturity of the software development environment determined who won and who lost









# Maintaining an Edge








# The Tail Wagging the Dog



I won't say "It's the Software, Stupid" because I know you're not stupid

however

To succeed, AI needs GREAT software infrastructure

Driving some companies to design hardware to the software instead of software to the hardware







#### AI on Traditional Server



No magic

AI applications are like any other

Data processing done on main CPU

Downside: the main CPU is overkill in floating point and weak in parallelism





### **AI Evolution**







#### AI Evolution



#### **AI Accelerator Characteristics**

Wide array of simple processing elements

Reduced floating point precision

**Tuned for matrix operations** 
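The effect of reduced floating point precision on a matrix operation can be sketched in a few lines. This is an illustration only, not any particular accelerator's arithmetic: the helper names `to_half` and `matmul`, the sample values, and the choice of IEEE 754 half precision are all assumptions made for the example.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to IEEE 754 half precision (16 bits) and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


def matmul(a, b, quantize=lambda v: v):
    """Multiply a (n x k) by b (k x m), quantizing operands and each running sum."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc = quantize(acc + quantize(a[i][p]) * quantize(b[p][j]))
            out[i][j] = acc
    return out


# Hypothetical weights and inputs, chosen only for illustration.
weights = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
inputs = [[1.0], [2.0], [3.0]]

full = matmul(weights, inputs)                    # double precision reference
half = matmul(weights, inputs, quantize=to_half)  # reduced precision
max_error = max(abs(f[0] - h[0]) for f, h in zip(full, half))
print(f"largest difference between the two results: {max_error:.6f}")
```

The reduced precision result differs only slightly from the reference, which is the trade an accelerator makes to fit many more simple processing elements in the same area.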





# **In-Memory Computing**

In-memory computing lets the AI accelerator control the memory directly



Also great for encryption





# **Data Processing Paradigms**



**Traditional database** 

**Data mining** 

Inferencing

**Fuzzy logic** 

Recognition

etc







# The Actualization Gap



**Research projects** 



**Deployments** 





### The "Research" Projects

Flash Memory Summit





Many interconnects between storage elements and processing elements

Weighted calculations produce parallel possible results

Focus for a number of startup companies
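A minimal sketch of "weighted calculations produce parallel possible results", assuming the slide describes a neural-network style array in which every processing element computes a weighted sum over the same stored inputs. The function name `weighted_outputs` and all values are hypothetical.

```python
def weighted_outputs(inputs, weights):
    """Return one weighted sum per processing element (one row of weights each)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Hypothetical stored inputs and per-element weights.
inputs = [1.0, 0.0, 2.0]
weights = [
    [0.50, 0.25, 0.00],  # element 0
    [0.00, 1.00, 0.50],  # element 1
    [0.25, 0.25, 0.25],  # element 2
]

scores = weighted_outputs(inputs, weights)
# Each score is one "possible result"; pick the strongest candidate.
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

In hardware the rows would be evaluated at the same time; the Python loop only shows the arithmetic each element performs.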





# What Most People Are Building



Dense matrix memory for highest storage capacity

Shared memory controller for many execution units

Pipes for networking







#### **Practical I/O Connection Limits**



Toroid is a more practicable solution



Limits how quickly data can flow in and out
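To make the interconnect trade-off concrete, here is a sketch that assumes "toroid" means a 2D torus: a grid whose edges wrap around, so every node has exactly four links no matter how large the array grows. The function names and the 8 x 8 size are assumptions for illustration; the slide's own diagram is not reproduced here.

```python
def torus_neighbors(node, width, height):
    """Four directly connected neighbors of a node on a wrapped 2D grid."""
    x, y = node
    return [
        ((x - 1) % width, y),
        ((x + 1) % width, y),
        (x, (y - 1) % height),
        (x, (y + 1) % height),
    ]


def torus_hops(src, dst, width, height):
    """Minimum hop count on the torus, taking the wraparound links when shorter."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, width - dx) + min(dy, height - dy)


def mesh_hops(src, dst):
    """Hop count on the same grid without wraparound links, for comparison."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


corner, far_corner = (0, 0), (7, 7)
print(torus_neighbors(corner, 8, 8))
print(torus_hops(corner, far_corner, 8, 8), mesh_hops(corner, far_corner))
```

The fixed link count per node is what keeps the wiring practical, and the remaining multi-hop paths are what bound how quickly data can be moved in and out of the array.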







### **Lossless Versus Lossy**

Persistent data: reload needed

Deep Convolutional Network (DCN)

Transient data: reload, restart calculations

Accumulated data: modified models are expensive to rebuild



Time to reload is always an issue





# Recovering From Power Fail

Data pulled from main memory

...or worse...

**Backing store** 



Data requires multiple hops through the interconnects

Not uncommon for data reload to take 3 minutes or more

Before recalculation can begin!
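A back-of-the-envelope model shows how multiple hops turn into minutes of reload time. It assumes store-and-forward transfer, where the whole payload crosses each hop in turn. The function name `reload_seconds` and every number below are hypothetical, chosen for illustration rather than taken from a measured system.

```python
def reload_seconds(model_bytes, link_bytes_per_s, hops, per_hop_overhead_s=0.0):
    """Estimate reload time when the full payload crosses each hop in sequence."""
    if link_bytes_per_s <= 0 or hops < 1:
        raise ValueError("need a positive link rate and at least one hop")
    return hops * (model_bytes / link_bytes_per_s + per_hop_overhead_s)


GIB = 1024 ** 3

# Hypothetical case: 64 GiB of model data, 1 GiB/s links, 3 hops
# (for example backing store to main memory to accelerator array).
estimate = reload_seconds(model_bytes=64 * GIB, link_bytes_per_s=1 * GIB, hops=3)
print(f"{estimate:.0f} s, about {estimate / 60:.1f} minutes")
```

Under these assumed values the reload alone takes a little over three minutes, and no recalculation has started yet.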





# **Distributed Memory Complications**



This may help explain the gap between research projects and actual deployments

Distributed cells complicate download time into the arrays





# Persistent Main Memory

NVDIMMs are moving data persistence to the main memory bus

and in some cases increasing memory capacity

See my other talk later this week







#### Cost of Power Failure

The average downtime in the U.S. is 7.9 hours per outage.

worse than average downtime of 175 hours a year can lose

The average cost of downtime per minute is



The Cost Of Internet Outages Beyond Revenue

[Chart: outage cost distribution by band, from \$50,000 to \$100,000 up to more than \$5M]

Statistics vary but all agree... downtime costs a LOT

August 2018







DRAM
Loses data
Must be refreshed
Can't lose power

Santa Clara, CA August 2018



Persistent Memory
Holds data forever, even on power fail





#### Nantero NRAM™







DDR4 DDR5



Nantero NRAM is a persistent memory using carbon nanotubes to build resistive arrays which can be arranged in a DRAM compatible device



See my other talks later this week





# Classes of Persistent Memory

| | DRAM | NRAM | MRAM | ReRAM | PCM / 3DXpoint | FeRAM | Flash |
|---|---|---|---|---|---|---|---|
| Class | Memory & Memory Class Storage | Memory & Memory Class Storage | Storage Class Memory | Storage Class Memory | Storage Class Memory | Storage Class Memory | Storage |
| Non-volatility | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Endurance | No limit | No limit | Limited | Limited | Limited | No limit | 10<sup>3</sup> |
| Read Time | 10 ns | 10 ns | X | X | X | X | 50M ns |
| Write Time | 10 ns | 10 ns | X | X | X | X | 25M ns |

See my other talks later this week




**Applying Persistent Memory** 

Replace DRAM with Persistent Memory

Completely eliminates the need to reload on power fail



Next-generation persistent memory will target SRAM

Persistent shadow registers aren't such a bad idea, either
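The "no reload" idea can be sketched in software with a file-backed memory map standing in for persistent memory. This is an analogy only: real persistent main memory would be addressed directly on the memory bus, with no file and no flush, and nothing here is a Nantero or NVDIMM API. The helper `open_region`, the file name, and the weight values are hypothetical.

```python
import mmap
import os
import struct
import tempfile

WEIGHT_COUNT = 4
FMT = f"<{WEIGHT_COUNT}f"      # four little-endian 32-bit floats
SIZE = struct.calcsize(FMT)


def open_region(path):
    """Map a file-backed region, creating and sizing it on first use."""
    is_new = not os.path.exists(path)
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if is_new:
            os.ftruncate(fd, SIZE)
        return mmap.mmap(fd, SIZE)
    finally:
        os.close(fd)  # the mapping keeps its own handle to the file


path = os.path.join(tempfile.mkdtemp(), "weights.bin")

# Before the "power failure": update the weights in place.
region = open_region(path)
struct.pack_into(FMT, region, 0, 0.5, -1.25, 2.0, 0.75)
region.flush()
region.close()  # stands in for losing power

# After "power returns": the weights are already there, nothing is reloaded.
region = open_region(path)
restored = struct.unpack_from(FMT, region, 0)
region.close()
print(restored)
```

The point of the sketch is the second half: the program picks up its accumulated state where it left off instead of pulling it back through the interconnects.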






# **NRAM** for Main Memory

**NRAM** 

NRAM replaces DDR4, DDR5 for main memory







#### **Enables the New Architectures**



NRAM cells in the array

Permanent storage through power fail

Programmed once during manufacturing, no reload





# NRAM Everywhere

Soon we will look back and say

"Remember when data was lost when power went out?"

and laugh







### **Full Disclosure**



My first home computer had an 8" floppy disk

I earned my gray hair





### Summary

Centralized versus distributed computing is a long-term cycle

Quality of software infrastructure typically determines the winner

Artificial intelligence accelerators are a recent co-processing addition

Data loss on power failure is worsened by AI architectures

Persistent memory in AI devices solves major problems

Nantero NRAM addresses many uses of PM in AI systems

If you remember 8" floppies, you probably can't read this screen





#### Questions?

Bill Gervasi
Principal Systems Architect
bilge@Nantero.com