

#### Design of PRAM-based Persistent NVDIMM Controllers to Prepare the Data Age

Myoungsoo Jung



Computer Architecture and MEmory systems Lab.

School of Integrated Technology Yonsei University



#### Data Explosion





(1) IDC "Worldwide Internet of Things (IoT) 2013-2020 forecast" October 2013. (2) IDC "The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things" April 2014 (3) Global Smart Meter Forecasts, 2012-2020. Smart Grid Insights (Zypryme), November 2013 (4) http://en.wikipedia.org




#### Data Explosion




#### Facebook Privacy: Social Network Buys Data From Third-Party Brokers To Fill In User Profiles

BY AJ DELLINGER ON 12/28/16 AT 4:57 PM

#### Data becomes the new currency



**2025** 163 ZB

(IDC-2017)

It comes as no surprise to any Facebook user that the social network gathers a considerable amount of information based on their actions and interests. But according to a <u>report from ProPublica</u>, the world's largest social network knows far more about its users than just what they do online.





## What Can System/Architecture Her Section





DISK






#### *Time & Technology node*

Flash Memory Summit 2018 Santa Clara, CA

Source: SK Hynix







#### General Assumptions of PRAM Latency

Example prior works (paper screenshots):

- "Architecting Phase Change Memory as a Scalable DRAM Alternative" (ISCA'09)
- "Crash Consistency in Encrypted Non-Volatile Main Memory Systems" (HPCA'18)
- "Accelerating Write by Exploiting PCM Asymmetries" (HPCA'13)

Many previous works assume that PRAM's write latency is **similar to or slightly worse** than DRAM's (1.5x)





#### **PRAM Latency Measurement**





## Our performance measurement on real 3x nm PRAM shows far higher write latency than DRAM (190x)



#### New Memory Placement



• PRAM offers promising read performance, but terrible write latency, compared to DRAM



## What Are The Considerations To Design Hybrid Memory (DRAM+PRAM) Controller?



# To gain insight into controller design, let's look at the details of DRAM and PRAM



- Multiple banks to serve multiple memory requests in parallel
- Single row buffer within a bank





#### Does PRAM Have the Same Internal Architecture as DRAM?

→ PRAM employs multiple row buffers
Challenge 2: PRAM's asymmetric latency causes many bank conflicts

→ PRAM uses multi-partition architecture



- Multiple partitions within the bank for partition-level parallelism
- Multiple row buffers to mitigate long write latency



Can A Conventional DRAM Controller Be Aware of Multi-Partition? (Inside of A Bank)

(Revisited) A conventional DRAM scheduler exploits only **bank-level parallelism**. It cannot see inside a bank!

∴ Partition-level parallelism should be supported



#### How Can Memory Requests Be Scheduled by Exploiting the Multi-Partition Architecture?

**Limitation of PRAM**: In the PRAM design, a WRITE request blocks the whole PRAM bank.










**Key insight**: While a WRITE is in flight, another WRITE cannot be serviced, but READs can be serviced if they target a different partition



**Solution**: Add a *register* that stores the partition number of the in-flight WRITE, and compare it with the partition number of each incoming READ request
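The register check can be sketched in a few lines of Python. This is a minimal software model, not the controller's actual logic: the names `Request`, `BankState`, `can_issue`, `start_write`, and `finish_write` are hypothetical, and one register per bank is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    kind: str        # "READ" or "WRITE"
    bank: int
    partition: int


class BankState:
    """Per-bank bookkeeping (hypothetical): one register for the in-flight WRITE."""

    def __init__(self) -> None:
        # Register holding the partition number of the in-flight WRITE.
        # None means no WRITE is currently occupying the bank.
        self.write_partition: Optional[int] = None

    def can_issue(self, req: Request) -> bool:
        if self.write_partition is None:
            return True                   # bank is idle, any request may go
        if req.kind == "WRITE":
            return False                  # a second WRITE has to wait
        # READ: serviceable only if it targets a different partition
        return req.partition != self.write_partition

    def start_write(self, partition: int) -> None:
        self.write_partition = partition

    def finish_write(self) -> None:
        self.write_partition = None


bank = BankState()
bank.start_write(partition=3)
print(bank.can_issue(Request("READ", bank=0, partition=5)))   # True
print(bank.can_issue(Request("READ", bank=0, partition=3)))   # False
print(bank.can_issue(Request("WRITE", bank=0, partition=5)))  # False
```

A READ to partition 5 proceeds while the WRITE to partition 3 is still in flight, which is the partition-level parallelism a bank-level scheduler cannot see.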



### Design1: Scheduling Support Module for PRAM-aware New Scheduling Scheme





### Now Requests Are Scheduled. How Can They Then Be Served to the Hybrid Memory?





First, as is generally known, LPDDR is the JEDEC-standard low-power memory interface (used for DRAM)



- 1) Activation: activate the target row and load its data into the row buffer
- 2) Read/Write: access the row buffer with the column address
- 3) Precharge: precharge the bit lines to half the supply voltage
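The three steps above can be sketched as a toy single-bank model. The class `DramBank` and its methods are hypothetical names for illustration. Electrical behavior such as the bit-line voltage is not modeled, and writing the row buffer back on precharge is a simplification.

```python
class DramBank:
    """Toy model of one DRAM bank with a single row buffer (illustrative only)."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cells = [[0] * cols for _ in range(rows)]
        self.open_row = None      # index of the row currently in the row buffer
        self.row_buffer = None

    def activate(self, row: int) -> None:
        # 1) Activation: load the target row into the row buffer
        assert self.open_row is None, "precharge before activating another row"
        self.open_row = row
        self.row_buffer = list(self.cells[row])

    def read(self, col: int) -> int:
        # 2) Read: access the row buffer with the column address
        return self.row_buffer[col]

    def write(self, col: int, value: int) -> None:
        # 2) Write: update the row buffer with the column address
        self.row_buffer[col] = value

    def precharge(self) -> None:
        # 3) Precharge: close the open row (simplified: write the buffer back)
        if self.open_row is None:
            return
        self.cells[self.open_row] = self.row_buffer
        self.open_row = None
        self.row_buffer = None


bank = DramBank(rows=4, cols=8)
bank.activate(2)
bank.write(5, 42)
bank.precharge()
bank.activate(2)
print(bank.read(5))  # 42
```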





#### Does PRAM Have the Same Memory Interface (LPDDR2) as DRAM?



(Revisited) PRAM's architecture differs from DRAM's: it has **multiple row buffers** and **much larger capacity**

∴ Different interface is required



## **PRAM's Timing**



- PRAM requires a different timing model from DRAM
  - The NVM address space is much larger than DRAM's
- 3-phase addressing (LPDDR-NVM by JEDEC)
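A rough sketch of 3-phase addressing follows. It assumes the phases are the ones commonly described for LPDDR2-NVM: a preactive step that latches the upper row address, an activate step that supplies the lower row address and fills a row data buffer, and a column read. The class name `PramDevice`, the bit width, and the buffer count are illustrative assumptions, not values from the slides or the standard.

```python
class PramDevice:
    """Toy 3-phase addressing model; bit widths and buffer count are assumptions."""

    LOWER_ROW_BITS = 4   # assumed width of the row-address part sent with ACTIVATE
    COLS = 8             # assumed number of columns per row

    def __init__(self, num_buffers: int = 4) -> None:
        self.storage = {}                    # sparse map: full row address -> row data
        self.rab = [0] * num_buffers         # row address buffers (upper row bits)
        self.rdb = [None] * num_buffers      # row data buffers

    def preactive(self, buf: int, upper_row: int) -> None:
        # Phase 1: latch the upper part of the row address
        self.rab[buf] = upper_row

    def activate(self, buf: int, lower_row: int) -> int:
        # Phase 2: combine upper and lower parts, load the row into a data buffer
        row = (self.rab[buf] << self.LOWER_ROW_BITS) | lower_row
        self.rdb[buf] = list(self.storage.get(row, [0] * self.COLS))
        return row

    def read(self, buf: int, col: int) -> int:
        # Phase 3: access the selected row data buffer with the column address
        return self.rdb[buf][col]


dev = PramDevice()
dev.storage[(5 << 4) | 3] = list(range(8))
dev.preactive(buf=1, upper_row=5)
dev.activate(buf=1, lower_row=3)
print(dev.read(buf=1, col=2))  # 2
```

Splitting the row address across two commands is what lets a narrow command bus reach a much larger address space, and keeping several buffer pairs matches the multiple row buffers noted earlier.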





#### Design2: Heterogeneity Support Module for both LPDDR & LPDDR-NVM







#### Don't Forget DRAM Is for Caching. How Can Caching Be Supported?



# Solution: Track which data reside in DRAM (caching info) in a lookup table.

Moreover, like a conventional cache, the controller needs algorithms for DRAM dataline update, eviction, and finding an empty dataline.



The lookup table holds address information only, not data values
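A minimal sketch of such a lookup table is shown here. The LRU eviction policy is an assumption, since the slides do not name one, and `CacheLookupTable`, `lookup`, and `insert` are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class CacheLookupTable:
    """Maps PRAM addresses to DRAM dataline indices; it stores no data values."""

    def __init__(self, num_datalines: int) -> None:
        self.table: "OrderedDict[int, int]" = OrderedDict()  # pram_addr -> dataline
        self.free = list(range(num_datalines))               # empty datalines

    def lookup(self, pram_addr: int) -> Optional[int]:
        line = self.table.get(pram_addr)
        if line is not None:
            self.table.move_to_end(pram_addr)   # mark as most recently used
        return line

    def insert(self, pram_addr: int) -> Tuple[int, Optional[int]]:
        """Return (dataline, evicted PRAM address or None)."""
        hit = self.lookup(pram_addr)
        if hit is not None:
            return hit, None                    # already cached: dataline update
        evicted = None
        if self.free:
            line = self.free.pop(0)             # find an empty dataline
        else:
            # Eviction (assumed LRU): reuse the least recently used dataline
            evicted, line = self.table.popitem(last=False)
        self.table[pram_addr] = line
        return line, evicted


table = CacheLookupTable(num_datalines=2)
table.insert(0x100)
table.insert(0x200)
table.lookup(0x100)             # touch 0x100 so 0x200 becomes least recently used
print(table.insert(0x300))      # (1, 512): dataline 1 reused, 0x200 evicted
```

In a hybrid controller the evicted address would also need a write-back to PRAM before the dataline is reused; that step is left out here.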



#### Design3: Caching Support Module for Using DRAM as an Inclusive Cache of PRAM





BTW, How Can PRAM's Non-Volatility Be Maintained When DRAM Is Integrated? (Hybrid)

**Challenge of hybrid memory**: Data in DRAM will disappear when there is a power failure





## **FLUSH Operation**



**Solution**: Provide a *Flush* operation that moves DRAM data to PRAM. The memory controller generates a 'PRAM write' request corresponding to the target DRAM row.

NOTE) The 'PRAM write' is stored in the command queue inside the memory controller, and the DRAM dataline is invalidated.
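The flush path can be sketched as follows. `HybridController`, `cache`, and `flush` are hypothetical names, and the dictionary standing in for DRAM is only a software stand-in for the real datalines.

```python
from collections import deque


class HybridController:
    """Toy controller: DRAM datalines cache PRAM rows, and flush writes them back."""

    def __init__(self) -> None:
        self.dram = {}                 # dataline index -> (pram_addr, data)
        self.command_queue = deque()   # pending PRAM commands inside the controller

    def cache(self, line: int, pram_addr: int, data: bytes) -> None:
        self.dram[line] = (pram_addr, data)

    def flush(self, line: int) -> None:
        # Remove the dataline from DRAM (invalidate) and queue a 'PRAM write'
        entry = self.dram.pop(line, None)
        if entry is None:
            return                     # nothing cached in this dataline
        pram_addr, data = entry
        self.command_queue.append(("PRAM_WRITE", pram_addr, data))


ctrl = HybridController()
ctrl.cache(line=0, pram_addr=0x40, data=b"abc")
ctrl.flush(0)
print(list(ctrl.command_queue))  # [('PRAM_WRITE', 64, b'abc')]
print(0 in ctrl.dram)            # False
```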



#### Okay, Data Delivery Is Guaranteed. Is It Good Enough?

**Challenge of flush**: The user assumes the data holds the latest value, but the memory controller can reorder memory requests





### **FENCE** Operation



**Solution**: Provide a *Fence* operation to enforce the delivery order of memory requests. The memory controller can simply add a *'fence flag'* to check whether a request is fenced
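As a rough illustration of the ordering guarantee, the sketch below lets a toy scheduler reorder requests freely between fences but never across one. Representing the fence flag as a `"FENCE"` marker in the queue is an assumption. So is the read-first policy, which ignores same-address hazards.

```python
from typing import List, Tuple

Command = Tuple[str, int]   # (kind, address); kind is "READ", "WRITE", or "FENCE"


def schedule(queue: List[Command]) -> List[Command]:
    """Issue order for a toy read-priority scheduler that never crosses a fence."""
    issued: List[Command] = []
    window: List[Command] = []
    for cmd in queue + [("FENCE", 0)]:   # sentinel fence drains the final window
        if cmd[0] == "FENCE":
            # Reordering happens only inside the window: READs first, then WRITEs
            issued += [c for c in window if c[0] == "READ"]
            issued += [c for c in window if c[0] == "WRITE"]
            window = []
        else:
            window.append(cmd)
    return issued


queue = [("WRITE", 1), ("READ", 2), ("FENCE", 0), ("WRITE", 3), ("READ", 4)]
print(schedule(queue))
# [('READ', 2), ('WRITE', 1), ('READ', 4), ('WRITE', 3)]
```

The write to address 1 is always issued before anything after the fence, even though the scheduler would otherwise prefer the later read.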



#### Design4: Persistent Support Module to Guarantee Data Delivery & Delivery Order







### Demo (Track and Field)







#### **Demo – Slow Version**









# **Demo – Normal Version**







# **Further Enhancement**


















### **Real-Time Data**








News clips (article screenshots):

- "Amazon won't say if it hands your Echo data to the government" (Zack Whittaker, Zero Day, January 16, 2018): "The retail, cloud, and device giant stands as the least transparent of transparent tech companies."
- "Who owns connected car data?" (Matthew DeBord, 28 Sep 2015): "Cars have gotten smart in a hurry. This has raised two major issues, one positive, one maybe not so positive."





- Data should be tagged and classified by criticality
  - Security matters: private data should be managed locally, at the endpoint
- Can we make each device a standalone accelerator?
  - IoT relies on very power-limited devices
  - Even an OS or file system can be a burden for IoT!





Storage-based Accelerator!



Figure labels: Traditional (Storage, Host, Coprocessor) vs. Storage-based accelerator

Direct accesses

Integrate NVM chips into embedded coprocessor.

All cores are directly connected to storage.

Process data within an accelerator.



## **Overall Architecture**








NearZero architecture







## NearZero (Results)









- Full FPGA automation for SCM-based storage
- Latency is sustained across all random and sequential access patterns

| Type | NVMe SSD (ASIC) | New Memory SSD (ASIC) | Our Prototype (FPGA) |
|------|-----------------|-----------------------|----------------------|
| Bandwidth (read/write) | 1.1 ~ 2.3 GB/sec | 1.5 ~ 3.5 GB/sec | 2.5 ~ 5.2 GB/sec |
| Latency | 15 ~ 150 us | 8 ~ 100 us | 11 ~ 13 us |





# QnA

