# Venice

## Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

Onur Mutlu <u>omutlu@gmail.com</u>

https://people.inf.ethz.ch/omutlu

7 August 2024

FMS: the Future of Memory and Storage

SAFARI

ETH Zürich



## Venice Paper, Slides, Video [ISCA 2023]

 Rakesh Nadig, Mohammad Sadrosadati, Haiyu Mao, Nika Mansouri Ghiasi, Arash Tavakkol, Jisung Park, Hamid Sarbazi-Azad, Juan Gómez Luna, and Onur Mutlu, "Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses" Proceedings of the 50th International Symposium on Computer Architecture (ISCA), Orlando, FL, USA, June 2023.
 [arXiv version]
 [Slides (pptx) (pdf)]
 [Lightning Talk Slides (pptx) (pdf)]
 [Lightning Talk Video (3 minutes)]
 [Talk Video (14 minutes, including Q&A)]

#### Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

\*Rakesh Nadig<sup>§</sup> \*Mohammad Sadrosadati<sup>§</sup> Haiyu Mao<sup>§</sup> Nika Mansouri Ghiasi<sup>§</sup> Arash Tavakkol<sup>§</sup> Jisung Park<sup>§∇</sup> Hamid Sarbazi-Azad<sup>†‡</sup> Juan Gómez Luna<sup>§</sup> Onur Mutlu<sup>§</sup> <sup>§</sup>ETH Zürich <sup>∇</sup>POSTECH <sup>†</sup>Sharif University of Technology <sup>‡</sup>IPM

#### https://arxiv.org/pdf/2305.07768



# Venice

## Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

Rakesh Nadig\*, Mohammad Sadrosadati\*, Haiyu Mao, Nika Mansouri Ghiasi, Arash Tavakkol, Jisung Park, Hamid Sarbazi-Azad, Juan Gómez Luna, and Onur Mutlu

Presented at ISCA 2023









#### **Talk Outline**

- Motivation
- Venice
- Evaluation
- Summary

### **Overview of a Modern Solid-State Drive**



#### **Key Problem: Path Conflicts in Modern SSDs**



## **Delay Caused by Path Conflicts**



Path conflicts increase the average I/O latency by 57% in our experiments on a performance-optimized SSD

The performance overhead of path conflicts increases by 1.6x in our experiments for high-I/O-intensity workloads
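The delay comes from I/O requests to different flash chips contending for one shared path. A toy timing model makes this concrete. It is a sketch, not MQSim and not the paper's methodology. It borrows tR = 3 µs, a 4KB page, and the 1.2-GB/s flash channel rate from the performance-optimized configuration in the backup slides, treats 1.2 GB/s as 1.2 × 10⁹ bytes/s (an assumption), and ignores command, ECC, and queuing overheads. The function names are hypothetical.

```python
# Toy model of a path conflict on a shared flash channel (illustrative only).
PAGE_BYTES = 4096             # 4KB page
CHANNEL_BYTES_PER_S = 1.2e9   # 1.2-GB/s flash channel I/O rate (decimal GB assumed)
T_READ_US = 3.0               # array read latency tR (Z-NAND configuration)


def transfer_us(num_bytes: int = PAGE_BYTES) -> float:
    """Time to move num_bytes over the channel, in microseconds."""
    return num_bytes / CHANNEL_BYTES_PER_S * 1e6


def shared_channel_finish_times(num_chips: int) -> list[float]:
    """All chips sense their pages in parallel, then take turns on the shared bus."""
    finish = []
    bus_free_at = 0.0
    for _ in range(num_chips):
        # A transfer starts only when its page is sensed AND the bus is idle.
        start = max(T_READ_US, bus_free_at)
        bus_free_at = start + transfer_us()
        finish.append(bus_free_at)
    return finish


if __name__ == "__main__":
    for i, t in enumerate(shared_channel_finish_times(8)):
        print(f"chip {i}: done at {t:.2f} us")
```

With 8 chips on one channel, the first page finishes at about 6.41 µs, but the eighth waits behind seven earlier transfers and finishes at about 30.31 µs, even though every chip had its data ready at 3 µs. In this toy model, that waiting time is entirely due to the shared path, not the flash array.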












[1] Kim+, "Networked SSD: Flash Memory Interconnection Network for High-Bandwidth SSD", MICRO 2022
[2] Tavakkol+, "Network-on-SSD: A Scalable and High-Performance Communication Design Paradigm for SSDs", IEEE CAL 2012




#### **Our Goal**

To fundamentally address the path conflict problem in SSDs by

1. increasing the number of paths to each flash chip (i.e., path diversity) at low cost
2. effectively utilizing the increased path diversity for communication between the SSD controller and flash chips


## **Our Proposal**

# Venice

- A low-cost interconnection network of flash chips in the SSD
- Conflict-free path reservation for each I/O request
- A non-minimal fully-adaptive routing algorithm for path identification

Named after the network of canals in the city of Venice: https://en.wikipedia.org/wiki/Venice


### **Venice: Architecture**



Venice provides increased path diversity at low cost

No modifications to existing flash chips in Venice
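One way to quantify the path diversity a mesh offers: in a shared-bus SSD, a chip is reachable over exactly one channel, whereas in a 2D mesh (the backup slides list an 8×8 2D mesh with one router next to each flash chip) the number of distinct shortest routes between two routers at row offset Δr and column offset Δc is the binomial coefficient C(Δr + Δc, Δr). The sketch below, with hypothetical function names, counts only minimal paths; non-minimal routing can use more.

```python
from math import comb

Node = tuple[int, int]  # (row, column) of a router in the mesh


def minimal_paths(src: Node, dst: Node) -> int:
    """Number of distinct shortest (Manhattan) routes between two mesh routers."""
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    # A shortest route is an ordering of dr vertical and dc horizontal hops.
    return comb(dr + dc, dr)


if __name__ == "__main__":
    corner = (0, 0)
    for dst in [(0, 5), (2, 3), (7, 7)]:
        print(dst, minimal_paths(corner, dst))
```

Routers in the same row or column have only one minimal path between them, so a single busy link can still block the shortest route. This hints at why allowing non-minimal routes is useful.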


## Venice: Path Reservation

- Venice uses a small *scout packet* to reserve a conflict-free path for each I/O request
- Path reservation eliminates path conflicts by enabling conflict-free I/O transfer
- The overhead of path reservation is negligible due to the small size of the scout packet
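The effect of path reservation can be sketched as an all-or-nothing claim on every link of a route. A transfer proceeds only if none of its links is held by another in-flight request, and it holds those links until it finishes. The centralized `LinkReservationTable` below is a hypothetical simplification for illustration. In Venice, the scout packet itself travels through the routers to reserve the path, and the paper describes the actual mechanism.

```python
Node = tuple[int, int]  # (row, column) of a router
Link = frozenset        # undirected link between two adjacent routers


def links_of(path: list[Node]) -> list[Link]:
    """Consecutive router pairs along a path, as undirected links."""
    return [frozenset((a, b)) for a, b in zip(path, path[1:])]


class LinkReservationTable:
    """Hypothetical central record of which request holds each link."""

    def __init__(self) -> None:
        self.owner: dict[Link, int] = {}

    def try_reserve(self, req_id: int, path: list[Node]) -> bool:
        links = links_of(path)
        if any(link in self.owner for link in links):
            return False  # conflict: reserve nothing; the caller may try another path
        for link in links:
            self.owner[link] = req_id
        return True

    def release(self, req_id: int) -> None:
        """Free every link held by req_id once its I/O transfer completes."""
        self.owner = {l: r for l, r in self.owner.items() if r != req_id}
```

Because each reserved link has exactly one owner, two transfers can never share a link, which is the property that makes the subsequent data transfer conflict-free.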




#### **Venice: Non-Minimal Fully Adaptive Routing**

- Venice uses a non-minimal fully-adaptive routing algorithm to route *scout packets* when a minimal path is unavailable
- Effectively utilizes the idle links in the interconnection network to find a conflict-free path
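A rough sketch of how a scout could find a conflict-free, possibly non-minimal path: at each router, it prefers idle links that move closer to the destination, falls back to idle links that move away when no productive link is free, never revisits a router, and backtracks from dead ends. This depth-first search is a hypothetical stand-in written for illustration. It is not Venice's routing algorithm, whose deadlock and livelock handling is described in the paper.

```python
from typing import Optional

Node = tuple[int, int]  # (row, column) of a router


def mesh_neighbors(node: Node, rows: int, cols: int) -> list[Node]:
    r, c = node
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(x, y) for x, y in candidates if 0 <= x < rows and 0 <= y < cols]


def scout_route(src: Node, dst: Node, rows: int, cols: int,
                busy: set[frozenset]) -> Optional[list[Node]]:
    """Find a path that avoids busy links, preferring productive (minimal) hops."""

    def distance(n: Node) -> int:
        return abs(n[0] - dst[0]) + abs(n[1] - dst[1])

    path = [src]
    visited = {src}

    def explore(node: Node) -> bool:
        if node == dst:
            return True
        # Productive neighbors (closer to dst) sort first; the rest are the non-minimal fallback.
        for nxt in sorted(mesh_neighbors(node, rows, cols), key=distance):
            if nxt in visited or frozenset((node, nxt)) in busy:
                continue
            visited.add(nxt)
            path.append(nxt)
            if explore(nxt):
                return True
            path.pop()  # dead end: back up and try the next link
        return False

    return path if explore(src) else None
```

For example, in a 3×3 mesh with the link between (0, 1) and (0, 2) busy, a route from (0, 0) to (0, 2) detours through (1, 1) and (1, 2), taking 4 hops instead of the minimal 2 while using only idle links.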



## **More in the Paper**

- Venice's non-minimal fully-adaptive routing algorithm
- Handling deadlock and livelock scenarios
- Overhead of exercising a non-minimal path
- Analysis of prior architectures proposed to mitigate the path conflict problem
- Detailed background on modern SSD architecture


## Simulating SSDs: MQSim [FAST 2018]

Arash Tavakkol, Juan Gomez-Luna, Mohammad Sadrosadati, Saugata Ghose, and Onur Mutlu,
 "MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices"
 Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST), Oakland, CA, USA, February 2018.
 [Slides (pptx) (pdf)]
 [Source Code]

#### MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices

Arash Tavakkol<sup>†</sup>, Juan Gómez-Luna<sup>†</sup>, Mohammad Sadrosadati<sup>†</sup>, Saugata Ghose<sup>‡</sup>, Onur Mutlu<sup>†‡</sup> <sup>†</sup>*ETH Zürich* <sup>‡</sup>*Carnegie Mellon University* 

https://github.com/CMU-SAFARI/MQSim

https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework\_fast18.pdf

## Simulating Memory: Ramulator 2.0

 Haocong Luo, Yahya Can Tugrul, F. Nisa Bostanci, Ataberk Olgun, A. Giray Yaglikci, and Onur Mutlu, "Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator" *Preprint on arxiv*, August 2023.
 [arXiv version]
 [Ramulator 2.0 Source Code]

# Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator

Haocong Luo, Yahya Can Tuğrul, F. Nisa Bostancı, Ataberk Olgun, A. Giray Yağlıkçı, and Onur Mutlu

https://arxiv.org/pdf/2308.11030.pdf

#### https://github.com/CMU-SAFARI/ramulator2

## Open Source Tools: SAFARI GitHub



https://github.com/CMU-SAFARI/

## SSD Course (Spring 2023)

#### Spring 2023 Edition:

https://safari.ethz.ch/projects\_and\_seminars/spring2023/doku.php?id=modern\_ssds

#### Fall 2022 Edition:

https://safari.ethz.ch/projects\_and\_seminars/fall2022/doku.php?id=modern\_ssds

#### YouTube Livestream (Spring 2023):

https://www.youtube.com/watch?v=4VTwOMmsnJY&list=PL5Q2soXY2Zi\_8qOM5Icpp8hB2SHtm4z57&pp=iAQB

#### YouTube Livestream (Fall 2022):

https://www.youtube.com/watch?v=hqLrd-Uj0aU&list=PL5Q2soXY2Zi9BJhenUq4JI5bwhAMpAp13&pp=iAQB

- Project course
  - Taken by Bachelor's/Master's students
  - SSD Basics and Advanced Topics
  - Hands-on research exploration
  - Many research readings

#### https://www.youtube.com/onurmutlulectures



Fall 2022 Meetings/Schedule

| Week | Date       | Livestream       | Meeting                                                                                                                      | Learning<br>Materials   | Assignment |
|------|------------|------------------|------------------------------------------------------------------------------------------------------------------------------|-------------------------|------------|
| W1   | 06.10      |                  | M1: P&S Course Presentation                                                                                                  | Required<br>Recommended |            |
| W2   | 12.10      | YouTube Live     | M2: Basics of NAND Flash-Based SSDs<br>(PDF, PPT)                                                                            | Required<br>Recommended |            |
| W3   | 19.10      | YouTube Live     | M3: NAND Flash Read/Write Operations                                                                                         | Required<br>Recommended |            |
| W4   | 26.10      | YouTube Live     | M4: Processing inside NAND Flash                                                                                             | Required<br>Recommended |            |
| W5   | 02.11      | YouTube Live     | M5: Advanced NAND Flash Commands & Mapping<br>(PDF, PPT)                                                                     | Required<br>Recommended |            |
| W6   | 09.11      | YouTube Live     | M6: Processing inside Storage                                                                                                | Required<br>Recommended |            |
| W7   | 23.11      | YouTube Live     | M7: Address Mapping & Garbage Collection                                                                                     | Required<br>Recommended |            |
| W8   | 30.11      | YouTube Live     | M8: Introduction to MQSim                                                                                                    | Required<br>Recommended |            |
| W9   | 14.12      | YouTube Live     | M9: Fine-Grained Mapping and Multi-Plane Operation-Aware Block Management<br>(PDF, PPT)                                      | Required<br>Recommended |            |
| W10  | 04.01.2023 | YouTube Premiere | M10a: NAND Flash Basics                                                                                                      | Required<br>Recommended |            |
|      |            |                  | M10b: Reducing Solid-State Drive Read Latency by Optimizing Read-Retry                                                       | Required<br>Recommended |            |
|      |            |                  | M10c: Evanesco: Architectural Support for Efficient Data Sanitization in Modern Flash-Based Storage Systems<br>(PDF, PPT, Paper) | Required<br>Recommended |            |
|      |            |                  | M10d: DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression<br>(PDF, PPT, Paper) | Required<br>Recommended |            |
| W11  | 11.01      | YouTube Live     | M11: FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives<br>(PDF, PPT)                       | Required                |            |
| W12  | 25.01      | YouTube Premiere | M12: Flash Memory and Solid-State Drives                                                                                     | Recommended             |            |

## **Evaluation Methodology**

- Using MQSim [Tavakkol+, FAST'18], a state-of-the-art SSD simulator
- Two SSD configurations
  - Performance-Optimized (Samsung Z-NAND SSD)
  - Cost-Optimized (Samsung PM9A3)
- Nineteen data-intensive workloads from
  - MSR Cambridge, YCSB, Slacker, SYSTOR '17 and RocksDB

#### Prior Approaches

- Baseline SSD: A typical multi-channel shared bus SSD
- Packetized SSD (pSSD) [Kim+, MICRO'22]: Uses packetization to double the flash channel bandwidth
- Packetized Network SSD (pnSSD) [Kim+, MICRO'22]: Increases path diversity by introducing vertical channels
- Network-on-SSD (NoSSD) [Tavakkol+, CAL 2012]: Proposes an interconnection network of flash chips with simple deterministic routing
- Path-conflict-free SSD: An *ideal SSD* with no path conflicts

## **Results: Performance Analysis (I)**

• Performance-Optimized SSD



Venice improves SSD performance by 1.9x on average over the best-performing prior work

Venice's performance is within 45% of the performance of a Path-conflict-free SSD

## **Results: Performance Analysis (II)**

• Cost-Optimized SSD



Venice improves SSD performance by 1.5x on average over the best-performing prior work

Venice's performance is within 25% of the performance of a Path-conflict-free SSD

## **Results: Performance Analysis (III)**

#### Performance-Optimized SSD



### **Results: Reduction in Path Conflicts**

99.98% of I/O requests do not experience path conflicts



Venice mitigates path conflicts by using path reservation and effective utilization of path diversity

## **Results: SSD Energy Consumption**



Venice reduces the SSD energy consumption by 46% on average over the most efficient prior work

## **Tail Latency**

 Comparison of tail latencies in the 99th percentile of I/O requests






### **More in the Paper**

- Power and area overhead analysis
- Tail latency analysis
- Sensitivity to interconnection network configurations
- Performance on mixed workloads
- Detailed evaluation methodology

### Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

\*Rakesh Nadig<sup>§</sup> \*Mohammad Sadrosadati<sup>§</sup> Haiyu Mao<sup>§</sup> Nika Mansouri Ghiasi<sup>§</sup> Arash Tavakkol<sup>§</sup> Jisung Park<sup>§∇</sup> Hamid Sarbazi-Azad<sup>†‡</sup> Juan Gómez Luna<sup>§</sup> Onur Mutlu<sup>§</sup> <sup>§</sup>ETH Zürich <sup>∇</sup>POSTECH <sup>†</sup>Sharif University of Technology <sup>‡</sup>IPM



https://arxiv.org/abs/2305.07768




### **Venice: Summary**




# Venice Can Enable More Effective In-Storage Processing

# In-Storage Genomic Data Filtering [ASPLOS 2022]

Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, and Onur Mutlu,
 "GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis"
 Proceedings of the 27th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, February-March 2022.
 [Lightning Talk Slides (pptx) (pdf)]
 [Lightning Talk Video (90 seconds)]

### GenStore: A High-Performance In-Storage Processing System for Genome Sequence Analysis

Nika Mansouri Ghiasi<sup>1</sup> Jisung Park<sup>1</sup> Harun Mustafa<sup>1</sup> Jeremie Kim<sup>1</sup> Ataberk Olgun<sup>1</sup> Arvid Gollwitzer<sup>1</sup> Damla Senol Cali<sup>2</sup> Can Firtina<sup>1</sup> Haiyu Mao<sup>1</sup> Nour Almadhoun Alserr<sup>1</sup> Rachata Ausavarungnirun<sup>3</sup> Nandita Vijaykumar<sup>4</sup> Mohammed Alser<sup>1</sup> Onur Mutlu<sup>1</sup>

<sup>1</sup>ETH Zürich <sup>2</sup>Bionano Genomics <sup>3</sup>KMUTNB <sup>4</sup>University of Toronto


#### https://arxiv.org/abs/2202.10400




# In-Storage Metagenomics [ISCA 2024]

 Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joel Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, and Onur Mutlu,
 "MegIS: High-Performance and Low-Cost Metagenomic Analysis with In-Storage Processing"
 Proceedings of the 51st Annual International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, July 2024.
 [Slides (pptx) (pdf)]
 [arXiv version]

### MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

Nika Mansouri Ghiasi<sup>1</sup> Mohammad Sadrosadati<sup>1</sup> Harun Mustafa<sup>1</sup> Arvid Gollwitzer<sup>1</sup> Can Firtina<sup>1</sup> Julien Eudine<sup>1</sup> Haiyu Mao<sup>1</sup> Joël Lindegger<sup>1</sup> Meryem Banu Cavlak<sup>1</sup> Mohammed Alser<sup>1</sup> Jisung Park<sup>2</sup> Onur Mutlu<sup>1</sup> <sup>1</sup>ETH Zürich <sup>2</sup>POSTECH






https://arxiv.org/abs/2406.19113



# Venice Can Enable Better Error Handling in SSDs



Proceedings of the IEEE, Sept. 2017

# Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives



This paper reviews the most recent advances in solid-state drive (SSD) error characterization, mitigation, and data recovery techniques to improve both SSD's reliability and lifetime.

By Yu Cai, Saugata Ghose, Erich F. Haratsch, Yixin Luo, and Onur Mutlu

#### https://arxiv.org/pdf/1706.08642





# Venice

# Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

Rakesh Nadig\*, Mohammad Sadrosadati\*, Haiyu Mao, Nika Mansouri Ghiasi, Arash Tavakkol, Jisung Park, Hamid Sarbazi-Azad, Juan Gómez Luna, and Onur Mutlu

https://arxiv.org/abs/2305.07768












# **Backup Slides**



- Scout Packet Structure
- Power Overhead
- Area Overhead
- Evaluated Configurations
- Workload Characteristics
- Mixed Workloads
- Results
  - Throughput Analysis
  - Tail Latency
  - Power Consumption
  - Performance on Mixed Workloads
  - Sensitivity to Interconnection Network Configuration

### **Structure of a Scout Packet**



- A *scout packet* consists of two 8-bit flits: a header flit and a tail flit
- The flash controller sends a scout packet to identify a conflict-free path for the I/O request
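The slides give only the scout packet's size (two 8-bit flits), not its field layout. The sketch below packs a hypothetical layout: the header flit holds a 2-bit flit type plus a 3-bit destination row and a 3-bit destination column for an 8×8 mesh, and the tail flit holds a 2-bit type plus a 6-bit request tag. It only shows that 16 bits can address every router in the evaluated topology. It also compares serialization time on the 8-bit, 1 GHz links listed in the evaluated configuration. All names, field widths, and type codes are assumptions.

```python
HEAD, TAIL = 0b10, 0b01  # hypothetical 2-bit flit-type codes


def encode_scout(dst_row: int, dst_col: int, req_tag: int) -> tuple[int, int]:
    """Pack a hypothetical 2-flit scout packet into two 8-bit values."""
    if not (0 <= dst_row < 8 and 0 <= dst_col < 8):
        raise ValueError("destination must lie in the 8x8 mesh")
    if not 0 <= req_tag < 64:
        raise ValueError("request tag must fit in 6 bits")
    header = (HEAD << 6) | (dst_row << 3) | dst_col
    tail = (TAIL << 6) | req_tag
    return header, tail


def decode_scout(header: int, tail: int) -> tuple[int, int, int]:
    """Recover (dst_row, dst_col, req_tag) from the two flits."""
    if header >> 6 != HEAD or tail >> 6 != TAIL:
        raise ValueError("not a scout packet")
    return (header >> 3) & 0b111, header & 0b111, tail & 0b111111


def serialization_cycles(num_bytes: int, link_bits: int = 8) -> int:
    """Link cycles to push num_bytes through a link_bits-wide link (ceiling)."""
    return -(-num_bytes * 8 // link_bits)


if __name__ == "__main__":
    scout = serialization_cycles(2)    # two 8-bit flits
    page = serialization_cycles(4096)  # one 4KB page
    print(scout, page, f"{scout / page:.3%}")
```

At 1 GHz, the scout packet occupies a link for 2 cycles per hop versus 4096 cycles for a 4KB page, under 0.05% of the page's serialization time. This is consistent with the claim that path reservation adds little overhead.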



# **Power Consumption (I)**



### Router

- We implement the router in HDL and synthesize it using the UMC 65nm technology node
- The router consumes 0.241 mW for a 4KB page transfer

### Network Link

- ORION 3.0 power model tool
- Each network link consumes about 1.08 mW for a 4KB page transfer
- Link capacitance is lower than bus capacitance → 90% less power than that of the shared channel bus
  - Links are shorter and thinner than a shared bus
  - Each link has two drivers, whereas a shared bus has several

| Component | # of Instances         | Avg. Power [mW]<br>for 4KB page transfer | Area                             |
|-----------|------------------------|------------------------------------------|----------------------------------|
| Router    | 1 per flash node       | 0.241                                    | 8% of flash chip area            |
| Link      | Up to 4 per flash node | 1.08                                     | $0.04 \times$ flash channel area |

### **Area Overhead**

- Router
  - Area overhead estimated using the router's HDL model
  - Each router has
    - an area of 614 µm<sup>2</sup> + 40 I/O
    - a total area of 8 mm<sup>2</sup> → 8% of a typical 100 mm<sup>2</sup> flash chip

- Network Link
  - ORION 3.0 model for area analysis of network links
  - 112 network links for an 8×8 flash array configuration
  - 44% lower area than a baseline multi-channel shared bus architecture
  - Links are thinner and require smaller pitch sizes
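The 112-link figure follows from the 2D mesh topology: an R×C mesh without wraparound has R(C−1) horizontal and C(R−1) vertical links. A small check, using a hypothetical helper name:

```python
def mesh_links(rows: int, cols: int) -> int:
    """Undirected links in a rows x cols 2D mesh (no wraparound)."""
    horizontal = rows * (cols - 1)  # links within each row
    vertical = cols * (rows - 1)    # links within each column
    return horizontal + vertical


if __name__ == "__main__":
    for shape in [(8, 8), (4, 16), (16, 4)]:
        print(shape, mesh_links(*shape))
```

Assuming the 4×16 and 16×4 configurations from the sensitivity study are also 2D meshes, each would have 108 links for the same 64 flash chips.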



### **Evaluated Configurations**



| Component                 | Configuration |
|---------------------------|---------------|
| Performance-Optimized SSD | 240GB, Z-NAND [31, 99, 119]<br>8-GB/s external I/O bandwidth (4-lane PCIe Gen4)<br>1.2-GB/s flash channel I/O rate<br>NAND config: 8 channels, 8 chips/channel, 1 die/chip, 2 planes/die, 128Gb die capacity, 1024 blocks/plane, 768 pages/block, 4KB page<br>Latencies: Read (tR): 3µs; Erase (tBERS): 1ms; Program (tPROG): 100µs |
| Cost-Optimized SSD        | 1TB, 3D TLC NAND flash<br>8-GB/s external I/O bandwidth (4-lane PCIe Gen4)<br>1.2-GB/s flash channel I/O rate<br>NAND config: 8 channels, 8 chips/channel, 1 die/chip, 2 planes/die, 1024 blocks/die, 16KB page<br>Latencies: Read (tR): 45µs; Erase (tBERS): 3.5ms; Program (tPROG): 650µs |
| Interconnection Network   | <b>Topology.</b> 8×8 2D mesh topology, 8-bit 1 GHz links, one router next to each flash chip<br><b>Router Architecture.</b> Two 8-bit buffers per port, 1 GHz frequency<br><b>Routing Algorithm.</b> Non-minimal fully-adaptive<br><b>Switching.</b> Circuit switching [102] |

### **Workload Characteristics**



| Source              | Trace    | Read % | Avg. Request<br>Size (KB) | Avg. Inter-request<br>Arrival Time (μs) |
|---------------------|----------|--------|---------------------------|-----------------------------------------|
| MSR Cambridge [122] | hm_0     | 36     | 8.8                       | 58                                      |
|                     | mds_0    | 12     | 9.6                       | 268                                     |
|                     | proj_3   | 95     | 9.6                       | 19                                      |
|                     | prxy_0   | 3      | 7.2                       | 242                                     |
|                     | rsrch_0  | 9      | 9.6                       | 129                                     |
|                     | src1_0   | 56     | 43.2                      | 49                                      |
|                     | src2_1   | 98     | 59.2                      | 50                                      |
|                     | usr_0    | 40     | 22.8                      | 98                                      |
|                     | wdev_0   | 20     | 9.2                       | 162                                     |
|                     | web_1    | 54     | 29.6                      | 67                                      |
| YCSB [123]          | YCSB_B   | 99     | 65.7                      | 13                                      |
|                     | YCSB_D   | 99     | 62                        | 14                                      |
| Slacker [124]       | jenkins  | 94     | 33.4                      | 615                                     |
|                     | postgres | 82     | 13.3                      | 382                                     |
| SYSTOR '17 [125]    | LUN0     | 76     | 20.4                      | 218                                     |
|                     | LUN2     | 73     | 16                        | 320                                     |
|                     | LUN3     | 7      | 7.7                       | 3127                                    |
| RocksDB [126]       | ssd-00   | 91     | 90                        | 5                                       |
|                     | ssd-10   | 99     | 11.5                      | 2                                       |



| Mix  | Constituent<br>Workloads [122, 123] | Description                                                                                      | Avg. Inter-request<br>Arrival Time (μs) |
|------|-------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------------------------|
| mix1 | src2_1 and proj_3                   | Both workloads are read-intensive                                                                | 5.8                                     |
| mix2 | src2_1, proj_3 and YCSB_D           | All three workloads are read-intensive                                                           | 8.4                                     |
| mix3 | prxy_0 and rsrch_0                  | Both workloads are write-intensive                                                               | 93                                      |
| mix4 | prxy_0, rsrch_0 and mds_0           | All three workloads are write-intensive                                                          | 56                                      |
| mix5 | prxy_0 and src2_1                   | prxy_0 is write-intensive and<br>src2_1 is read-intensive                                        | 5                                       |
| mix6 | prxy_0, src2_1 and usr_0            | prxy_0 is write-intensive,<br>src2_1 is read-intensive and<br>usr_0 has 60% writes and 40% reads | 3                                       |

### **SSD Throughput Analysis**



• Performance-Optimized SSD



• Cost-Optimized SSD




### Venice improves SSD throughput over prior approaches by effectively mitigating path conflicts



# **Tail Latency**



 Comparison of tail latencies in the 99th percentile of I/O requests






# **Power Consumption (II)**





### Venice reduces the average power consumption by 4% over Baseline SSD

### **Performance on Mixed Workloads**





Venice outperforms prior approaches on high-intensity mixed workloads by effectively mitigating path conflicts

### **Sensitivity to Network Configurations**





Venice provides a higher performance improvement with the 8×8 configuration than with the 4×16 and 16×4 configurations

### **Prior Approaches to Address Path Conflicts**

- Network-on-SSD [2]
  - Replaces a multi-channel shared bus architecture with an interconnection network of flash chips
  - Provides significantly higher path diversity than a typical SSD


Network-on-SSD's simple routing algorithm fails to mitigate path conflicts in SSDs
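To illustrate why a simple deterministic scheme can fall short, here is dimension-order (XY) routing, a common deterministic mesh routing algorithm. Whether Network-on-SSD uses exactly this scheme is an assumption made only for this sketch, and the function names are hypothetical. Every source/destination pair maps to exactly one route, so if any link on that route is busy, the request must wait, even when other idle routes exist.

```python
Node = tuple[int, int]  # (row, column) of a router


def xy_route(src: Node, dst: Node) -> list[Node]:
    """Dimension-order routing: move along the column index first, then the row index."""
    r, c = src
    path = [src]
    while c != dst[1]:
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path


def xy_blocked(src: Node, dst: Node, busy: set[frozenset]) -> bool:
    """True if any link on the single XY route is currently in use."""
    path = xy_route(src, dst)
    return any(frozenset((a, b)) in busy for a, b in zip(path, path[1:]))
```

With the link between (0, 1) and (1, 1) busy, the XY route from (0, 0) to (2, 1) is blocked, even though routes through (1, 0) avoid that link; an adaptive router could take one of those instead.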


