# Hyperscale Composable Memory Systems with Dynamically Adjusting Compressed Tier

Presenters: Angelos Arelakis, CTO, and Nilesh Shah, VP Business Development, ZeroPoint Technologies



# Challenge

Hyperscalers are spending significant \$\$ on software-based compression



4.6%

3%

**Google** Cloud



#### Hyperscaler requirement: hardware compression is a MUST-have



Hyperscale CXL Tiered Memory Expander Specification

Revision 1

Version 1.0 Base Specification Template v1.2 Effective October 27, 2023



## Opportunity: Add Compressed CXL Memory Tier

**Deploy New Tiers:** Ordinary + Compressed DRAM memory on CXL



| # | Benefit                                  | Spec Component                                                          |
|---|------------------------------------------|-------------------------------------------------------------------------|
| 1 | **Reduction** in total cost of ownership | Standardization, hardware-accelerated, lossless compressed memory tier  |
| 2 | Energy **efficiency**, **sustainability** | Transparent hardware-accelerated compression                            |
| 3 | **Preserve** software investments        | Support for legacy compression algorithms                               |

Removes barriers, enables diversity of Hyperscalers + Enterprise Customers



### Requirement: Hyperscale CXL Tiered Memory Expander Spec

#### **Preserving Software Investment, without** compromising performance

- Support for Legacy Compression Algorithms
- Future-proof: supports algorithm innovation
- Stringent latency and bandwidth specifications

| Parameter                                     | Specification |
|-----------------------------------------------|---------------|
| Latency, uncompressed access                  | 90 to 150ns   |
| Latency, compressed access                    | 250ns to <1us |
| Bandwidth efficiency (read-only / write-only) | 80% / 75%     |

### **Stringent Requirements**

https://www.opencompute.org/documents/hyperscale-cxl-tiered-memory-expander-for-ocp-basespecification-1-pdf
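To make the latency and bandwidth limits concrete, the sketch below encodes them as data and checks a set of measurements against them. It is only an illustration: the class, field, and function names are hypothetical, the measured values are made up, and treating the upper end of each latency range as the pass/fail limit is an assumption, not something the specification excerpt states.

```python
from dataclasses import dataclass


# Limits transcribed from the parameter table above. Names are illustrative
# and do not come from the OCP specification.
@dataclass(frozen=True)
class TierSpec:
    uncompressed_latency_max_ns: float = 150.0  # upper end of "90 to 150ns"
    compressed_latency_max_ns: float = 1000.0   # "<1us"
    read_bw_efficiency_min: float = 0.80        # read-only
    write_bw_efficiency_min: float = 0.75       # write-only


def check_device(spec, uncompressed_ns, compressed_ns, read_eff, write_eff):
    """Return the list of violated requirements (empty if all pass)."""
    violations = []
    if uncompressed_ns > spec.uncompressed_latency_max_ns:
        violations.append("uncompressed latency")
    if compressed_ns >= spec.compressed_latency_max_ns:
        violations.append("compressed latency")
    if read_eff < spec.read_bw_efficiency_min:
        violations.append("read bandwidth efficiency")
    if write_eff < spec.write_bw_efficiency_min:
        violations.append("write bandwidth efficiency")
    return violations


# Hypothetical measurements, for illustration only.
passing = check_device(TierSpec(), 120, 400, 0.85, 0.78)
failing = check_device(TierSpec(), 200, 1200, 0.70, 0.80)
print(passing)
print(failing)
```

Bandwidth efficiency is taken here as achieved bandwidth divided by raw memory bandwidth, which is the usual reading of the term.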



### Hyperscale CXL Tiered Memory Expander Spec



### CXL Controller with Integrated Hardware Acceleration: (De)Compression + Compaction + Transparent Memory Management

# The ZeroPoint IP Solution Results

- OCP Spec compliant Hardware Accelerated CXL memory (De)Compression + Compaction + Transparent memory management IP block
- **2-4x** transparent (de)compression on major datacenter workloads
- Protocol: CXL 2.0, 3.0, 3.1
- Compression Algorithms: LZ4, ZID (proprietary)
- Portable: AXI4, CHI, Leading process node support

**OCP Spec Compliant, portable IP** 



#### Spec compliant Plug and Play (De)Compression IP

| IP Solution Performance Characteristics         | Value  |
|-------------------------------------------------|--------|
| Compression Ratio                               | 2-3X   |
| Block cache (SRAM) hit latency                  | <30ns  |
| Cache line in uncompressed region latency       | <90ns  |
| Cache line in an uncompressed block latency     | <150ns |
| Cache line in a compressed block latency        | <250ns |
| Tail latency [cache line in a compressed block] | <1us   |
| Decompress bandwidth [4x 1867MT/s]              | >46G/s |
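One way to read the latency rows together is as a weighted average over where accesses land. The sketch below computes an upper bound on mean access latency from the four non-tail bounds in the table. The access mix is hypothetical and chosen only to show the arithmetic; the `<1us` tail case is ignored, and the dictionary keys and function name are illustrative.

```python
# Per-case upper bounds from the table above, in nanoseconds.
LATENCY_BOUND_NS = {
    "block_cache_hit": 30,
    "uncompressed_region": 90,
    "uncompressed_block": 150,
    "compressed_block": 250,
}


def mean_latency_bound_ns(access_mix):
    """Weighted average of the per-case bounds for a given access mix.

    access_mix maps each case name to the fraction of accesses that hit it.
    """
    total = sum(access_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(LATENCY_BOUND_NS[case] * frac for case, frac in access_mix.items())


# Hypothetical access mix, not a measured workload.
mix = {
    "block_cache_hit": 0.5,
    "uncompressed_region": 0.2,
    "uncompressed_block": 0.1,
    "compressed_block": 0.2,
}
print(f"mean latency bound: {mean_latency_bound_ns(mix):.1f} ns")
```

Because every row is a "less than" bound, the result is itself a bound on the mean for that mix, not a prediction of measured latency.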

### Reduce Data Center TCO 20-25%

Total cost of ownership (TCO) for a 40-server rack over a 3-year lifetime [kUSD]





## Demo

[Screenshot: SC'23 demo dashboard, "DenseMem™: Hardware-Compressed CXL Memory Tier". Panels: Memory Usage, Current Benchmark, Live Compression Ratio, Compression Ratio (series: FPGA (Live)), Compressed Memory Manager Requests, DenseMem Page Cache Hit Ratio, QEMU Line Cache Throughput, DenseMem Page Cache Throughput, DenseMem Compressor / Decompressor Throughput (marked "Emulation Platform Limit").]



# Performance: Datacenter workloads





# APIs

- [.io] Telemetry: Remaining actual free space (not DPA)
- [.io] Config: 4 capacity watermarks {High, ..., Critically Low}
- [.io] MSI/MSI-X interrupts: the IP raises interrupts to the host as the watermarks are crossed. Internally it reacts with:
  - a write back-pressure delay mechanism (effectively delaying write credits)
  - tuning the compressor to favor compression ratio (CR) over bandwidth
  - defragmentation
- [.io] Command: free capacity (i.e., overwrite DPA pages with 0s)

### Host SW leverages IP Solution APIs to manage migrations
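The interaction between free-space telemetry, watermarks, interrupts, and the free-capacity command can be sketched with a toy model. Everything here is an assumption made for illustration: the class and method names, the byte thresholds, the two middle watermark names (the slide only names the first and last), and the choice to signal only when free space falls below a watermark. It does not represent the actual IP register interface.

```python
from dataclasses import dataclass, field

# Only "High" and "Critically Low" appear on the slide; the two middle names
# are placeholders.
WATERMARK_NAMES = ["HIGH", "LEVEL_2", "LEVEL_3", "CRITICALLY_LOW"]


@dataclass
class CompressedTierModel:
    """Toy device model: actual free space plus four capacity watermarks."""
    free_bytes: int               # telemetry: remaining actual free space
    watermarks: list              # descending free-space thresholds
    on_interrupt: object = None   # host callback(name, free_bytes)
    _level: int = field(default=0, init=False)  # watermarks currently crossed

    def _update(self):
        crossed = sum(1 for w in self.watermarks if self.free_bytes < w)
        # One "interrupt" per newly crossed watermark as free space falls.
        for i in range(self._level, crossed):
            if self.on_interrupt:
                self.on_interrupt(WATERMARK_NAMES[i], self.free_bytes)
        self._level = crossed

    def write(self, stored_bytes):
        """Consume free space by the post-compression size of a write."""
        self.free_bytes -= stored_bytes
        self._update()

    def free_capacity(self, stored_bytes):
        """Model the free-capacity command (pages overwritten with 0s)."""
        self.free_bytes += stored_bytes
        self._update()


# Host side: record interrupts, then react outside the callback.
events = []
tier = CompressedTierModel(
    free_bytes=1000,
    watermarks=[800, 600, 400, 200],
    on_interrupt=lambda name, free: events.append(name),
)
tier.write(450)              # free space drops to 550: two watermarks crossed
if events:
    # Hypothetical policy: migrate cold pages elsewhere, then free them.
    tier.free_capacity(300)  # free space back to 850
print(events, tier.free_bytes)
```

The host reacts after the callback returns rather than inside it, which keeps the model free of re-entrancy while the watermark level is being updated.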



# Call To Action

### Summary

- OCP Spec Compliant IP Solution
- Portable across process nodes, AXI/CHI interface
- Performance verified

### Call To Action

- Controller manufacturers: Collaboratively address Hyperscale OCP requirements
- ISVs: Host software integration, joint workload demo