Computer Memory Basics
Memory systems are a major factor in determining the performance of a computer:
- Programs exhibit temporal locality (tendency to reuse data accessed recently)…
- …and spatial locality (tendency to reference data close to recently used data)
- Memory hierarchies take advantage of temporal/spatial locality while moving data…
- …between fast/small upper level memory and big/slow lower level memory
Addresses
- Memory cell (electronic circuit)…stores one bit of binary data
- Memory address…reference to a specific memory location
- Usually several memory cells share a single address (e.g. 8bits/1byte)
- The address width limits the maximum addressable memory
- The address width is typically a multiple of eight (8,16,32,64 bits)
Address width | Address locations |
---|---|
8bit | 256 (2^8) |
16bit | 65536 (2^16) |
32bit | 4294967296 (2^32) |
64bit | 1.844674407×10^19 (2^64) |
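The values in the table follow directly from 2^n for an n-bit address. A minimal sketch that reproduces them, assuming one byte per address (the function names are made up for illustration):

```python
def addressable_locations(address_width_bits: int) -> int:
    """Number of distinct addresses an address of the given width can express."""
    return 2 ** address_width_bits


def addressable_bytes_human(address_width_bits: int) -> str:
    """Addressable capacity in binary units, assuming one byte per address."""
    size = float(addressable_locations(address_width_bits))
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"):
        if size < 1024 or unit == "EiB":
            return f"{size:g} {unit}"
        size /= 1024


for width in (8, 16, 32, 64):
    print(f"{width:>2} bit: {addressable_locations(width)} locations, "
          f"{addressable_bytes_human(width)}")
```

With byte addressing, a 32 bit address therefore covers 4 GiB and a 64 bit address 16 EiB.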
- Memory is a collection of various memory locations
- Each location has a unique address and can be accessed in any order (in an equal amount of time)
- Memory access means selection and data read/write from a specific memory location
- Memory controller…manages data flow (read/write) between main memory and processor
- Memory address bus…connects the main memory to the memory controller
Programmers see virtual memory provided by the system (OS + hardware):
- Simplified abstraction of memory for the program providing the illusion of “infinite” memory
- The system manages the physical memory space transparently for the programmer by mapping virtual memory addresses to the limited physical memory
- An example of the programmer/(micro)architecture trade-off
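The mapping from virtual to physical addresses can be illustrated with a toy page table. This is only a sketch under stated assumptions: a 4 KiB page size, a single-level table held in a dictionary, and made-up frame numbers. Real systems use multi-level tables walked by the MMU.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB); real systems may differ

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 7, 1: 3, 2: 12}


def translate(virtual_address: int) -> int:
    """Map a virtual address to a physical address via the toy page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # A real system would raise a page fault and let the OS handle it
        raise LookupError(f"page fault: virtual page {page_number} is not mapped")
    return page_table[page_number] * PAGE_SIZE + offset


print(hex(translate(0x1ABC)))  # virtual page 1, offset 0xABC -> 0x3abc
```

The offset within the page is unchanged by the translation; only the page number is replaced by a frame number.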
CPU Cache
…used to avoid repeated access to main memory (typically DRAM):
- Automatically managed memory hierarchy (Level 1,2,3) (typically SRAM)
- Stores frequently used data and is commonly on-die with an associated CPU
Blocks
Memory logically divided into fixed-size blocks…
- …block (or line)
- …minimum unit of information
- …either present or not present in a cache
- …block maps to a location in cache
- …determined by the index bits in the address
- Cache…
- …hit …use cached data instead of accessing next level memory
- …miss …data not cached, read block from next level memory
- Hit rate …fraction of memory accesses that result in cache hits
- Miss rate …(1 - hit rate) …fraction of memory accesses not found in the cache
- Hit time…
- …time to access a level of the memory hierarchy
- …includes time to determine hit/miss
- Miss penalty…
- …time to replace a block in the upper level…
- …with a block from the lower level
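How the index bits select a cache location can be sketched by splitting an address into tag, index, and block offset. The cache geometry below (64-byte blocks, 256 sets) is an assumption chosen for illustration, as is the function name:

```python
def split_address(address: int, block_size: int, num_sets: int):
    """Split an address into (tag, index, offset) for a cache.

    block_size and num_sets are assumed to be powers of two.
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = address & (block_size - 1)                 # byte within the block
    index = (address >> offset_bits) & (num_sets - 1)   # selects the cache location
    tag = address >> (offset_bits + index_bits)         # identifies the block stored there
    return tag, index, offset


# Hypothetical cache geometry: 64-byte blocks, 256 sets
tag, index, offset = split_address(0x12345678, block_size=64, num_sets=256)
print(f"tag={tag:#x} index={index} offset={offset}")
```

Two addresses with the same index but different tags compete for the same cache location.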
AMAT (Average Memory Access Time) …metric to analyze memory system performance
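AMAT is commonly computed as hit time + miss rate × miss penalty. A minimal sketch of that formula; the numbers in the example are illustrative assumptions, not measurements:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Assumed values: 1 ns hit time, 5% miss rate, 60 ns miss penalty
print(f"{amat(hit_time=1.0, miss_rate=0.05, miss_penalty=60.0):.1f} ns")
```

For a multi-level hierarchy the miss penalty of one level can itself be expressed as the AMAT of the next level.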
Locality
- …ensures data required by the processor is kept in the fast(er) level(s):
- …recently accessed and adjacent data
- …automatically in fast memory close to processor
- …temporal locality…
- …based on repetitive computations
- …e.g. loops …referencing the same memory
- …spatial locality…
- …based on a probability of related computations
- …referencing a cluster of memory (e.g. array)
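The benefit of spatial locality can be made visible by counting how many distinct cache blocks an access pattern touches. The sketch below assumes 64-byte blocks and 8-byte array elements; both values and the function name are chosen for illustration only:

```python
def blocks_touched(addresses, block_size=64):
    """Count the distinct cache blocks covered by a sequence of byte addresses."""
    return len({address // block_size for address in addresses})


ELEMENT_SIZE = 8  # assumed element size in bytes
N = 1024          # number of accesses in each pattern

sequential = [i * ELEMENT_SIZE for i in range(N)]    # neighbouring elements
strided = [i * ELEMENT_SIZE * 16 for i in range(N)]  # every 16th element

print(blocks_touched(sequential))  # 128: eight elements share each block
print(blocks_touched(strided))     # 1024: every access needs a new block
```

Both patterns perform the same number of accesses, but the sequential one needs only an eighth of the block transfers.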
Associativity
Caches fall into three categories:
- Direct-mapped
- Each memory location maps into one and only one cache block
- Fast and simple, but uses the cache capacity inefficiently
- Most prone to conflict misses
- Fully associative
- Any memory location can map to anywhere in the cache
- Slow and complex, but uses the cache capacity efficiently
- No conflict misses (compulsory and capacity misses remain)
- N-way set associative
- Groups of blocks (“sets”) form associative pools
- A compromise between simplicity and efficiency
- Reduces cache misses
Types of cache misses:
- Compulsory (start miss): First access to a block, must be brought into the cache
- Capacity: The cache cannot hold all blocks in use, so blocks are discarded to free space and must be fetched again later
- Conflict (collision/interference miss): Occurs when several memory locations are mapped to the same cache block
Replacement policy: Heuristic used to select the entry to be replaced by uncached data, e.g. LRU (Least Recently Used)
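The effect of associativity on conflict misses can be sketched with a small simulator using LRU replacement. This is a simplified model, not how hardware is built: it tracks block addresses only, and the cache size and access trace are made-up examples.

```python
from collections import OrderedDict


def count_misses(block_addresses, num_blocks, ways):
    """Count misses for a cache with num_blocks blocks and LRU replacement.

    ways=1 models a direct-mapped cache, ways=num_blocks a fully
    associative one, anything in between an N-way set-associative cache.
    """
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return misses


# Blocks 0 and 8 map to the same location of an 8-block direct-mapped cache
trace = [0, 8] * 4
for ways in (1, 2, 8):
    print(f"{ways}-way: {count_misses(trace, num_blocks=8, ways=ways)} misses")
```

In the direct-mapped case the two blocks evict each other on every access; with two or more ways only the two compulsory misses remain.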
lshw
Display the cache/memory hierarchy with lshw:
>>> sudo lshw -C memory -short
H/W path Device Class Description
======================================================
/0/0 memory 64KiB BIOS
/0/400/700 memory 256KiB L1 cache
/0/400/701 memory 1MiB L2 cache
/0/400/704 memory 8MiB L3 cache
/0/1000 memory 12GiB System Memory
/0/1000/0 memory 4GiB DIMM DDR3 1066 MHz (0.9 ns)
/0/1000/1 memory 4GiB DIMM DDR3 1066 MHz (0.9 ns)
/0/1000/2 memory 4GiB DIMM DDR3 1066 MHz (0.9 ns)
/0/1000/3 memory DIMM DDR3 Synchronous [empty]
/0/1000/4 memory DIMM DDR3 Synchronous [empty]
/0/1000/5 memory DIMM DDR3 Synchronous [empty]
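On Linux, per-CPU cache details are usually also exposed through sysfs. The following is a minimal sketch that reads them with Python, assuming the conventional path /sys/devices/system/cpu/cpu0/cache. The function name read_cpu_caches is made up for illustration, and the directory may be missing inside containers or on other operating systems, in which case nothing is printed.

```python
from pathlib import Path


def read_cpu_caches(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return a list of dicts describing the caches listed under cache_dir."""
    caches = []
    for index in sorted(Path(cache_dir).glob("index*")):
        entry = {}
        for name in ("level", "type", "size", "coherency_line_size",
                     "ways_of_associativity"):
            attribute = index / name
            try:
                entry[name] = attribute.read_text().strip()
            except OSError:
                continue  # attribute not provided on this system
        caches.append(entry)
    return caches


for cache in read_cpu_caches():
    print(cache)
```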
lstopo
Show the topology of the system…
- …examine the NUMA topology
- …provide details about processor, caches and memory
- Documentation …https://www.open-mpi.org/projects/hwloc/doc/
Textual rendering…
lstopo-no-graphics -.ascii
Types
Volatile Memory
- SRAM (Static Random Access Memory) - Two cross-coupled inverters store a bit for as long as power is applied
- Faster access (no capacitor), no refresh needed, access time close to cycle time
- Lower density (6-8 transistors per bit), higher cost
- Minimal power to retain state in standby mode
- Manufacturing compatible with logic process, typically integrated with the processor chip
- DRAM (Dynamic Random Access Memory) - Capacitor charge state indicates stored value, cells lose charge over time requiring a refresh
- Slower access (capacitor)
- Higher density (1 transistor + 1 capacitor per bit), lower cost
- Requires periodic refresh (read + write), (costs power, performance, circuitry)
- Manufacturing requires capacitors and logic
- SDRAM (Synchronous DRAM) - Uses a clock to eliminate the time memory and processor need to synchronize
- Bandwidth improved by internal organization into multiple banks each with its own row buffer
- Banks allow simultaneous reads/writes, called address interleaving
- Fastest version called DDR (Double Data Rate SDRAM), data transfer at rising & falling edges of the clock
Persistent Memory
Persistent Memory (PM, pmem), aka SCM (Storage Class Memory):
- Bridge the access-time gap between DRAM and NAND based flash-storage
- Introducing a third tier in the memory hierarchy
- Connected to the system memory bus (like DRAM DIMMs) via NVDIMMs
- Accessed like volatile memory (processor load/store instructions)
- Change in computing architecture…
Access time | Description |
---|---|
1ns | processor operation |
<5ns | read L2 cache |
60ns | access volatile memory (DRAM) |
<<1us | access persistent memory (NVM) |
20us | read from flash memory (NAND) |
1ms | random write to flash memory |
<10ms | read/write disk |
40s | read tape |
- Requires an [NVM Programming Model][pmem]
- New block and file semantics for applications
- Exposed as memory-mapped file by the operating system
- Persistent memory aware file-system allows DAX (Direct Access), bypassing the system page cache (unlike normal storage-based files)
- Application has direct load/store access to persistence via the MMU
- No interrupts or kernel context switches
- OS (only) flushes CPU caches to get data into the persistence domain
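The load/store access pattern through a memory-mapped file can be sketched with Python's mmap module. This is a stand-in only: the sketch maps an ordinary temporary file, so the data still passes through the page cache. On a DAX-mounted file-system the same calls would operate on the persistent memory directly, and production code would typically use a dedicated persistent memory library instead.

```python
import mmap
import os
import tempfile

# Stand-in for a file on a DAX-mounted file-system (hypothetical location);
# a temporary file is used here so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.truncate(4096)
    path = handle.name

with open(path, "r+b") as backing_file:
    with mmap.mmap(backing_file.fileno(), 4096) as mapping:
        mapping[0:5] = b"hello"  # plain store into the mapped region
        mapping.flush()          # ask the OS to make the stores durable

with open(path, "rb") as backing_file:
    stored = backing_file.read(5)
print(stored)
os.remove(path)
```

After the mapping is set up, reads and writes are ordinary memory accesses; only the flush involves the operating system.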
NVM (Non-Volatile Memory), NVRAM (Non-Volatile RAM):
- RRAM/ReRAM (Resistive Random-Access Memory)
- Uses a dielectric solid-state material aka memristor
- In development by multiple companies…
- Scalable below 30nm, cycle time <10ns
- Others…
- CBRAM (Conductive-Bridging RAM)
- PRAM (Phase-Change Memory)
- MRAM (Magnetoresistive RAM)
- FeRAM (Ferroelectric RAM)
- STTRAM (Spin Torque Transfer RAM)
- SHERAM (Spin Hall Effect RAM)
- CNTRAM (Carbon-nanotube RAM)
- Products:
- 3D XPoint (Intel, Micron), called Intel Optane
DIMM Modules
Memory sold in small boards called DIMM (Dual Inline Memory Module)…
- …typically contains 4-16 DRAM chips
- …normally organized to be 8 bytes wide
- …variants of DIMM slots (e.g. DDR3 or DDR4) have different pin counts
- …ECC (Error-Correcting code) DIMMs have extra circuitry to detect/correct errors
The following table lists common RAM chips and their throughput:
Standard | Chip | GB/s |
---|---|---|
SDRAM (1993) | SDR-66 | 0.53 |
SDRAM (1993) | SDR-133 | 1.07 |
DDR (1996) | DDR-200 | 1.6 |
DDR (1996) | DDR-266 | 2.13 |
DDR2 (2003) | DDR2-400 | 3.2 |
DDR2 (2003) | DDR2-800 | 6.4 |
DDR3 (2007) | DDR3-1600 | 12.8 |
DDR3 (2007) | DDR3-1866 | 14.93 |
DDR4 (2012) | DDR4-2133 | 17.07 |
DDR4 (2012) | DDR4-3200 | 25.6 |
DDR5 (2020) | DDR5-4800 | 38.4 |
DDR5 (2020) | DDR5-5200 | 41.6 |
DDR5 (2020) | DDR5-6400 | 51.2 |
DDR5 (2020) | DDR5-6800 | 54.4 |
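The throughput column is the theoretical peak of an 8 byte wide module: transfer rate in mega-transfers per second times 8 bytes. A minimal sketch of that arithmetic (the function name is made up; sustained bandwidth in practice is lower):

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s of a module with the given bus width."""
    return mega_transfers_per_s * bus_width_bytes / 1000


for name, rate in (("DDR-200", 200), ("DDR2-800", 800), ("DDR3-1600", 1600)):
    print(f"{name}: {peak_bandwidth_gb_s(rate):.1f} GB/s")
```

The same relation explains module names such as PC3-12800: 1600 MT/s × 8 bytes = 12800 MB/s.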
NVDIMM
NVDIMM types:
- NVDIMM-F - Flash only paired with DRAM DIMM
- NVDIMM-N - Flash and DRAM together in the same DIMM
- NVDIMM-P - True persistent memory (no DRAM/flash)
Supported modes (use ndctl for management):
- Raw /dev/pmemN (block device)
- Default mode after installation
- Supports file-systems with or without DAX (ext4, xfs)
- Sector /dev/pmemNs (block device with sector atomicity)
- Implemented with BTT (Block Translation Table)
- Guarantees power-fail write atomicity
- Only supports file-systems without DAX
- Memory /dev/pmemN (block device supporting device DMA)
- Supports file-system DAX, recommended over raw mode
- Requires storing extra “struct page” entries on regular system memory (or persistent memory)
- DAX /dev/daxN.M (character device supporting DAX)
- Allows memory allocation/mapping (without the need of a file-system)
- No interactions with the kernel page cache
- Character device (does not support a file-system)
- Requires storing extra “struct page” entries on persistent memory
dmidecode
Display the memory vendor, identification numbers, and type
>>> dmidecode --type memory | egrep "Manufacturer|Serial|Part|Type"
Error Correction Type: Multi-bit ECC
Type: DDR3
Type Detail: Synchronous Registered (Buffered)
Manufacturer: Samsung
Serial Number: 35244B2E
Part Number: M393B2G70BH0-YK0
...
Maximum RAM capacity can be checked with dmidecode. The “Maximum Capacity” is the maximum RAM supported by your system, while “Number of Devices” is the number of memory (DIMM) slots available on your computer.
>>> dmidecode -t 16 | egrep "Capacity|Devices"
Maximum Capacity: 384 GB
Number Of Devices: 32
Check the memory support matrix for the system board to understand the correct DIMM distribution and their corresponding memory frequencies.
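From the two values reported above, the average capacity per slot at full population can be derived. A small sketch that parses the sample output shown earlier; the function name is made up, the parser assumes the capacity is reported in GB, and the DIMM sizes actually supported still depend on the support matrix of the board:

```python
import re

# Sample output copied from the dmidecode call above
sample = """\
Maximum Capacity: 384 GB
Number Of Devices: 32
"""


def max_capacity_per_slot(dmidecode_output: str) -> float:
    """Average capacity per DIMM slot in GB when the system is fully populated."""
    capacity = int(re.search(r"Maximum Capacity:\s*(\d+)\s*GB", dmidecode_output).group(1))
    slots = int(re.search(r"Number Of Devices:\s*(\d+)", dmidecode_output).group(1))
    return capacity / slots


print(max_capacity_per_slot(sample))  # 384 GB / 32 slots = 12.0
```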
Frequency & Voltage
Check the memory speed with lshw (package lshw):
>>> lshw -short -C memory | grep DIMM
/0/1b/0 memory 16GiB DIMM DDR3 Synchronous 800 MHz (1.2 ns)
...
Details about voltage and maximum memory frequency with decode-dimms from the Debian package i2c-tools:
>>> modprobe eeprom
>>> decode-dimms
[…]
Fundamental Memory type DDR3 SDRAM
Module Type RDIMM
[…]
Maximum module speed 1600MHz (PC3-12800)
Size 16384 MB
[…]
Operable voltages 1.5V, 1.35V
HBM
HBM (High-Bandwidth Memory) …standardized stacked memory technology
Standard | Date | Bandwidth¹ | Stack² | Size³ |
---|---|---|---|---|
HBM | 2013 | 128 GB/s | | |
HBM2 | 2016 | 307 GB/s | 4 | 8 GB |
HBM2e | 2020 | 460 GB/s | 8 | 16 GB |
HBM3 | 2022 | 819 GB/s | 12 | 24 GB |
¹ max bandwidth per package
² max number of memory dies in stack
³ max capacity per package
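The bandwidth per package follows from the interface width and the data rate per pin. The sketch below assumes a 1024 bit wide interface per stack and per-pin rates of 2.4, 3.6 and 6.4 Gbit/s for HBM2, HBM2e and HBM3; these figures are not taken from the table above and should be checked against the respective specification.

```python
def hbm_bandwidth_gb_s(pin_rate_gbit_s: float, interface_width_bits: int = 1024) -> float:
    """Peak bandwidth per stack in GB/s for a given per-pin data rate."""
    return pin_rate_gbit_s * interface_width_bits / 8


# Assumed per-pin data rates in Gbit/s
for name, rate in (("HBM2", 2.4), ("HBM2e", 3.6), ("HBM3", 6.4)):
    print(f"{name}: {hbm_bandwidth_gb_s(rate):.1f} GB/s")
```

Under these assumptions the results (307.2, 460.8 and 819.2 GB/s) match the table up to rounding, which shows that the gains come from a very wide interface rather than from a high per-pin rate.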
Increase memory interface performance…
- …improves bandwidth…access times…transfer rates
- …more power-efficient in terms of bits per watt
- …GDDR5…10.66GB/s per watt
- …HBM2e…35GB/s per watt
- …no fundamental change in the underlying memory technology
Utilizes 3D manufacturing technology…
- …2.5D packaging solution
- …stacks of DRAM chips on top of a bus interface
- …placed side-by-side on top of a silicon interposer
- …interposer acts as the bridge between the chips and a board
- …requires the fabrication of what is basically a PCB in silicon
- …brings logic closer to the memory, enabling more bandwidth
- …comes with thermal management challenges
Capacity…
- …limited compared to DRAM accessed through DDR
- …memory is defined in cubes…
- …defined height…4,8,12 or 16 (with HBM3)
- …defined number of data channels (64/128 bits)
- …limited number of HBM dies can fit around the SoC
- …HBM capacities cannot rival the capacity of DDR