Computer Memory Basics

Hardware
Published: August 26, 2016
Modified: May 13, 2019

Memory systems are a major factor in determining the performance of a computer:

Addresses

  • Memory cell (electronic circuit)…stores one bit of binary data
  • Memory address…reference to a specific memory location
    • Usually several memory cells share a single address (e.g. 8 bits = 1 byte)
    • The address width limits the maximum addressable memory (e.g. 32-bit byte addresses reach at most 4 GiB)
    • The address width is typically a multiple of eight (8, 16, 32, 64 bits)
Address width   Address locations
8 bit           256 (2^8)
16 bit          65,536 (2^16)
32 bit          4,294,967,296 (2^32)
64 bit          1.844674407×10^19 (2^64)
  • Memory is a collection of various memory locations
    • Each location has a unique address and can be accessed in any order (in an equal amount of time, hence "random access")
    • Memory access means selection and data read/write from a specific memory location
  • Memory controller…manages data flow (read/write) between main memory and processor
  • Memory address bus…connects the main memory to the memory controller
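The table values are simply 2^width. A minimal Python sketch, assuming byte-addressable memory (one byte per address, as in the 8 bits = 1 byte example) so that the number of locations equals the number of bytes:

```python
def addressable_locations(width_bits: int) -> int:
    """Number of distinct addresses that an address of width_bits can encode."""
    return 2 ** width_bits


def format_capacity(num_bytes: int) -> str:
    """Render a byte count with binary prefixes (1 KiB = 1024 bytes)."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB"):
        if value < 1024:
            return f"{value:g} {unit}"
        value /= 1024
    return f"{value:g} EiB"


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        n = addressable_locations(width)
        # Capacity assumes byte-addressable memory: one byte per address
        print(f"{width:2d} bit: {n} locations -> {format_capacity(n)}")
```

A 32-bit address therefore covers 4 GiB and a 64-bit address 16 EiB of byte-addressable memory.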

Programmers see virtual memory provided by the system (OS + hardware):

  • Simplified abstraction of memory for the program providing the illusion of “infinite” memory
  • The system manages the physical memory space transparently to the programmer by mapping virtual memory addresses to the limited physical memory
  • An example of the programmer/(micro)architecture trade-off
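The address mapping can be sketched as a single-level page-table lookup. This is a simplified model, not how any particular OS or MMU implements it: the 4 KiB page size and the `page_table` contents are assumptions chosen for illustration, and real systems add multi-level tables, TLBs and page-fault handling.

```python
PAGE_SIZE = 4096                              # assumed page size: 4 KiB
OFFSET_BITS = PAGE_SIZE.bit_length() - 1      # 12 bits of in-page offset

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0x00000: 0x1A2, 0x00001: 0x0F7, 0x00400: 0x033}


class PageFault(Exception):
    """Raised when a virtual page has no physical frame mapped."""


def translate(vaddr: int) -> int:
    """Translate a virtual address into a physical address."""
    vpn = vaddr >> OFFSET_BITS                # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)          # offset is copied unchanged
    try:
        pfn = page_table[vpn]
    except KeyError:
        # A real OS would now load the page or reject the access
        raise PageFault(hex(vaddr)) from None
    return (pfn << OFFSET_BITS) | offset


if __name__ == "__main__":
    print(hex(translate(0x1234)))
```

For `0x1234` the offset `0x234` is kept and virtual page `0x1` is replaced by frame `0xF7`, giving physical address `0xF7234`.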

CPU Cache

…used to avoid repeated access to main memory (typically DRAM):

  • Automatically managed memory hierarchy (Level 1,2,3) (typically SRAM)
  • Stores frequently used data and is commonly on-die with an associated CPU

Blocks

Memory is logically divided into fixed-size blocks…

  • …block (or line)
    • …minimum unit of information
    • …either present or not present in a cache
    • …block maps to a location in cache
    • …determined by the index bits in the address
  • Cache…
    • hit …use cached data instead of accessing the next-level memory
    • miss …data not cached, read block from next level memory
  • Hit ratio …percentage of accesses that result in cache hits
  • Miss rate …(1 − hit ratio) …fraction of memory accesses not found in the cache
  • Hit time…
    • …time to access a level of the memory hierarchy
    • …includes time to determine hit/miss
  • Miss penalty…
    • …time to replace a block in the upper level…
    • …with a block from the lower level
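Which cache location a block maps to is taken from the address bits. A Python sketch, assuming a hypothetical cache with 64-byte blocks and 64 sets (both powers of two): the low bits give the byte offset within the block, the next bits the index, and the remaining high bits the tag that is stored to tell apart different blocks sharing an index.

```python
def split_address(addr: int, block_size: int = 64, num_sets: int = 64):
    """Return (tag, index, offset) of a byte address.

    block_size and num_sets must be powers of two (hypothetical defaults).
    """
    offset_bits = block_size.bit_length() - 1   # 64 bytes -> 6 bits
    index_bits = num_sets.bit_length() - 1      # 64 sets  -> 6 bits
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    for a in (0x12345, 0x1000, 0x2000):
        print(hex(a), split_address(a))
```

Addresses `0x1000` and `0x2000` get the same index (0) but different tags (1 and 2), so in a direct-mapped cache they would compete for the same location.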

AMAT (Average Memory Access Time) …metric to analyze memory system performance: AMAT = hit time + miss rate × miss penalty
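The formula extends to several cache levels because the miss penalty of one level is the AMAT of the level below it. A small Python sketch; the hit times and miss rates in the example are made-up illustration values, only the 60 ns DRAM latency matches the access-time table further below.

```python
def amat(levels, memory_time):
    """Average memory access time of a cache hierarchy.

    levels: list of (hit_time, miss_rate) tuples, ordered from L1 outward.
    memory_time: access time of main memory (the last level).
    Applies AMAT = hit_time + miss_rate * miss_penalty from the innermost
    level outward; each level's miss penalty is the AMAT of the next level.
    """
    time = memory_time
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


if __name__ == "__main__":
    # Illustrative values in ns: L1 1 ns / 5 % misses, L2 4 ns / 20 % misses
    print(amat([(1.0, 0.05)], 60.0))
    print(amat([(1.0, 0.05), (4.0, 0.20)], 60.0))
```

With these numbers a single cache level gives 1 + 0.05 × 60 = 4 ns, while adding an L2 lowers the L1 miss penalty to 4 + 0.2 × 60 = 16 ns and the AMAT to 1 + 0.05 × 16 = 1.8 ns.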

Locality

  • …ensures data required by the processor is kept in the fast(er) level(s):
    • …recently accessed and adjacent data
    • …automatically kept in fast memory close to the processor
  • temporal locality…
    • …based on repetitive computations
    • …e.g. loops …referencing the same memory again
  • spatial locality…
    • …based on the likelihood that addresses near a recent access are used soon
    • …referencing a cluster of memory (e.g. iterating over an array)
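The effect of spatial locality can be shown with a tiny simulation. The sketch below is a toy built on assumptions: a 64×64 matrix of 4-byte integers stored row by row, and a deliberately small fully associative cache of eight 64-byte lines with LRU replacement. It counts misses for a row-wise and a column-wise walk over the same data.

```python
from collections import OrderedDict

LINE_SIZE = 64      # bytes per cache line (assumed)
NUM_LINES = 8       # deliberately tiny, fully associative LRU cache (assumed)
N = 64              # N x N matrix of 4-byte integers, stored row-major
ELEM = 4


def count_misses(addresses, line_size=LINE_SIZE, num_lines=NUM_LINES):
    """Count misses of a fully associative LRU cache for an address trace."""
    cache = OrderedDict()                  # line number -> None, in LRU order
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: now most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses


def row_major():
    """Walk the matrix row by row: consecutive addresses."""
    return ((i * N + j) * ELEM for i in range(N) for j in range(N))


def column_major():
    """Walk the matrix column by column: a stride of N * ELEM bytes."""
    return ((i * N + j) * ELEM for j in range(N) for i in range(N))


if __name__ == "__main__":
    print("row-major misses:   ", count_misses(row_major()))
    print("column-major misses:", count_misses(column_major()))
```

The row-wise walk misses once per 64-byte line (256 misses for 4096 accesses) because the next 15 elements share that line; the column-wise walk jumps 256 bytes per step and misses on every access, since each line is evicted before the walk comes back to it.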

Associativity

Caches fall into three categories:

  • Direct-mapped
    • Each memory location maps to one and only one cache block
    • Fast, simple, inefficient
    • Most conflict misses (blocks sharing a location evict each other)
  • Fully associative
    • Any memory location can map to anywhere in the cache
    • Slow, complex, efficient
    • No conflict misses (compulsory and capacity misses still occur)
  • N-way set associative
    • Blocks are grouped into "sets" of N lines; a memory location maps to one set but may use any line within it
    • A compromise between simplicity and efficiency
    • Fewer conflict misses than direct-mapped
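A toy simulator makes the difference between the three categories concrete. This Python sketch (hit/miss tracking only, LRU replacement inside each set, geometry chosen for illustration) treats direct-mapped and fully associative caches as the 1-way and N-way extremes of a set-associative cache:

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache (tracks hits/misses, stores no data).

    ways=1 behaves as a direct-mapped cache, ways=num_lines as a fully
    associative one. Within a set, the least recently used block is replaced.
    """

    def __init__(self, num_lines=8, ways=1, block_size=64):
        if num_lines % ways:
            raise ValueError("num_lines must be a multiple of ways")
        self.ways = ways
        self.block_size = block_size
        self.num_sets = num_lines // ways
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Simulate one access; return True on a hit."""
        block = addr // self.block_size
        s = self.sets[block % self.num_sets]   # the block may only live here
        if block in s:
            s.move_to_end(block)               # refresh LRU order
            self.hits += 1
            return True
        self.misses += 1
        s[block] = None
        if len(s) > self.ways:
            s.popitem(last=False)              # evict the set's LRU block
        return False


if __name__ == "__main__":
    # 0x000 and 0x200 are 512 bytes apart: with 8 lines of 64 bytes they map
    # to the same direct-mapped line, but can coexist in a 2-way set.
    trace = [0x000, 0x200] * 10
    for ways in (1, 2, 8):
        cache = SetAssociativeCache(num_lines=8, ways=ways)
        for addr in trace:
            cache.access(addr)
        print(f"{ways}-way: {cache.misses} misses, {cache.hits} hits")
```

Alternating between the two addresses misses on every access in the direct-mapped configuration, while the 2-way and 8-way configurations only take the two initial misses.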

Types of cache misses:

  • Compulsory (cold-start miss): First access to a block, which must be brought into the cache
  • Capacity: The cache cannot hold all blocks in use, so blocks are discarded to free space and must be fetched again later
  • Conflict (collision/interference miss): Occurs when several memory locations are mapped to the same cache block (or set) and evict each other

Replacement policy: Heuristic used to select the entry to be replaced by uncached data, e.g. LRU (Least Recently Used)
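The three miss types can be told apart in simulation with the usual "three C's" accounting: a miss on a block never seen before is compulsory; a miss that a fully associative cache of the same size (here with LRU replacement) would also take is a capacity miss; any remaining miss is a conflict miss. A Python sketch for a small direct-mapped cache, with a hypothetical geometry of 4 lines of 64 bytes:

```python
from collections import OrderedDict


def lru_access(lru: OrderedDict, key, capacity: int) -> bool:
    """Look up key in an LRU-ordered dict, inserting it on a miss; True on hit."""
    if key in lru:
        lru.move_to_end(key)            # becomes the most recently used entry
        return True
    lru[key] = None
    if len(lru) > capacity:
        lru.popitem(last=False)         # replace the least recently used entry
    return False


def classify_misses(addresses, num_lines=4, block_size=64):
    """Split the misses of a direct-mapped cache into the three miss types."""
    seen = set()                        # blocks referenced at least once
    fully_assoc = OrderedDict()         # reference cache: same size, fully associative, LRU
    direct = [None] * num_lines         # block held by each direct-mapped line
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in addresses:
        block = addr // block_size
        first_touch = block not in seen
        seen.add(block)
        fa_hit = lru_access(fully_assoc, block, num_lines)
        line = block % num_lines
        if direct[line] == block:
            continue                    # hit in the direct-mapped cache
        direct[line] = block            # miss: the line's old block is replaced
        if first_touch:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1     # even a fully associative cache misses
        else:
            counts["conflict"] += 1     # only the fixed mapping caused the miss
    return counts


if __name__ == "__main__":
    print(classify_misses([0, 256] * 3))
    print(classify_misses([b * 64 for b in range(8)] * 2))
```

The first trace alternates between two blocks that share line 0 (2 compulsory, 4 conflict misses); the second cycles through 8 blocks in a 4-line cache, so every repeated access is a capacity miss.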

lshw

Display the cache/memory hierarchy with lshw:

>>> sudo lshw -C memory -short
H/W path           Device      Class       Description
======================================================
/0/0                           memory      64KiB BIOS
/0/400/700                     memory      256KiB L1 cache
/0/400/701                     memory      1MiB L2 cache
/0/400/704                     memory      8MiB L3 cache
/0/1000                        memory      12GiB System Memory
/0/1000/0                      memory      4GiB DIMM DDR3 1066 MHz (0.9 ns)
/0/1000/1                      memory      4GiB DIMM DDR3 1066 MHz (0.9 ns)
/0/1000/2                      memory      4GiB DIMM DDR3 1066 MHz (0.9 ns)
/0/1000/3                      memory      DIMM DDR3 Synchronous [empty]
/0/1000/4                      memory      DIMM DDR3 Synchronous [empty]
/0/1000/5                      memory      DIMM DDR3 Synchronous [empty]

lstopo

Show the topology of the system…

Textual rendering…

>>> lstopo-no-graphics -.ascii

Types

Volatile Memory

  • SRAM (Static Random Access Memory) - Two cross-coupled inverters store a bit persistently (as long as power is applied)
    • Faster access (no capacitor), no refresh needed, access time close to cycle time
    • Lower density (6-8 transistors per bit), higher cost
    • Minimal power to retain charge in standby mode
    • Manufacturing compatible with logic process, typically integrated with the processor chip
  • DRAM (Dynamic Random Access Memory) - The charge state of a capacitor indicates the stored value; cells lose charge over time and require a refresh
    • Slower access (capacitor)
    • Higher density (1 transistor + 1 capacitor per bit), lower cost
    • Requires periodic refresh (read + write back), which costs power, performance, and circuitry
    • Manufacturing requires capacitors and logic
  • SDRAM (Synchronous DRAM) - Uses a clock to eliminate the time memory and processor need to synchronize
    • Bandwidth is improved by an internal organization into multiple banks, each with its own row buffer
    • Banks allow simultaneous reads/writes, called address interleaving
    • The fastest variant is DDR (Double Data Rate SDRAM), which transfers data on both the rising and falling edges of the clock
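Address interleaving can be sketched as a function from address to bank. The mapping below is an assumption for illustration (8 banks, consecutive 64-byte blocks assigned round-robin); real memory controllers use their own, often more complex, mappings of address bits to channels, ranks and banks.

```python
def bank_of(addr: int, num_banks: int = 8, block_size: int = 64) -> int:
    """Bank serving addr under simple round-robin block interleaving (assumed scheme)."""
    return (addr // block_size) % num_banks


if __name__ == "__main__":
    # Sequential 64-byte blocks rotate through all banks ...
    print([bank_of(a) for a in range(0, 16 * 64, 64)])
    # ... while a 512-byte stride (8 banks x 64 bytes) always hits the same bank
    print([bank_of(a) for a in range(0, 4 * 512, 512)])
```

A sequential stream spreads across all 8 banks, so their accesses can overlap, whereas a stride of 8 × 64 = 512 bytes lands in the same bank every time and gains nothing from interleaving.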

Persistent Memory

Persistent Memory (PM, pmem), aka SCM (Storage Class Memory):

  • Bridges the access-time gap between DRAM and NAND-based flash storage
    • Introducing a third tier in the memory hierarchy
    • Connected to the system memory bus (like DRAM DIMMs) via NVDIMMs
    • Accessed like volatile memory (processor load/store instructions)
  • Change in computing architecture…
Access time   Description
1 ns          processor operation
<5 ns         read L2 cache
60 ns         access volatile memory (DRAM)
<1 µs         access persistent memory (NVM)
20 µs         read from flash memory (NAND)
1 ms          random write to flash memory
<10 ms        read/write disk
40 s          read tape
  • Requires an NVM Programming Model
    • Exposes new block and file semantics to applications
    • Exposed as a memory-mapped file by the operating system
  • A persistent-memory-aware file-system allows DAX (Direct Access), bypassing the system page cache (unlike normal storage-based files)
    • Application has direct load/store access to persistence via the MMU
    • No interrupts or kernel context switches
    • Only the CPU caches need to be flushed (possible from user space) to get data into the persistence domain
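The memory-mapped file access model can be tried with Python's standard mmap module. This is only a sketch of the programming model: it maps an ordinary temporary file, so the page cache is still involved; on a DAX-mounted persistent memory file-system the same loads/stores would reach the media directly, and libraries such as PMDK provide cache-flush primitives in place of `mmap.flush()` (which uses msync() on Unix).

```python
import mmap
import os
import tempfile

SIZE = 4096  # mapping length in bytes (arbitrary for this sketch)

# A regular temporary file stands in for a file on a DAX file-system
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, SIZE)                 # file must be as large as the mapping
    with mmap.mmap(fd, SIZE) as mem:       # shared, read/write mapping
        mem[0:5] = b"hello"                # plain store into mapped memory
        mem.flush()                        # push dirty data back to the file
    with open(path, "rb") as f:
        data = f.read(5)                   # read back through the file API
finally:
    os.close(fd)
    os.remove(path)

print(data)
```

The store goes through ordinary memory writes; only the explicit flush makes the data durable, which mirrors the role of cache flushing in the persistence domain.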

NVM (Non-Volatile Memory), NVRAM (Non-Volatile RAM):

  • RRAM/ReRAM (Resistive Random-Access Memory)
    • Uses a dielectric solid-state material aka memristor
    • In development by multiple companies…
    • Scalable below 30nm, cycle time <10ns
  • Others…
    • CBRAM (Conductive-Bridging RAM)
    • PRAM (Phase-Change Memory)
    • MRAM (Magnetoresistive RAM)
    • FeRAM (Ferroelectric RAM)
    • STT-RAM (Spin-Transfer Torque RAM)
    • SHERAM (Spin Hall Effect RAM)
    • CNTRAM (Carbon-nanotube RAM)
  • Products:
    • 3D XPoint (Intel, Micron), called Intel Optane

DIMM Modules

Memory is sold on small boards called DIMMs (Dual Inline Memory Modules)…

  • …typically containing 4-16 DRAM chips
  • …normally organized to be 8 bytes (64 bits) wide
  • …variants of DIMM slots (e.g. DDR3 or DDR4) have different pin counts
  • ECC (Error-Correcting Code) DIMMs store extra check bits (e.g. 72 instead of 64 bits wide for DDR4) that the memory controller uses to detect/correct errors

The following list shows common SDRAM generations and the peak throughput of a 64-bit module:

Standard       Speed grade   Peak throughput (GB/s)
SDRAM (1993)   SDR-66        0.53
               SDR-133       1.07
DDR (1996)     DDR-200       1.6
               DDR-266       2.13
DDR2 (2003)    DDR2-400      3.2
               DDR2-800      6.4
DDR3 (2007)    DDR3-1600     12.8
               DDR3-1866     14.93
DDR4 (2012)    DDR4-2133     17
               DDR4-3200     25.6
DDR5 (2020)    DDR5-4800     38.4
               DDR5-5200     41.6
               DDR5-6400     51.2
               DDR5-6800     54.4
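The throughput column follows from the transfer rate: a DIMM moves 8 bytes (64 bits) per transfer, and DDR performs two transfers per clock cycle. A minimal Python sketch of that arithmetic (peak values only; sustained throughput in practice is lower):

```python
BUS_WIDTH_BYTES = 8   # 64-bit module data bus (ECC check bits not counted)


def transfers_per_second(clock_mhz: float, double_data_rate: bool = True) -> float:
    """Mega-transfers per second: DDR moves data on both clock edges."""
    return clock_mhz * (2 if double_data_rate else 1)


def peak_bandwidth_gbs(mega_transfers: float) -> float:
    """Peak module throughput in GB/s (10^9 bytes/s)."""
    return mega_transfers * BUS_WIDTH_BYTES / 1000


if __name__ == "__main__":
    grades = [("SDR-133", 133.33), ("DDR3-1600", 1600),
              ("DDR4-3200", 3200), ("DDR5-4800", 4800)]
    for name, mts in grades:
        print(f"{name}: {peak_bandwidth_gbs(mts):.2f} GB/s")
```

DDR3-1600 runs an 800 MHz I/O clock, giving 1600 MT/s × 8 bytes = 12.8 GB/s, which is also the origin of the PC3-12800 module name shown by decode-dimms.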

NVDIMM

NVDIMM types:

  • NVDIMM-F - Flash only, paired with a DRAM DIMM
  • NVDIMM-N - Flash and DRAM together on the same DIMM
  • NVDIMM-P - True persistent memory (no DRAM/flash)

Supported modes (use ndctl for management):

  • Raw /dev/pmemN (block devices)
    • Default mode after installation
    • Supports file-systems with or without DAX (ext4, xfs)
  • Sector /dev/pmemNs (block device with sector atomicity)
    • Implemented with BTT (Block Translation Table)
    • Guarantees power-fail write atomicity
    • Only supports file-systems without DAX
  • Memory /dev/pmemN (block device supporting device DMA)
    • Supports file-system DAX, recommended over raw mode
    • Requires storing extra "struct page" entries in regular system memory (or persistent memory)
  • DAX /dev/daxN.M (character device supporting DAX)
    • Allows memory allocation/mapping without the need for a file-system
    • No interactions with the kernel page cache
    • Character device (does not support a file-system)
    • Requires storing extra “struct page” entries on persistent memory

dmidecode

Display the memory vendor, identification numbers, and type

>>> dmidecode --type memory | egrep "Manufacturer|Serial|Part|Type" 
    Error Correction Type: Multi-bit ECC
    Type: DDR3
    Type Detail: Synchronous Registered (Buffered)
    Manufacturer: Samsung    
    Serial Number: 35244B2E
    Part Number: M393B2G70BH0-YK0
...

Maximum RAM capacity can be checked with dmidecode. The “Maximum Capacity” is the maximum RAM supported by your system, while “Number of Devices” is the number of memory (DIMM) slots available on your computer.

>>> dmidecode -t 16 | egrep "Capacity|Devices"
    Maximum Capacity: 384 GB
    Number Of Devices: 32
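The two values can also be extracted programmatically. A Python sketch that parses `dmidecode -t 16` text (here the sample output shown above); summing over several "Physical Memory Array" entries and accepting MB/GB/TB units are assumptions about the output on other machines. On a live system the text could come from running dmidecode as root, e.g. via `subprocess.run`.

```python
import re

# Sample copied from the "dmidecode -t 16" output above
SAMPLE = """\
    Maximum Capacity: 384 GB
    Number Of Devices: 32
"""

# Assumed set of units that dmidecode may print for the capacity
UNIT_IN_GB = {"MB": 1 / 1024, "GB": 1, "TB": 1024}


def parse_memory_arrays(text: str) -> dict:
    """Sum maximum capacity (GB) and slot count over all memory arrays in text."""
    capacities = re.findall(r"Maximum Capacity:\s*(\d+)\s*(MB|GB|TB)", text)
    slots = re.findall(r"Number Of Devices:\s*(\d+)", text)
    return {
        "max_capacity_gb": sum(int(value) * UNIT_IN_GB[unit]
                               for value, unit in capacities),
        "slots": sum(int(n) for n in slots),
    }


if __name__ == "__main__":
    print(parse_memory_arrays(SAMPLE))
```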

Check the memory support matrix for the system board to understand the correct DIMM distribution and their corresponding memory frequencies.

Frequency & Voltage

Check the memory speed with lshw (package lshw):

>>> lshw -short -C memory | grep DIMM
/0/1b/0                     memory     16GiB DIMM DDR3 Synchronous 800 MHz (1.2 ns)
...

Details about voltage and maximum memory frequency with decode-dimms from the Debian package i2c-tools:

>>> modprobe eeprom
>>> decode-dimms
[…]
Fundamental Memory type                         DDR3 SDRAM
Module Type                                     RDIMM
[…]
Maximum module speed                            1600MHz (PC3-12800)
Size                                            16384 MB
[…]
Operable voltages                               1.5V, 1.35V

HBM

HBM (High-Bandwidth Memory) …standardized stacked memory technology

Standard   Year   Bandwidth¹   Stack²   Size³
HBM        2013   128 GB/s     -        -
HBM2       2016   307 GB/s     4        8 GB
HBM2e      2020   460 GB/s     8        16 GB
HBM3       2022   819 GB/s     12       24 GB

¹ max bandwidth per package
² max number of memory dies in stack
³ max capacity per package
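The per-stack bandwidth is the per-pin data rate times the interface width. A Python sketch, assuming a 1024-bit interface per stack and per-pin rates of 1.0, 2.4, 3.6 and 6.4 Gb/s for HBM through HBM3 (these rates are not part of the table above):

```python
BUS_WIDTH_BITS = 1024   # data interface width of one stack (assumed for HBM..HBM3)

# Assumed maximum per-pin data rates in Gb/s
PIN_RATE_GBPS = {"HBM": 1.0, "HBM2": 2.4, "HBM2e": 3.6, "HBM3": 6.4}


def stack_bandwidth_gbs(pin_rate_gbps: float) -> float:
    """Peak bandwidth of one stack in GB/s: per-pin rate x bus width / 8 bits."""
    return pin_rate_gbps * BUS_WIDTH_BITS / 8


if __name__ == "__main__":
    for name, rate in PIN_RATE_GBPS.items():
        print(f"{name}: {stack_bandwidth_gbs(rate):.1f} GB/s per stack")
```

This reproduces the 307, 460 and 819 GB/s entries (truncated to whole numbers) and gives 128 GB/s for first-generation HBM.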

Increase memory interface performance…

  • improves bandwidth…access times…transfer rates
  • …more power-efficient in terms of bits per watt
    • …GDDR5…10.66GB/s per watt
    • …HBM2e…35GB/s per watt
  • …no fundamental change in the underlying memory technology

Utilizes 3D manufacturing technology…

  • 2.5D packaging solution
  • …stacks of DRAM chips on top of a bus interface
  • …placed side-by-side on top of a silicon interposer
  • …interposer acts as the bridge between the chips and a board
  • …requires the fabrication of what is basically a PCB in silicon
    • …brings logic closer to the memory, enabling more bandwidth
    • …comes with thermal management challenges

Capacity…

  • …limited compared to DRAM accessed through DDR
  • …memory defined in cubes (stacks)…
    • …defined height…4,8,12 or 16 (with HBM3)
    • …defined number of data channels (64/128 bits)
  • …limited number of HBM dies can fit around the SoC
  • …HBM capacities cannot rival the capacity of DDR