Central Processing Unit (CPU)
Architecture
A computer is a machine to process information (data):
- A processor (typically an integrated circuit) performs operations on data
- It processes input information to generate the desired output information
- Input/output data is loaded/stored from/to memory, a data storage area
- Electronic devices operate on binary signals (electricity on/off)
- Binary is expressed by two symbols, 0 and 1 ➜ binary digits (bits)
- Numbers and text are represented as binary patterns ➜ combinations of zeros and ones
Example binary patterns (8-bit ASCII encoding):
Binary | Character |
---|---|
0011 0000 | 0 (zero) |
0011 0001 | 1 (one) |
0011 0010 | 2 (two) |
0100 0001 | A |
0100 0010 | B |
0100 0011 | C |
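The same encoding can be verified on a Linux system, e.g. with the xxd tool (assuming it is installed):
printf '012ABC' | xxd -b    # bit-level dump of the ASCII characters 0, 1, 2, A, B, C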
Machine Code
Machine code (machine language) ➜ instructions executed by a processor
- Instruction ➜ operation code (opcode), operand
- Operand ➜ (memory address of the) data to operate on
- Opcodes & operands encoded as binary code
- Machine code program ➜ sequence of instructions (opcodes & operands)
- Assembly language
- Symbolic representation of machine instructions
- Symbolic names for opcodes ➜ mnemonic codes
- Assembler ➜ translates assembly language into machine code
Example machine operation codes (Intel 8085):
Opcode (binary) | Mnemonic | Description |
---|---|---|
1000 0111 | ADD | Add contents of register to accumulator |
0011 1010 | LDA | Load data from memory address |
0011 0010 | STA | Store data to memory address |
0111 1001 | MOV | Move data between registers |
1100 0011 | JMP | Jump to memory address |
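A comparable mapping between mnemonics and opcode bytes can be observed with modern tools, e.g. GNU binutils on x86-64 (file names are illustrative):
as -o prog.o prog.s     # assembler: translate mnemonic assembly into machine code
objdump -d prog.o       # disassembler: print opcode bytes next to their mnemonics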
Object Code
Object code is a sequence of instructions in machine code generated by an assembler
- Executable programs are typically built from reusable code fragments (sub-programs, functions, modules)
- Fragments are individually compiled (translated) into object code
- A complete program is then built by combining various object code fragments
- Individual fragments are referenced using a symbol (e.g. a function name)
- Object file (relocatable format machine code)
- File format used to store object code and related data (e.g. ELF)
- Structured as separate segments/sections for different types of data
- A linker program combines object code to generate executable machine code
- Relocation assigns load addresses to various object code fragments
- A linker resolves symbols using the assigned memory locations and patches the calling object code to those locations (call instruction references)
- A loader places executable machine code into (main) memory and prepares it for execution
- Allocates regions in memory corresponding to segments in the machine code
- A program loader is part of a computer operating system (starts the program once it is loaded)
- Microcontrollers typically do not have a loader; instead the executable machine code is started directly from memory
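A minimal sketch of this tool chain on a Linux system (source file names are illustrative):
gcc -c main.c util.c           # compile each fragment into relocatable object code (main.o, util.o)
gcc -o program main.o util.o   # linker combines the fragments, relocates and resolves symbols
./program                      # the OS program loader maps the executable into memory and starts it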
Object Files
Object files contain five kinds of information
- Header: Metadata, code size, format specification, etc.
- Object code: Binary instructions and data generated by an assembler(/compiler)
- Relocation: List of places in the object code where a linker needs to patch addresses
- Symbols: Global symbols defined in the module, symbols to be imported from other modules
- Debugging: Information about the object code needed for debugging
ELF (Executable and Linking Format) files come in three flavors:
- Relocatable: Created by assemblers(/compilers), needs to be processed by a linker
- Executable: Address relocation done, symbols resolved (except for shared library symbols), ready for execution
- Shared object: Shared libraries including symbol information for linkers and executable code for run-time
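The flavor of an ELF file and its contents can be inspected with binutils (file names are illustrative):
readelf -h main.o     # ELF header, Type: REL (relocatable)
readelf -r main.o     # relocation entries the linker has to patch
nm main.o             # symbols defined in and imported by the module
readelf -h program    # Type: EXEC (executable) or DYN (shared object)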
Processor
A processing unit, aka CPU (Central Processing Unit):
- Active part of a computer ➜ datapath and control
- Control ➜ Commands the datapath, memory, and input/output devices according to machine instructions
- Datapath ➜ Components of a processor that perform arithmetic operations
- Processors ➜ fetch (read) instructions from memory before executing them
The separation of processor and memory distinguishes programmable computers
Control-flow Architecture
Stored program computer: instructions and data stored in memory
- Harvard architecture: Separate memory for data and instructions
- Two sets of address/data buses between processor and memory
- Allows simultaneous instruction and data fetches
- Modified Harvard architecture: Separate memory for data and instructions
- Instruction memory can be used to store data
- Two pieces of data can be loaded in parallel
- Von Neumann architecture: Single memory holds data and instructions
- Single set of address/data buses between processor and memory
- Values in memory interpreted depending on a control signal
- Current instruction identified by the instruction pointer (program counter)
- Sequential instruction processing (fetch, execute, and complete) one at a time
- The instruction pointer is advanced sequentially except for control transfer
- Instructions executed in control flow order
Data-flow Architecture
- Instructions are executed based on the availability of input arguments, i.e. in data-flow order
- Conceptually no instruction pointer is required since execution is driven by data dependencies
- Inherently more parallel with the potential to execute many instructions at the same time
Control- vs data-flow trade-offs:
- Ease of programming
- Ease of compilation
- Extraction of parallelism (performance)
- Hardware complexity
Instruction Set Architecture
The Instruction Set Architecture (ISA) specifies how a programmer sees instructions to be executed:
- Defines an interface between software and hardware enabling the implementation of programs
- Modern ISAs are mostly control-flow architectures: x86, ARM, MIPS, SPARC, POWER
- ISAs have a very long lifetime (compared to µarch) staying backwards-compatible while being extended with additional instructions
The ISA includes all functionality exposed to the programmer:
- Instructions: Opcodes, addressing modes, data types, registers, condition code…
- Memory: Address space, alignment, virtual memory…
- Interrupt/exception handling, access control, priority/privileges
- Task/thread management, power & thermal management
- Multi-threading & multi-processing support
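On Linux the ISA and its extensions exposed by the processor can be listed, for example:
lscpu                         # architecture and supported ISA extensions (flags)
grep -m1 flags /proc/cpuinfo  # instruction set extensions (sse4_2, avx2, …)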
ISA Types:
- Reduced Instruction Set Computer (RISC)
- Compact, uniform instruction size ➜ easier to decode ➜ facilitates pipelines
- Complexity implemented as a series of simpler instructions
- More lines of code ➜ bigger memory footprint
- Allows effective compiler optimization
- Complex Instruction Set Computer (CISC)
- Extremely specific instructions (doing as much work as possible)
- Instructions not uniform in size ➜ difficult to decode
- Pipelining requires breaking instructions down into smaller components at the processor level
- High code density
- Complex processor hardware
- Very long instruction word (VLIW)
- Execute multiple instructions concurrently, in parallel
- Instruction Level Parallelism (ILP)
- Compiler bundles multiple instructions that can be executed in parallel into a single long instruction
Microarchitecture
The Microarchitecture (µarch) is the implementation of the ISA under specific design constraints and goals:
- The microprocessor is the physical representation (circuits) of the ISA and µarch
- Example: add instruction (ISA) vs adder implementation (µarch) [bit serial, ripple carry, carry lookahead, etc.]
- Example: x86 ISA has many implementations - Intel [2,3,4]86, Intel Pentium [Pro, 4], Intel Core, AMD…
- Design points: cost, performance, power consumption, reliability, time to market…
The µarch defines anything done in hardware and can execute instructions in any order (e.g. data-flow order) as long as it obeys the semantics specified by the ISA:
- Pipelined instruction execution (Intel 486)
- Multiple instructions at a time (Intel Pentium)
- Out-of-order execution (Intel Pentium Pro)
- Speculative execution, branch prediction, prefetching
- Memory access scheduling policy, cache (levels, size, associativity, replacement policy)
- Clock gating, dynamic voltage and frequency scaling (energy efficiency)
- Error handling, correction
- Superscalar processing, multiple instructions (VLIW architecture, Intel Itanium)
- SIMD processing (vector/array processors, GPUs)
- Systolic arrays (Google Tensor Processing Unit)
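Several µarch parameters, e.g. the cache hierarchy, can be queried on a running Linux system:
lscpu -C                      # cache levels, sizes and associativity (ways)
getconf -a | grep -i cache    # cache line sizes and associativity reported by the C library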
Manufacturer
Intel
Xeon generations…
Date | Codename | Cores | Socket | Features |
---|---|---|---|---|
2017 | Skylake | 4-22 | LGA 3647 | 6xDDR4-2666 |
2019 | Cascade Lake | 4-28 | LGA 3647 | 6xDDR4-2933 |
2020 | Cooper Lake | 16-28 | LGA 4189 | 6xDDR4-3200 |
2021 | Ice Lake | 8-40 | LGA 4189 | 8xDDR4-3200, PCIe 4.0 |
2022 | Sapphire Rapids | -56 | LGA 4677 | HBM2e, DDR5, PCIe 5.0, CXL 1.1 |
2023 | Emerald Rapids | -60 | LGA 4677 | |
2024 | Sierra Forest | -144E | | |
^ | Granite Rapids | ?P | | DDR5-8800 |
2025 | Clearwater Forest | ?E | | |
- Sierra Forest & Granite Rapids introduce E-cores & P-cores (efficiency & performance)
- …as distinct product lines
Hybrid CPU-GPU…
- …code-named “Falcon Shores”
- XPU…X is a variable…denotes multiple kinds of compute
- …first half of 2024
- …20A process from Intel Foundry Services
- …Xeon SP socket (like “Granite Rapids” CPUs)
Intel on Demand (introduction with Sapphire Rapids)…
- …software-defined silicon (SDSi) service
- …optional service
- …act as a “try-before-you-buy program”
- …option to…
- …select fully featured
- …pick and choose features
- …two modes…
- …activation model…enable features through a one-time activation
- …state information…shared with Intel…SDSi-enabled data-center
- …consumption model…through as-a-service offerings
AMD
x86 processors…
Date | CPU-Family | Architecture |
---|---|---|
1996-1997 | K5 | x86 |
1997-1998 | K6 | x86 |
1999-2002 | K7 | x86 |
2003-2014 | K8 | x86-64 |
2007-2013 | K10 | x86-64 |
2011-2017 | Bulldozer | x86-64 |
2017-present | Zen | x86-64 |
Brand names…
- Desktop/Workstation…
- Athlon (2001-2019)
- Ryzen (2017-present)…high-end Ryzen Threadripper
- Server…
- Opteron (2003-2012)
- Epyc (2017-present)
Ryzen
Ryzen (desktop-grade) CPU generations…
Date | Series | Arch. | Gen. | Features |
---|---|---|---|---|
2017 | 1000 | Zen | 1 | |
2018 | 2000 | Zen+ | 1 | |
2019 | 3000 | Zen 2 | 2 | |
2020 | 4000 | Zen 2 | 2 | AM4 |
2021 | 5000G | Zen 3 | 3 | AM4, DDR4-3200, PCIe 3.0 |
2022 | 5000 | Zen 3 | 3 | AM4, DDR4-3200 |
2022 | 6000 | Zen 3+ | 3 | |
2022 | 7000 | Zen 4 | 4 | AM5, DDR5-5600, PCIe 5.0 |
2024 | 8000G | Zen 4 | 4 | AM5, DDR5-5600, PCIe 4.0 |
Sockets…
- AM4 for Ryzen 4000,5000
- AM5 for Ryzen 7000
Chipsets…
- Chipset 300-series… 1st, 2nd, 3rd Gen CPUs
- Chipset 400-series… 1st, 2nd, 3rd Gen CPUs
- Chipset 500-series… 2nd, 3rd, 4th Gen CPUs
Chipset classes, X & B support overclocking
- Premium X{3,4,5,6}70
- Midrange B{3,4,5,6}50
- Entry-level A{3,4,5,6}20
Epyc
Epyc (server-grade) CPU generations
Date | Series | Arch. | Cores | Socket | Features |
---|---|---|---|---|---|
2017 | 7001 Naples | Zen | 32 | SP3 | DDR4-2666, PCIe3 |
2019 | 7002 Rome | Zen 2 | 64 | SP3 | DDR4-3200, PCIe4 |
2021 | 7003 Milan | Zen 3 | 64 | SP3 | DDR4-3200, PCIe4 |
2023 | 8004 Siena | Zen 4c | 64 | SP6 | DDR5-4800, PCIe5, CXL 1.1 |
2022 | 9004 Genoa | Zen 4/4c | 128 | SP5 | DDR5-4800, PCIe5, CXL1.1 |
2024 | Turin | Zen 5 | 128 | | |
2025 | Venice | | | | |
Naming Convention
EPYC 9554P
     ||||`---- feature modifier
     |||`----- generation
     ||`------ performance
     |`------- core count
     `-------- product series
Fabrication
- Process node…
- …manufacturing process and design of a CPU made through lithography
- …nm (nanometer) used to measure the size of the transistors
- Lower nm…
- …more power efficient
- …lower cooling requirements
- …faster transistor switching
- …higher transistor density
- 14nm, 7nm, etc. …primarily marketing terms …refer to improved generations of chips
Foundries…
- In order of market share in 2023…
- 55% TSMC (Taiwan)
- 13% Samsung (Korea)
- 7% Globalfoundries (USA)
- 5% SMIC (China)
- EU Chips Act (2022/02) …strengthen semiconductor production in the EU
Chiplets …small, modular chips …combined into a system-on-chip (SoC)
- …used in a chiplet-based architecture
- …increases design flexibility …reduce production cost
- …can improve performance …reduce power consumption
- UCIe (Universal Chiplet Interconnect Express)
- …standard pushed by Intel, AMD & Samsung
- …integration of chiplets from different manufacturers
- Modern CPUs composed of separate modules…
- …compute core, memory controller, PCIe bus…
- …modules typically built in different fabrication technologies
Configuration
NUMA
Non-Uniform Memory Access (NUMA)
- Multiple processors, collectively called a node (aka cell, zone), are physically grouped on a socket.
- Each node has high speed access to a local dedicated memory bank.
- An interconnect bus provides connections between nodes, so that all CPUs can still access all memory
- There is a performance penalty for processors accessing non-local memory.
/sys/devices/system/node contains information about the NUMA nodes in the system and the relative distances between those nodes
dnf install -y hwloc numactl # install NUMA inspection tools (Fedora/RHEL)
numactl --hardware # examine the NUMA layout
lstopo # show memory and CPU topology
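A process can be pinned to a single node to avoid the remote-access penalty (node number and program name are illustrative):
numactl --cpunodebind=0 --membind=0 ./program   # run on node-0 CPUs with node-0 memory only
numastat -p $(pidof program)                    # per-node memory usage of the running process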
Frequency Scaling
Dynamic Frequency Scaling (aka CPU throttling)
- CPU support: Intel SpeedStep, AMD Cool’n’Quiet
- Note that firmware may configure frequency and thermal management
- Lower clock speed results in a slower CPU consuming less energy
- Frequency scaling governors in the kernel support:
- CPU frequency/voltage mappings
- Upper/lower frequency limits
- Strategies to switch between mappings
watch 'grep "cpu MHz" /proc/cpuinfo' # monitor cpu speed
cpupower frequency-info # show throttling configuration
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor # current power scheme for the CPU
cpupower frequency-set -g <governor> # activate a particular power scheme
cpufreq-info # show throttling configuration
/etc/default/cpufrequtils # power scheme configuration
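The per-CPU cpufreq interface in sysfs provides the same information directly, e.g. for the first CPU:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors  # governors offered by the driver
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq             # current frequency in kHz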
### deprecated with Linux kernel <2.3.36
grep throttling /proc/acpi/processor/CPU*/info # show state of throttling control
grep -e ^active -e \*T /proc/acpi/processor/CPU*/throttling # active configuration if enabled