HPC Benchmarks
Micro-benchmarks for performance…
- …mini-applications to heavily test a specific function
- …tries to reach performance limitations
Fabric
STREAM
STREAM benchmark …https://www.cs.virginia.edu/stream
- …measures sustainable memory bandwidth…
- …works with datasets larger than the available cache
- List of results…
Usage
Source code…
# get the source code
git clone https://github.com/jeffhammond/STREAM && cd STREAM
# compile with optimisation and OpenMP for multi-threaded execution
gcc -O -fopenmp stream.c -o stream
# execute benchmark with two threads
export OMP_NUM_THREADS=2 ; ./stream
References…
- Intel …ICC compiler
- AMD …AOCC compiler
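The two-thread run above can be extended into a sweep over thread counts to see where the memory bandwidth saturates. This is a minimal sketch, not part of STREAM itself: the helper name `thread_counts` and the upper limit of 8 threads are hypothetical, and the loop only runs if a `./stream` binary exists in the current directory.

```shell
#!/bin/sh
# thread_counts MAX
# Prints 1, 2, 4, ... for every power of two not larger than MAX.
thread_counts() {
    n=1
    while [ "$n" -le "$1" ]; do
        echo "$n"
        n=$(( n * 2 ))
    done
}

# Run the benchmark once per thread count and keep only the Triad line.
# The upper limit of 8 is an assumption; adjust it to the core count.
for t in $(thread_counts 8); do
    if [ -x ./stream ]; then
        printf '%s threads: ' "$t"
        OMP_NUM_THREADS=$t ./stream | grep '^Triad:'
    fi
done
```

Doubling the thread count keeps the number of runs small; a linear sweep may be preferable on nodes with few cores.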
Measurements
Uses synthetic vector style applications…
- …only measures execution time …everything else is derived
- …reports “bandwidth” values for each of the kernels…
# example output
Function Best Rate MB/s Avg time Min time Max time
Copy: 10917.2 0.014719 0.014656 0.014961
Scale: 10629.1 0.015092 0.015053 0.015121
Add: 14149.2 0.017029 0.016962 0.017103
Triad: 13763.1 0.017509 0.017438 0.017655
Name | Kernel | Bytes/Iteration | FLOPS/Iteration |
---|---|---|---|
COPY | a(i) = b(i) | 16 | 0 |
SCALE | a(i) = q*b(i) | 16 | 1 |
SUM | a(i) = b(i) + c(i) | 24 | 1 |
TRIAD | a(i) = b(i) + q*c(i) | 24 | 2 |
- copy …measures transfer rate in the absence of arithmetic
- scale …adds a simple arithmetic operation
- sum …adds a third operand
- triad …allows chained/overlapped/fused multiply-add operations
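Since only the execution time is measured, the reported rate can be reproduced by hand from the bytes per iteration listed above. This is a minimal sketch under two assumptions: an array size of 10,000,000 elements (the default in `stream.c`, which matches the example output) and 1 MB = 10^6 bytes. The function name `stream_rate` is hypothetical.

```shell
#!/bin/sh
# stream_rate BYTES_PER_ITER ELEMENTS SECONDS
# Prints the bandwidth in MB/s (1 MB = 10^6 bytes) for one kernel run.
stream_rate() {
    awk -v b="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", (b * n) / t / 1000000 }'
}

# Copy moves 16 bytes per iteration; 10,000,000 elements are assumed.
stream_rate 16 10000000 0.014656   # prints 10917.0
# Triad moves 24 bytes per iteration.
stream_rate 24 10000000 0.017438   # prints 13763.0
```

The values differ from the example output (10917.2 and 13763.1) only in the last digit, presumably because the printed minimum times are rounded to six decimals.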
Adjust the value of STREAM_ARRAY_SIZE…
- …number of array elements used to run the benchmarks
- …depends on…
  - …system cache size(s)
  - …granularity of the system timer
- …adjust value…
  - …each array at least 4 times the size of the available cache
  - …large enough for a ‘timing calibration’ of at least 20 clock-ticks
Use lstopo to identify the L3 cache size… (multiply by 4)…
# set at compile time
# (static arrays above 2 GB in total may additionally need -mcmodel=medium on x86-64)
gcc -O -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream.100M
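The "4 times the cache" rule can be turned into an element count with a little arithmetic. This is a minimal sketch, assuming 8-byte array elements (the default `double` type in `stream.c`); the function name `stream_array_size` and the 32 MiB cache size are hypothetical.

```shell
#!/bin/sh
# stream_array_size CACHE_BYTES
# Prints the minimum element count so that one array of 8-byte
# elements is 4 times as large as the given cache.
stream_array_size() {
    echo $(( 4 * $1 / 8 ))
}

# Hypothetical node with 32 MiB of L3 cache.
l3_bytes=$(( 32 * 1024 * 1024 ))
stream_array_size "$l3_bytes"   # prints 16777216
```

The result is a lower bound to pass via `-DSTREAM_ARRAY_SIZE=`. On multi-socket nodes the cache sizes of all sockets would likely have to be summed first.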
IOR
IOR (Interleaved or Random) file system benchmarking application
http://wiki.lustre.org/IOR
https://github.com/LLNL/ior (deprecated)
https://github.com/IOR-LANL/ior
https://github.com/glennklockwood/ior-apex
- Tests the performance of parallel file-systems (like Lustre)
- Uses MPI for process synchronisation
- Configurable to operate in multiple modes:
  - File-per-process: One file per task (measures peak throughput).
  - Single-shared-file: Single shared file for all tasks.
  - Buffered: Takes advantage of I/O caches on the client.
  - DirectIO: Bypasses the I/O cache by writing directly to the file-system.
>>> git clone https://github.com/LLNL/ior.git && cd ior
>>> ./bootstrap
>>> ./configure
>>> make clean && make
Deploy the ior binary on all nodes used for benchmarking.
# 20 parallel tasks, each writing its own 100MB file (10 repetitions)
mpirun -np 20 ior -a POSIX -vwk -t100m -b100m -i 10 -F -o ior.dat
Options
File size (about 1.5x the total main memory of a node, so that client-side caches cannot hold the whole data set):
filesize = segmentCount * blocksize * number_of_processes
transfersize
: Size (in bytes) of a single data buffer to be transferred in a single I/O call.
blocksize
: Size (in bytes) of a contiguous chunk of data accessed by a single client; it is comprised of one or more transfers.
segmentCount
: Number of segments in the file. (A segment is a contiguous chunk of data accessed by multiple clients, each writing/reading their own contiguous data; it is comprised of blocks accessed by multiple clients.)
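The file size formula can be evaluated directly, or solved for the block size needed to reach the 1.5x memory target. This is a minimal sketch in integer MiB units; the function names and the 256 GiB node are hypothetical, and all processes are assumed to run on a single node.

```shell
#!/bin/sh
# ior_file_size_mib SEGMENTS BLOCK_MIB PROCS
# Aggregate size in MiB: segmentCount * blocksize * number_of_processes.
ior_file_size_mib() {
    echo $(( $1 * $2 * $3 ))
}

# ior_block_size_mib NODE_MEM_MIB PROCS [SEGMENTS]
# Smallest block size (MiB) for which the aggregate size reaches
# 1.5x the node memory; integer arithmetic, rounded up.
ior_block_size_mib() {
    segments=${3:-1}
    target=$(( ($1 * 3 + 1) / 2 ))     # ceil(1.5 * memory)
    echo $(( (target + $2 * segments - 1) / ($2 * segments) ))
}

# 20 tasks with 100 MiB blocks and one segment.
ior_file_size_mib 1 100 20        # prints 2000
# Hypothetical node with 256 GiB (262144 MiB) of memory, 20 tasks.
ior_block_size_mib 262144 20      # prints 19661
```

For runs spanning several nodes, the memory figure and the process count would need to be scaled together.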
Configuration Files
>>> cat ior.conf
IOR START
api=MPIIO
testFile=ior.dat
repetitions=1
readFile=1
writeFile=1
filePerProc=0
keepFile=0
blockSize=1024M
transferSize=2M
verbose=0
numTasks=0
collective=1
IOR STOP
>>> ior -f ior.conf
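To compare the file-per-process and single-shared-file modes, the configuration can be generated from a template rather than edited by hand. This is a minimal sketch using only keys that appear in the configuration above; the function name `write_ior_conf` and the output file name in the comment are hypothetical, and `api=POSIX` is chosen here to match the earlier command line rather than the MPIIO example.

```shell
#!/bin/sh
# write_ior_conf BLOCK_SIZE TRANSFER_SIZE FILE_PER_PROC
# Emits an IOR configuration on stdout.
write_ior_conf() {
    cat <<EOF
IOR START
api=POSIX
testFile=ior.dat
repetitions=1
readFile=1
writeFile=1
filePerProc=$3
keepFile=0
blockSize=$1
transferSize=$2
IOR STOP
EOF
}

# File-per-process variant; redirect to a file and pass it to ior, e.g.
#   write_ior_conf 1024M 2M 1 > ior.fpp.conf && ior -f ior.fpp.conf
write_ior_conf 1024M 2M 1
```

Calling the function with `0` as the last argument yields the single-shared-file variant with otherwise identical settings.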
HEPScore
HEPScore23 …replaces HEPSPEC06
- …WLCG community in favour of an open source benchmark…
- …over a SPEC-CPU 2006 based benchmark requiring a licence
- …supports benchmarks for other processors (ARM and GPUs)
- …provided to the HEPiX Benchmark Working Group
- …in the HEP Benchmark Suite repository
- …results collected in a central scores table
References…
- Power Efficiency in HEP (a case between ARM and x86), ACAT 2022
- HEPiX Benchmarking Working Group Report, HEPiX Fall 2023
References
Regression tests and benchmarks for HPC systems…
- PCVS (Parallel Computing Validation System)
- ReFrame
- JUBE
- Pavilion2