Compiler Explorer
- Compiler Explorer and similar inspection tools are invaluable for optimization and micro-benchmarking.
- An alternative you might find interesting is the use of local tools to the same end. If you’re not familiar with the necessary tools, you can use two of the scripts that I created for my own use: `vir_inspect.sh` (requires zsh: `sudo apt install zsh`) and `vir_dump_asm.sh`.
  - `vir_inspect.sh /path/to/executable` shows a filtered list of functions in the executable. Call `vir_inspect.sh /path/to/executable <pattern>` and it will filter the list of functions using the last argument. If a single function remains, it skips the next step.
  - Enter the number of the function you want to inspect.
  - The tool will show a disassembly of the function. If debug information is available (compiled with `-g`), source code annotation will be shown.
  - After the disassembly, `llvm-mca` will interpret the complete function. This is often not very useful, unless the function was carefully crafted to be interpreted by `llvm-mca`. But feel free to extend the script to insert `# LLVM-MCA-BEGIN name0` and `# LLVM-MCA-END name0` markers before feeding into `llvm-mca`.
  - `vir_dump_asm.sh <source file>` will compile and dump asm.
-
Another alternative for Vim users: I hacked up a Compiler Explorer-like vim plugin for myself. It’s available at vim-compilerexplorer.
-
When looking at x86 asm, I recommend using Intel syntax instead of AT&T syntax. (It makes consulting Intel documentation easier.)
- Quick x86 asm Introduction (by Matt Godbolt):
  - Registers
    - `rax`, `rbx`, `rcx`, `rdx`, `rsi`, `rdi`, `rbp`, `rip`, `rsp`, `r8`–`r15`
    - `xmm0`–`xmm15`
    - `rdi`, `rsi`, `rdx`, … as function arguments
    - `rax` is the return value
  - `op` (often implicit src/dest)
  - `op dest` (often in/out and implicit src)
  - `op dest, src` (often in/out dest)
  - `op dest, src1, src2`
  - `mov eax, edi`: “move” (`eax = edi`)
  - `mov eax, DWORD PTR [rdi+rsi*4]`: “load from memory” (`eax = *(int*)(rdi + rsi * 4)`)
  - `lea eax, [rdi+rsi]`: “load effective address” (`eax = rdi + rsi`)
- Interesting floating-point instructions:
  - All of these instructions may have a `v` prefix (e.g. `vmovss` instead of `movss`), which you can ignore. It’s only a different instruction encoding.
  - `movss`: move scalar single-precision (`op1 = op2`)
  - `addss`: add scalar single-precision (`op1 += op2` or `op1 = op2 + op3`)
  - `fmadd132ss`: fused multiply-add 132 (argument order: `op1 = op1 * op3 + op2`) scalar single-precision. In compiler output it always appears as `vfmadd132ss`, because FMA instructions only exist in the `v`-prefixed encoding.
  - `movd`: move doubleword (32 bits) (`op1 = op2`)
  - `movsd`: move scalar double-precision
  - `addsd`: add scalar double-precision
- Later we will also see instructions that use packed instead of scalar in their mnemonic, e.g. `addps` instead of `addss`. “packed” means SIMD.
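To connect the register and instruction summary above to source code, here is a minimal sketch of four functions you could paste into Compiler Explorer. The function names are made up for illustration, and the instructions named in the comments are what GCC and Clang typically emit for x86-64 Linux with optimization enabled and Intel syntax; the exact output depends on compiler version and flags, so treat the comments as expectations to check rather than guarantees.

```cpp
// Hypothetical example functions; names are not from the course material.

// Arguments arrive in edi and esi, the result is returned in eax.
// Typical output: lea eax, [rdi+rsi]
int add_ints(int a, int b) { return a + b; }

// p arrives in rdi, i in rsi.
// Typical output: mov eax, DWORD PTR [rdi+rsi*4]
int load_int(const int* p, long i) { return p[i]; }

// Floating-point arguments arrive in xmm0 and xmm1, the result is in xmm0.
// Typical output: addss xmm0, xmm1 (or vaddss with AVX enabled)
float add_scalar(float a, float b) { return a + b; }

// Four independent additions. If the compiler vectorizes this loop
// (e.g. at -O3), the additions usually become a single addps.
// __restrict is a GCC/Clang extension telling the compiler that
// out does not alias a or b.
void add_packed(float* __restrict out, const float* a, const float* b) {
  for (int i = 0; i < 4; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

Compile with `-O2 -masm=intel` (and try `-O3` or `-mavx` as well) and compare the output against the mnemonics listed above.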
Exercise
Inspect the example we benchmarked using Compiler Explorer. Remove the FLOP/s computation. Short link
TIP
Use e.g. `std::vector<int>` in place of `benchmark::State`.
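As a rough illustration of this tip: the benchmark loop only needs something it can iterate over, so a function template can accept either type. The sketch below uses a made-up loop body (`bench_add` is a hypothetical name, not the example from the course) and instantiates it with `std::vector<int>`, so it compiles without the benchmark library.

```cpp
#include <vector>

// Hypothetical stand-in for a benchmark function. With Google Benchmark,
// State would be benchmark::State; here any iterable type works.
template <typename State>
float bench_add(State& state) {
  float x = 0.f;
  // Same loop shape as in a real benchmark: one iteration per element.
  for ([[maybe_unused]] auto _ : state) {
    x += 1.f;
  }
  return x;
}

// Explicit instantiation so that the compiler emits code for the template
// even though nothing in this file calls it.
template float bench_add<std::vector<int>>(std::vector<int>&);
```

The number of elements in the vector plays the role of the iteration count; their values are never read.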
Modify the benchmark to produce believable results.
TIP
Or invoke some magic yourself:
asm volatile("" : "+x"(x));
It is different from `benchmark::DoNotOptimize`. Is it better? More correct? Discuss.
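To experiment with this, the statement can be wrapped in a small helper and placed inside a loop. The sketch below is only a starting point for the discussion: `clobber_register` and `accumulate` are hypothetical names, the loop body is made up, and the code assumes GCC or Clang (extended asm is a GNU extension). The `"+x"` constraint tells the compiler that the empty asm statement reads and writes `x` in an SSE register, so it cannot assume it knows the value afterwards.

```cpp
// Hypothetical helper; the name is an assumption, not a library function.
inline void clobber_register(float& x) {
#if defined(__x86_64__)
  // "+x": x is both input and output and must live in an SSE register.
  asm volatile("" : "+x"(x));
#else
  // Fallback for other targets: route the value through memory instead.
  asm volatile("" : "+m"(x));
#endif
}

// Hypothetical stand-in kernel, not the benchmark from the course.
float accumulate(float start, int n) {
  float x = start;
  for (int i = 0; i < n; ++i) {
    x += 1.f;
    clobber_register(x);  // the compiler must treat x as modified here
  }
  return x;
}
```

Compare the generated asm with and without the `clobber_register` call, and then against a version using `benchmark::DoNotOptimize`.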
TIP Local “Compiler Explorer”
Of course you can achieve a very similar result locally, using e.g. the following command. Compiler Explorer has the added feature of better annotation of the assembler output and easy testing of different compilers and compiler flags.
CXXFLAGS="-std=gnu++2b -O2 -DNDEBUG"; watch "ccache g++ $CXXFLAGS -c -S -o - -masm=intel myfile.cpp | grep -vE '^\s+\.' | c++filt"
Drop `ccache` if you don’t have it available. But since `watch` recompiles every 2s, caching recompiles of unchanged code is not such a bad idea.