Vc
0.7.5-dev
SIMD Vector Classes for C++
If you are new to vectorization, please read the following part and make sure you understand it:
You can modify a function to use vector types and thus implement a horizontal vectorization. The original scalar function could look like this:
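A minimal sketch of such a scalar function, assuming the three coordinates are passed as float references and the input is not the zero vector:

```cpp
#include <cmath>

// Scales the 3D vector (x, y, z) in place to unit length.
// Assumption: the vector is not the zero vector, otherwise d is 0.
void normalize(float &x, float &y, float &z)
{
    const float d = std::sqrt(x * x + y * y + z * z);
    x /= d;
    y /= d;
    z /= d;
}
```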
To vectorize the normalize function with Vc, the types must be substituted by their Vc counterparts and math functions must use the Vc implementation (which is, by default, also imported into the std namespace):
When compiled for SSE, the latter function normalizes four 3D vectors in the same time the former function needs to normalize one 3D vector.
For completeness, note that you can optimize the division in the normalize function further: compute the reciprocal of d once and store it as d_inv. Then you can multiply x, y, and z by d_inv, which is considerably faster than three divisions.
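A scalar sketch of this optimization, assuming d_inv denotes the reciprocal of the vector length and the input is not the zero vector:

```cpp
#include <cmath>

// Same result as dividing by the length, but with one division
// followed by three multiplications.
void normalize(float &x, float &y, float &z)
{
    const float d_inv = 1.0f / std::sqrt(x * x + y * y + z * z);
    x *= d_inv;
    y *= d_inv;
    z *= d_inv;
}
```

The result can differ from the division-based version in the last bit, because the reciprocal is rounded once before it is used.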
As you can probably see, the new challenge with Vc is the use of good data structures which support horizontal vectorization. Depending on the problem at hand, this may become the main focus of design (it does not have to be, though).
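One common layout that supports horizontal vectorization is a structure of arrays, where each coordinate lives in its own contiguous array. The sketch below uses a hypothetical Points3D type and plain scalar code; it only illustrates the layout and is not taken from Vc.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container: all x values are contiguous,
// likewise all y values and all z values.
struct Points3D
{
    std::vector<float> x, y, z;
};

// Normalizes every point. Iteration i only touches x[i], y[i], and z[i], so
// W consecutive iterations could be replaced by one step on W-wide vectors.
// Assumption: the three arrays have equal size and hold no zero vectors.
void normalize_all(Points3D &p)
{
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        const float d_inv =
            1.0f / std::sqrt(p.x[i] * p.x[i] + p.y[i] * p.y[i] + p.z[i] * p.z[i]);
        p.x[i] *= d_inv;
        p.y[i] *= d_inv;
        p.z[i] *= d_inv;
    }
}
```

With an array of structures (x, y, z stored per point) the four x values needed for one SSE vector would not be adjacent in memory, which is why the layout matters.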
If you do not know what alignment is and why it is important, read on; otherwise skip to Tools. Normally the alignment of data is an implementation detail left to the compiler. Until C++11, the language did not even have any (official) means to query or modify alignment.
Most data types require more than one byte for storage. Thus, even most atomic data types span several locations in memory. E.g. if you have a pointer to float, the address stored in this pointer just determines the first of the four bytes of the float. Naively, one could think that any address (which belongs to the process) can be used to store such a float. While this is true for some architectures, others may terminate the process when a misaligned pointer is dereferenced. The natural alignment for atomic data types typically is the same as their size. Thus the address of a float object should always be a multiple of 4 bytes.
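Since C++11, alignment can be queried with alignof and tightened with alignas. The following sketch uses two hypothetical helper functions to check an address against an alignment and to request a 32-byte aligned buffer on the stack; it assumes the compiler supports that over-alignment for local variables, as current GCC and Clang do.

```cpp
#include <cstddef>
#include <cstdint>

// True if the address stored in p is a multiple of `alignment` (in bytes).
inline bool is_aligned(const void *p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Requests a stricter alignment than float needs and verifies the result.
inline bool stack_buffer_is_32_byte_aligned()
{
    // The alignment of a type never exceeds its size.
    static_assert(alignof(float) <= sizeof(float), "alignof is at most sizeof");
    alignas(32) float data[8] = {};
    return is_aligned(data, 32);
}
```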
Alignment becomes more important for SIMD data types.
1. There are different instructions to load/store aligned and unaligned vectors. Unaligned loads/stores have been greatly improved in recent x86 CPUs. Still, the rule of thumb says that aligned loads/stores are faster.
2. Access to an unaligned vector with an instruction that expects an aligned vector crashes the application. Once you write vectorized code, you might want to make it a habit to check crashes for unaligned addresses.
3. Memory allocation on the heap will return addresses aligned to some system-specific alignment rule. E.g. Linux 32bit aligns on 8 bytes, while Linux 64bit aligns on 16 bytes. Neither alignment is strict enough for AVX vectors. Worse, if you develop on Linux 64bit with SSE you won't notice any problems until you switch to a 32bit build or AVX.
4. Placement on the stack is determined at compile time and requires the compiler to know the alignment restrictions of the type.
5. The size of a cache line is just two or four times larger than the SIMD types (if not equal). Thus, if you load several vectors consecutively from memory starting at an unaligned address, every fourth, every second, or even every load will have to be read from two different cache lines. This is called a cache line split. Cache line splits lead to degraded performance, which becomes very noticeable for memory-intensive code.
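The cache line split from the last point is plain address arithmetic. The sketch below assumes a cache line of 64 bytes, which is common on x86 but not universal; the constant and the function name are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes (not queried from the hardware).
constexpr std::size_t kCacheLine = 64;

// True if the `size` bytes starting at `address` span two cache lines.
// Assumption: 0 < size <= kCacheLine.
constexpr bool splits_cache_line(std::uintptr_t address, std::size_t size)
{
    return address / kCacheLine != (address + size - 1) / kCacheLine;
}
```

For 32-byte vectors loaded consecutively from offset 16 (addresses 16, 48, 80, 112, and so on), the loads at 48 and 112 cross a line boundary, so every second load is split. The same vectors loaded from offset 0 or 32 never split.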
Vc provides several classes and functions to get alignment right.
Overloads of new and delete return correctly aligned pointers to the heap.
Counterparts to malloc and free can be used to allocate any type of memory with an abstract alignment restriction: Vc::MallocAlignment. Note that (like malloc) the memory is only allocated and not initialized. If you allocate memory for a type that has a constructor, use the placement new syntax to initialize the memory.
Vc::Allocator allocates memory that satisfies the alignment requirements of the type T. STL containers will already default to Vc::Allocator for Vc::Vector<T>. For all other composite types you want to use, you can take the VC_DECLARE_ALLOCATOR convenience macro to set it as the default.
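The combination of raw aligned allocation and placement new can be sketched with the standard library alone. The helpers aligned_malloc_sketch and aligned_free_sketch and the Particle type are hypothetical stand-ins written for this example; they are not Vc's functions and make no claim about how Vc implements its allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: over-allocates with std::malloc and rounds the address
// up. Assumption: alignment is a power of two and at least alignof(void *).
void *aligned_malloc_sketch(std::size_t size, std::size_t alignment)
{
    // Room for the payload, worst-case padding, and the saved raw pointer.
    void *raw = std::malloc(size + alignment + sizeof(void *));
    if (!raw) return nullptr;
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    reinterpret_cast<void **>(aligned)[-1] = raw;  // remember what to free later
    return reinterpret_cast<void *>(aligned);
}

void aligned_free_sketch(void *p)
{
    if (p) std::free(reinterpret_cast<void **>(p)[-1]);
}

// A type with a constructor: raw memory alone does not initialize it.
struct Particle
{
    float x, y, z, w;
    Particle() : x(0.0f), y(0.0f), z(0.0f), w(1.0f) {}
};

Particle *make_particle()
{
    void *mem = aligned_malloc_sketch(sizeof(Particle), 32);
    return mem ? new (mem) Particle() : nullptr;  // placement new runs the constructor
}

void destroy_particle(Particle *p)
{
    if (!p) return;
    p->~Particle();  // placement new requires an explicit destructor call
    aligned_free_sketch(p);
}
```

The memory returned by the allocation helper holds indeterminate bytes until placement new constructs the object, which mirrors the note about malloc above.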