Vc  0.7.5-dev
SIMD Vector Classes for C++
Portability Issues

One of the major goals of Vc is to ease development of portable code while still achieving the highest possible performance, which requires target-architecture-specific instructions.

Vc achieves this by giving a single type different implementations of the same API, selected according to the target architecture. Many details of the target architecture depend on the compiler flags that were used. There can also be subtle differences between the implementations that lead to problems. This page aims to document all issues you might need to know about.

Compiler Flags
  • GCC: The compiler should be called with the -march=<target> flag. See the GCC manpage for all possible values of <target>. It is best to additionally pass the -msse2 -msse3 ... -mavx flags. If no SIMD instructions are enabled via compiler flags, Vc falls back to the scalar implementation.
  • Clang: The same as for GCC applies.
  • ICC: Same as GCC, but the flags are called -xAVX -xSSE4.2 -xSSE4.1 -xSSSE3 -xSSE3 -xSSE2.
  • MSVC: On 32-bit targets you can add the /arch:SSE2 flag. That is about all the MSVC documentation says. Still, the MSVC compiler knows about the newer instructions in SSE3 and upwards. How to determine which CPUs the resulting binary will support is unclear.
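
To see what a given set of flags actually enabled, one option is to inspect the compiler's predefined macros. The following is a minimal sketch, not part of Vc: the function name enabledSimdLevel is made up for illustration, and it relies only on the macros that GCC, Clang and ICC predefine (__SSE2__ through __AVX__) and on MSVC's _M_X64 and _M_IX86_FP.

```cpp
#include <string>

// Hypothetical helper (not a Vc function): reports the widest x86 SIMD
// instruction set that the compiler flags enabled for this translation unit.
std::string enabledSimdLevel() {
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    // x86-64 always includes SSE2; MSVC signals /arch:SSE2 via _M_IX86_FP == 2.
    return "SSE2";
#else
    // No SIMD instruction set was enabled via compiler flags.
    return "scalar";
#endif
}
```

Printing the result once from a test program is a quick way to confirm that a build system passes the intended flags.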

Where does the final executable run?

Be aware that a binary built for a given SIMD instruction set may not run on a processor that lacks those instructions. The executable works as long as no such instruction is actually executed, and crashes only at the point where one is used. It is therefore better to check at application start whether the compiled-in SIMD instruction set is really supported by the executing CPU. This can be determined with the currentImplementationSupported function.
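
The idea of such a startup check can be sketched without Vc. The example below is an illustration under stated assumptions, not the implementation behind currentImplementationSupported: it uses the GCC/Clang built-ins __builtin_cpu_init and __builtin_cpu_supports on x86, and on any other compiler or architecture it simply reports success. The function names compiledSimdSupportedByCpu and checkSimdAtStartup are hypothetical.

```cpp
#include <cstdio>

// Hypothetical helper: returns true if the executing CPU supports the widest
// SIMD instruction set that was enabled at compile time.
bool compiledSimdSupportedByCpu() {
#if defined(__GNUC__) && !defined(__INTEL_COMPILER) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialise the CPU feature data used below
#if defined(__AVX__)
    return __builtin_cpu_supports("avx") != 0;
#elif defined(__SSE4_2__)
    return __builtin_cpu_supports("sse4.2") != 0;
#elif defined(__SSE4_1__)
    return __builtin_cpu_supports("sse4.1") != 0;
#elif defined(__SSSE3__)
    return __builtin_cpu_supports("ssse3") != 0;
#elif defined(__SSE3__)
    return __builtin_cpu_supports("sse3") != 0;
#elif defined(__SSE2__)
    return __builtin_cpu_supports("sse2") != 0;
#else
    return true;  // scalar build: nothing to verify
#endif
#else
    return true;  // assumption: no check is available in this sketch
#endif
}

// Intended to be called first thing in main(); a non-zero result means the
// program should exit before it executes an unsupported instruction.
int checkSimdAtStartup() {
    if (!compiledSimdSupportedByCpu()) {
        std::fputs("This binary was built for SIMD instructions that this CPU "
                   "does not support.\n", stderr);
        return 1;
    }
    return 0;
}
```

The check itself must live in code that is safe to run on the oldest CPU you expect, otherwise it can crash before it has a chance to report anything.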

If you want to distribute a binary that runs correctly on many different systems, you must either restrict it to the least common denominator (which often is SSE2) or compile the code several times, once for each set of target-architecture compiler options. A simple way to combine the resulting executables is a wrapping script/executable that determines the correct executable to use. A more sophisticated option is the ifunc attribute that GCC provides. Other compilers might provide similar functionality.
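
Where neither a wrapper nor ifunc is available, a function pointer that is resolved once at run time gives a similar effect. The sketch below swaps in that technique as a portable stand-in for ifunc; it is not taken from Vc. Both implementations are plain C++ placeholders (in a real build each variant would be compiled in its own translation unit with its own flags), all names are hypothetical, and the CPU query assumes the GCC/Clang built-in __builtin_cpu_supports on x86.

```cpp
#include <cstddef>

// Placeholder for the variant built with least-common-denominator flags.
float sumBaseline(const float *data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Placeholder for the variant that would be built with, e.g., -mavx.
// Here it only uses four accumulators to stand in for a wider vector.
float sumWide(const float *data, std::size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k) acc[k] += data[i + k];
    for (; i < n; ++i) acc[0] += data[i];  // remaining tail entries
    return acc[0] + acc[1] + acc[2] + acc[3];
}

typedef float (*SumFn)(const float *, std::size_t);

// Picks the implementation that matches the executing CPU.
SumFn selectSum() {
#if defined(__GNUC__) && !defined(__INTEL_COMPILER) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return sumWide;
#endif
    return sumBaseline;
}

// Public entry point: the selection runs once, on the first call.
float sum(const float *data, std::size_t n) {
    static const SumFn impl = selectSum();
    return impl(data, n);
}
```

Callers only ever see sum(), so adding a further variant means touching selectSum() and nothing else.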

Guarantees

It is guaranteed that:

  • int_v::Size == uint_v::Size == float_v::Size
  • short_v::Size == ushort_v::Size == sfloat_v::Size
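
The second guarantee has a consequence that simple arithmetic makes visible. The sketch below does not use Vc; it assumes a 128-bit SIMD register (as with SSE) and computes, with hypothetical helper names, how many entries fit and how many registers a float vector needs when it must have as many entries as a 16-bit integer vector.

```cpp
#include <cstddef>

// Number of entries of a given width that fit into one register.
constexpr std::size_t lanes(std::size_t registerBits, std::size_t entryBits) {
    return registerBits / entryBits;
}

// Number of registers needed to hold `entries` values of `entryBits` each.
constexpr std::size_t registersNeeded(std::size_t entries, std::size_t entryBits,
                                      std::size_t registerBits) {
    return (entries * entryBits + registerBits - 1) / registerBits;
}

// Assumption for this example: a 128-bit register.
constexpr std::size_t kRegisterBits = 128;

constexpr std::size_t kShortEntries = lanes(kRegisterBits, 16);  // 8 entries
constexpr std::size_t kFloatEntries = lanes(kRegisterBits, 32);  // 4 entries

// Guarantee: the float vector paired with shorts has the same entry count.
constexpr std::size_t kSfloatEntries = kShortEntries;

// 8 floats occupy 256 bits, that is two 128-bit registers.
constexpr std::size_t kSfloatRegisters =
    registersNeeded(kSfloatEntries, 32, kRegisterBits);

static_assert(kSfloatRegisters == 2, "8 floats need two 128-bit registers");
```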

Important Differences between Implementations
  • Obviously the number of entries in a vector depends on the target architecture.
  • Because of the guarantees above, sfloat_v does not necessarily map to a single SIMD register, so register pressure can be higher when this type is used.
  • Hardware that does not support 16-bit integer vectors can implement the short_v and ushort_v API via 32-bit integer vectors. As a result, some of the overflow behavior might be slightly different, and truncation will only happen when the vector is stored to memory.
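
The overflow difference in the last item can be modelled with scalar code. This is only a sketch of the effect, not Vc code: each function (all names hypothetical) stands for one vector entry, once held in a true 16-bit lane and once held in a 32-bit lane that emulates an unsigned 16-bit value.

```cpp
#include <cstdint>

// True 16-bit lane: the sum wraps modulo 65536 before the comparison.
bool exceedsWith16BitLane(std::uint16_t a, std::uint16_t b, std::uint16_t limit) {
    const std::uint16_t sum = static_cast<std::uint16_t>(a + b);
    return sum > limit;
}

// 32-bit lane emulating a 16-bit value: the full sum survives in the
// register, so the comparison sees a different value.
bool exceedsWith32BitLane(std::uint16_t a, std::uint16_t b, std::uint16_t limit) {
    const std::uint32_t sum = static_cast<std::uint32_t>(a) + b;
    return sum > limit;
}

// Storing the 32-bit lane to 16-bit memory is where truncation happens.
std::uint16_t storeFrom32BitLane(std::uint16_t a, std::uint16_t b) {
    const std::uint32_t sum = static_cast<std::uint32_t>(a) + b;
    return static_cast<std::uint16_t>(sum);
}
```

With a = b = 40000 and a limit of 50000, the 16-bit lane compares 14464 against the limit while the 32-bit lane compares 80000, so the two disagree until the value has been stored. Code that must behave identically everywhere should not depend on intermediate results that overflow.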

Compiler Quirks

Since SIMD is not part of the C/C++ language standards, Vc abstracts more or less standardized compiler extensions. Sadly, not every issue can be abstracted transparently. The remaining differences are documented here:

  • MSVC cannot pass a parameter by value if its type has alignment restrictions. As a consequence, no Vc vector type and no type derived from Vc::VectorAlignedBase can be used as a function parameter unless it is passed indirectly (by pointer, reference, or const-reference). So
    void foo(Vc::float_v) {}
    does not compile, while
    void foo(Vc::float_v &) {}
    void foo(const Vc::float_v &) {}
    void foo(Vc::float_v *) {}
    all work. Normally you should prefer passing by value, since a sane compiler then passes the data in a register and does not have to store it to and load it from the stack. Vc defines VC_PASSING_VECTOR_BY_VALUE_IS_BROKEN for such cases. In addition, the Vc vector types contain a typedef AsArg that resolves to either const-reference or const-by-value. Thus, you can always use
    void foo(Vc::float_v::AsArg) {}
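
How a typedef like AsArg can hide the difference is sketched below with a stand-in type. Everything here is hypothetical and for illustration only: Float4 is not a Vc type, the choice to key on _MSC_VER is an assumption, and the code is not Vc's actual definition of AsArg.

```cpp
// Hypothetical over-aligned type standing in for a SIMD vector class.
struct alignas(16) Float4 {
    float data[4];

#if defined(_MSC_VER)
    // Assumption: treat every MSVC build as unable to pass Float4 by value,
    // and fall back to const-reference.
    typedef const Float4 &AsArg;
#else
    // Elsewhere pass by value, so the data can travel in a register.
    typedef const Float4 AsArg;
#endif
};

// The signature is the same source text on every compiler; only the
// typedef decides how the argument is actually passed.
float horizontalSum(Float4::AsArg v) {
    return v.data[0] + v.data[1] + v.data[2] + v.data[3];
}
```

Call sites are unaffected by the choice: horizontalSum(v) is written the same way whether AsArg is a reference or a value.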