Vc  0.7.5-dev
SIMD Vector Classes for C++
Utilities

Detailed Description

Additional classes, macros, and functions that help to work more easily with the main vector types.

Classes

class  CpuId
 This class is available for x86 / AMD64 systems to read and interpret information about the CPU's capabilities.
class  Allocator< T >
 An allocator that uses global new and supports over-aligned types, as per [C++11 20.6.9].
class  VectorAlignedBase
 Helper class to ensure proper alignment.
class  VectorAlignedBaseT< V >
 Helper class to ensure proper alignment.
class  InterleavedMemoryWrapper< S, V >
 Wraps a pointer to memory with convenience functions to access it via vectors.
class  Memory< V, Size1, Size2 >
 A helper class for fixed-size two-dimensional arrays.
class  Memory< V, Size, 0u >
 A helper class to simplify usage of correctly aligned and padded memory, allowing both vector and scalar access.
class  Memory< V, 0u, 0u >
 A helper class that is very similar to Memory<V, Size> but uses dynamically allocated memory and therefore has a dynamic size.

Macros

#define VC_DECLARE_ALLOCATOR(Type)
 Convenience macro to set the default allocator for a given Type to Vc::Allocator.
#define Vc_foreach_bit(iterator, mask)
 Loop over all set bits in the mask.
#define foreach_bit(iterator, mask)
 Alias for Vc_foreach_bit unless VC_CLEAN_NAMESPACE is defined.

Enumerations

enum  MallocAlignment { AlignOnVector, AlignOnCacheline, AlignOnPage }
 Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.
enum  Implementation {
  ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl,
  SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl
}
 Enum to identify a certain SIMD instruction set.
enum  ExtraInstructions {
  Float16cInstructions = 0x01000, Fma4Instructions = 0x02000, XopInstructions = 0x04000, PopcntInstructions = 0x08000,
  Sse4aInstructions = 0x10000, FmaInstructions = 0x20000
}
 The list of available instructions is not easily described by a linear list of instruction sets.

Functions

void forceToRegisters (const vec &,...)
 Force the vectors passed to the function into registers.
template<typename V , typename Parent , typename Dimension , typename RM >
std::ostream & operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m)
 Prints the contents of a Memory object into a stream object.
const char * versionString ()
unsigned int versionNumber ()
template<typename T , Vc::MallocAlignment A>
T * malloc (size_t n)
 Allocates memory on the heap with alignment and padding suitable for vectorized access.
template<typename T >
void free (T *p)
 Frees memory that was allocated with Vc::malloc.
void prefetchForOneRead (const void *addr)
 Prefetch the cacheline containing addr for a single read access.
void prefetchForModify (const void *addr)
 Prefetch the cacheline containing addr for modification.
void prefetchClose (const void *addr)
 Prefetch the cacheline containing addr to L1 cache.
void prefetchMid (const void *addr)
 Prefetch the cacheline containing addr to L2 cache.
void prefetchFar (const void *addr)
 Prefetch the cacheline containing addr to L3 cache.

Micro-Architecture Feature Tests

unsigned int extraInstructionsSupported ()
 Determines the extra instructions supported by the current CPU.
bool isImplementationSupported (Vc::Implementation impl)
 Tests whether the given implementation is supported by the system the code is executing on.
Vc::Implementation bestImplementationSupported ()
 Determines the best supported implementation for the current system.
bool currentImplementationSupported ()
 Tests that the CPU and operating system support the vector unit that the code was compiled for.

SIMD Support Feature Macros

#define VC_IMPL
 This macro is set to the value of Vc::Implementation that the current translation unit is compiled with.
#define VC_IMPL_XOP
 This macro is defined if the current translation unit is compiled with XOP instruction support.
#define VC_IMPL_FMA4
 This macro is defined if the current translation unit is compiled with FMA4 instruction support.
#define VC_IMPL_F16C
 This macro is defined if the current translation unit is compiled with F16C instruction support.
#define VC_IMPL_POPCNT
 This macro is defined if the current translation unit is compiled with POPCNT instruction support.
#define VC_IMPL_SSE4a
 This macro is defined if the current translation unit is compiled with SSE4a instruction support.
#define VC_IMPL_Scalar
 This macro is defined if the current translation unit is compiled without any SIMD support.
#define VC_IMPL_SSE
 This macro is defined if the current translation unit is compiled with any version of SSE (but not AVX).
#define VC_IMPL_SSE2
 This macro is defined if the current translation unit is compiled with SSE2 instruction support (excluding SSE3 and up).
#define VC_IMPL_SSE3
 This macro is defined if the current translation unit is compiled with SSE3 instruction support (excluding SSSE3 and up).
#define VC_IMPL_SSSE3
 This macro is defined if the current translation unit is compiled with SSSE3 instruction support (excluding SSE4.1 and up).
#define VC_IMPL_SSE4_1
 This macro is defined if the current translation unit is compiled with SSE4.1 instruction support (excluding SSE4.2 and up).
#define VC_IMPL_SSE4_2
 This macro is defined if the current translation unit is compiled with SSE4.2 instruction support (excluding AVX and up).
#define VC_IMPL_AVX
 This macro is defined if the current translation unit is compiled with AVX instruction support (excluding AVX2 and up).

Version Macros

#define VC_VERSION_STRING
 Contains the version string of the Vc headers.
#define VC_VERSION_NUMBER
 Contains the encoded version number of the Vc headers.
#define VC_VERSION_CHECK(major, minor, patch)
 Helper macro to compare against an encoded version number.

SIMD Vector Size Macros

#define VC_DOUBLE_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a double_v.
#define VC_FLOAT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a float_v.
#define VC_SFLOAT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in an sfloat_v.
#define VC_INT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in an int_v.
#define VC_UINT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a uint_v.
#define VC_SHORT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a short_v.
#define VC_USHORT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a ushort_v.

Macro Definition Documentation

#define VC_DECLARE_ALLOCATOR(Type)
Value:
namespace std \
{ \
    template<> class allocator<Type> : public ::Vc::Allocator<Type> \
    { \
    public: \
        template<typename U> struct rebind { typedef ::std::allocator<U> other; }; \
    }; \
}

Convenience macro to set the default allocator for a given Type to Vc::Allocator.

Parameters
Type  The type that you want to use with STL containers.
Note
You have to use this macro in the global namespace.
#define Vc_foreach_bit(iterator, mask)

Loop over all set bits in the mask.

The iterator variable is set to the position of each set bit in turn. For example, the mask 00011010 executes the loop body three times, with the iterator set to 1, 3, and 4.

This allows you to write:

float_v a = ...;
Vc_foreach_bit(int i, a < 0.f) {
    std::cout << a[i] << "\n";
}

The example prints all the values in a that are negative, and only those.

Parameters
iterator  The iterator variable. For example "int i".
mask  The mask to iterate over. You can also just write a vector operation that returns a mask.
Note
Since Vc 0.7, break and continue are supported in foreach_bit loops.
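
The iteration order described above can be reproduced without Vc on a plain integer mask. The following is a minimal sketch, not the macro's implementation; setBitPositions is a hypothetical helper introduced only for illustration.

```cpp
#include <cassert>  // only needed for the check mentioned below
#include <vector>

// Hypothetical helper, not part of Vc: returns the positions of all set bits
// in `mask`, lowest position first.
std::vector<int> setBitPositions(unsigned int mask) {
    std::vector<int> positions;
    for (int i = 0; mask != 0; ++i, mask >>= 1) {
        if (mask & 1u) {
            positions.push_back(i);  // bit i is set
        }
    }
    return positions;
}
```

For the mask 00011010 (0x1A) the helper yields the positions 1, 3, and 4, which are the iterator values the loop body would see.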
#define VC_VERSION_STRING

Contains the version string of the Vc headers.

Same as Vc::versionString().

#define VC_VERSION_NUMBER

Contains the encoded version number of the Vc headers.

Same as Vc::versionNumber().

#define VC_VERSION_CHECK(major, minor, patch)

Helper macro to compare against an encoded version number.

Example:

#if VC_VERSION_CHECK(0, 5, 1) >= VC_VERSION_NUMBER
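
This page does not show how the version number is encoded. As a rough sketch of why an encoded number can be compared with ordinary relational operators, the following assumes a hypothetical layout with one byte each for minor and patch. encodeVersion and atLeast are illustrative names, and the layout is an assumption, not Vc's actual encoding.

```cpp
#include <cassert>

// Hypothetical encoding, assumed for illustration only: major in the upper
// bits, then one byte for minor and one byte for patch.
unsigned int encodeVersion(unsigned int major, unsigned int minor, unsigned int patch) {
    assert(minor < 256 && patch < 256);  // each field must fit into its byte
    return (major << 16) | (minor << 8) | patch;
}

// True if the encoded version `have` is at least major.minor.patch.
bool atLeast(unsigned int have, unsigned int major, unsigned int minor, unsigned int patch) {
    return have >= encodeVersion(major, minor, patch);
}
```

With such a layout, a later version always encodes to a larger integer, so a single comparison is enough to test for a minimum or maximum version.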

Enumeration Type Documentation

enum MallocAlignment

Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.

Enumerator:
AlignOnVector 

Align on boundary of vector sizes (e.g. 16 Bytes on SSE platforms) and pad to allow vector access to the end. Thus the allocated memory contains a multiple of VectorAlignment bytes.

AlignOnCacheline 

Align on boundary of cache line sizes (e.g. 64 Bytes on x86) and pad to allow full cache line access to the end. Thus the allocated memory contains a multiple of 64 bytes.

AlignOnPage 

Align on boundary of page sizes (e.g. 4096 Bytes on x86) and pad to allow full page access to the end. Thus the allocated memory contains a multiple of 4096 bytes.

enum Implementation

Enum to identify a certain SIMD instruction set.

You can use VC_IMPL for the currently active implementation.

See Also
ExtraInstructions
Enumerator:
ScalarImpl 

uses only fundamental types

SSE2Impl 

x86 SSE + SSE2

SSE3Impl 

x86 SSE + SSE2 + SSE3

SSSE3Impl 

x86 SSE + SSE2 + SSE3 + SSSE3

SSE41Impl 

x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1

SSE42Impl 

x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1 + SSE4.2

AVXImpl 

x86 AVX

AVX2Impl 

x86 AVX + AVX2

enum ExtraInstructions

The list of available instructions is not easily described by a linear list of instruction sets.

On x86 the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2

But there are additional instructions that are not necessarily required by this list. These are covered in this enum.

Enumerator:
Float16cInstructions 

Support for float16 conversions in hardware.

Fma4Instructions 

Support for FMA4 instructions.

XopInstructions 

Support for XOP instructions.

PopcntInstructions 

Support for the population count instruction.

Sse4aInstructions 

Support for SSE4a instructions.

FmaInstructions 

Support for FMA instructions (3 operand variant)

Function Documentation

unsigned int Vc::extraInstructionsSupported ( )

Determines the extra instructions supported by the current CPU.

Returns
A combination of flags from Vc::ExtraInstructions that the current CPU supports.
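
Because the return value is a combination of flags, individual capabilities are tested with bitwise operations. The sketch below copies the enumerator values listed above into a local enum so that it compiles without the Vc headers; supportsAll is a hypothetical helper, not a Vc function.

```cpp
#include <cassert>

// Local copy of the enumerator values listed for Vc::ExtraInstructions above.
enum ExtraInstructions {
    Float16cInstructions = 0x01000,
    Fma4Instructions     = 0x02000,
    XopInstructions      = 0x04000,
    PopcntInstructions   = 0x08000,
    Sse4aInstructions    = 0x10000,
    FmaInstructions      = 0x20000
};

// Hypothetical helper: true if every flag in `required` is set in `supported`.
bool supportsAll(unsigned int supported, unsigned int required) {
    assert(required != 0);  // an empty requirement is most likely a mistake
    return (supported & required) == required;
}
```

In real code the first argument would be the value returned by Vc::extraInstructionsSupported(), and the second an OR-combination of the flags the code path depends on.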
bool Vc::isImplementationSupported ( Vc::Implementation  impl)

Tests whether the given implementation is supported by the system the code is executing on.

Returns
true if the OS and hardware support execution of instructions defined by impl.
false otherwise
Parameters
impl  The SIMD target to test for.
Vc::Implementation Vc::bestImplementationSupported ( )

Determines the best supported implementation for the current system.

Returns
The enum value for the best implementation.
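
One way to think about "best" is to walk the instruction sets from most to least capable and stop at the first supported one. The sketch below illustrates that idea under the assumption that the enumerators are ordered by capability; it is not Vc's implementation. pickBest is a hypothetical helper, and the predicate stands in for a test such as Vc::isImplementationSupported.

```cpp
#include <cassert>  // only needed for the checks mentioned below
#include <cstddef>

// Local copy of the Vc::Implementation enumerators listed above.
enum Implementation {
    ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl,
    SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl
};

// Hypothetical helper: tries the candidates from most to least capable and
// returns the first one the predicate accepts. Falls back to ScalarImpl.
template <typename Predicate>
Implementation pickBest(Predicate isSupported) {
    const Implementation candidates[] = {
        AVX2Impl, AVXImpl, SSE42Impl, SSE41Impl, SSSE3Impl, SSE3Impl, SSE2Impl
    };
    for (std::size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); ++i) {
        if (isSupported(candidates[i])) {
            return candidates[i];
        }
    }
    return ScalarImpl;
}
```

A predicate that accepts everything up to SSE4.1 makes pickBest return SSE41Impl; a predicate that rejects everything yields ScalarImpl.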
bool Vc::currentImplementationSupported ( )

Tests that the CPU and operating system support the vector unit that the code was compiled for.

This function should be called before any other Vc functionality is used. It checks whether the program will work. If this function returns false then the program should exit with a useful error message before the OS has to kill it because of an invalid instruction exception.

If the program continues and makes use of any vector features not supported by the hardware or software then the program will crash.

Example:

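#include <Vc/Vc>
#include <iostream>

int main()
{
    // Bail out early instead of crashing on an unsupported instruction.
    if (!Vc::currentImplementationSupported()) {
        std::cerr << "CPU or OS requirements not met for the compiled in vector unit!\n";
        return 1;
    }
    ...
}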
Returns
true if the OS and hardware support execution of the currently selected SIMD instructions.
false otherwise
void Vc::forceToRegisters ( const vec &, ... )

Force the vectors passed to the function into registers.

This can be useful after looking at the emitted assembly to force the compiler to optimize properly.

Note
Currently only has an effect for SSE vectors.
MSVC does not support this function at all.
Warning
Be careful with this function, especially since it can render the compiler unable to compile for 32-bit systems if it forces more than 8 vectors into registers.
std::ostream& operator<< ( std::ostream & s, const Vc::MemoryBase< V, Parent, Dimension, RM > & m )

Prints the contents of a Memory object into a stream object.

Vc::Memory<Vc::int_v, 10> m;  // 10 entries, as implied by the output below
for (int i = 0; i < m.entriesCount(); ++i) {
    m[i] = i;
}
std::cout << m << std::endl;

will output (with SSE):

{[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}
Parameters
s  Any standard C++ ostream object. For example std::cout or a std::stringstream object.
m  Any Vc::Memory object.
Returns
The ostream object, so that multiple stream operations can be chained.
Note
With the GNU standard library this function checks whether the output stream is a tty. In that case it colorizes the output.
Warning
Please do not forget that printing a large memory object can take a long time.
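
The grouping in the sample output above follows from the vector width: entries are printed in vector-sized groups, and the last group is padded. The sketch below reproduces that layout for plain integers. It is an illustration only, not Vc's implementation; formatInGroups is a hypothetical helper, and showing the padding entries as zero is an assumption taken from the sample output.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not Vc's implementation: formats `values` in groups of
// `width` entries and pads the last group with zeros.
std::string formatInGroups(const std::vector<int> &values, std::size_t width) {
    assert(width > 0);
    // Round the entry count up to a multiple of the group width.
    const std::size_t padded = (values.size() + width - 1) / width * width;
    std::ostringstream out;
    out << '{';
    for (std::size_t i = 0; i < padded; ++i) {
        if (i % width == 0) {
            out << (i == 0 ? "[" : " [");  // open a new group
        }
        out << (i < values.size() ? values[i] : 0);
        out << ((i + 1) % width == 0 ? "]" : ", ");  // close group or separate
    }
    out << '}';
    return out.str();
}
```

Called with the values 0 to 9 and a width of 4 (the number of int entries in an SSE vector), the helper returns the same text as the sample output.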
const char* Vc::versionString ( )
Returns
the version string of the Vc headers.
Note
There exists a built-in check that ensures on application startup that the Vc version of the library (link time) and the headers (compile time) are equal. A mismatch between headers and library could lead to errors that are very hard to debug.
If you need to disable the check (it costs a very small amount of application startup time) you can define VC_NO_VERSION_CHECK at compile time.
unsigned int Vc::versionNumber ( )
Returns
the version of the Vc headers encoded in an integer.
T* Vc::malloc ( size_t  n)

Allocates memory on the heap with alignment and padding suitable for vectorized access.

Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.

Parameters
n  Specifies the number of objects the allocated memory must be able to store.
Template Parameters
T  The type of the allocated memory. Note that the constructor is not called.
A  Determines the alignment of the memory. See Vc::MallocAlignment.
Returns
Pointer to memory of the requested type, or 0 on error. The allocated memory is padded at the end to be a multiple of the requested alignment A. Thus if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline until the end of the array, without generating an out-of-bounds access. For a cacheline size of 64 Bytes and an int size of 4 Bytes you would thus get an array of 128 Bytes to work with.
Warning
  • The standard malloc function specifies the number of Bytes to allocate whereas this function specifies the number of values, thus differing by a factor of sizeof(T).
  • This function is mainly meant for use with built-in types. If you use a custom type with a sizeof that is not a multiple of 2 the results might not be what you expect.
  • The constructor of T is not called. You can make up for this:
    SomeType *array = new(Vc::malloc<SomeType, Vc::AlignOnCacheline>(N)) SomeType[N];
See Also
Vc::free
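
The padding rule described under Returns is a round-up to the next multiple of the alignment. The following sketch shows only that arithmetic and performs no allocation; paddedBytes is a hypothetical helper, not a Vc function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds count * elementSize up to the next multiple of
// `alignment`, which is the usable size after padding.
std::size_t paddedBytes(std::size_t count, std::size_t elementSize, std::size_t alignment) {
    assert(alignment > 0);
    const std::size_t bytes = count * elementSize;
    return (bytes + alignment - 1) / alignment * alignment;
}
```

For 21 objects of 4 Bytes and a 64-Byte cacheline, 84 Bytes are rounded up to 128 Bytes, which matches the figure given above.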
void Vc::free ( T *  p)

Frees memory that was allocated with Vc::malloc.

Parameters
p  The pointer to the memory to be freed.
Template Parameters
T  The type of the allocated memory.
Warning
The destructor of T is not called. If needed, you can call the destructor before calling free:
for (int i = 0; i < N; ++i) {
    p[i].~T();
}
See Also
Vc::malloc
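
Since neither Vc::malloc nor Vc::free runs constructors or destructors, the complete lifecycle of non-trivial objects in such memory has four steps: allocate, construct, destroy, release. The sketch below walks through them using ::operator new and ::operator delete as stand-ins for Vc::malloc and Vc::free, so that it compiles without Vc. Tracked and constructAndDestroy are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Constructs n objects in raw storage, destroys them again, and releases the
// storage. Returns how many objects were alive in between.
int constructAndDestroy(std::size_t n) {
    const int before = Tracked::live;
    // Stand-in for Vc::malloc: raw storage, no constructors are run.
    void *raw = ::operator new(n * sizeof(Tracked));
    Tracked *p = static_cast<Tracked *>(raw);
    for (std::size_t i = 0; i < n; ++i) {
        new (p + i) Tracked();  // placement new constructs one element
    }
    const int alive = Tracked::live - before;
    for (std::size_t i = n; i > 0; --i) {
        p[i - 1].~Tracked();  // explicit destructor call, in reverse order
    }
    assert(Tracked::live == before);  // every constructed object was destroyed
    ::operator delete(raw);           // stand-in for Vc::free
    return alive;
}
```

Constructing the elements one by one keeps the element count under the caller's control, which is what makes the matching destructor loop possible.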
void Vc::prefetchForOneRead ( const void *  addr)

Prefetch the cacheline containing addr for a single read access.

This prefetch completely bypasses the cache, not evicting any other data.

Parameters
addr  The cacheline containing addr will be prefetched.
void Vc::prefetchForModify ( const void *  addr)

Prefetch the cacheline containing addr for modification.

This prefetch evicts data from the cache. So use it only for data you really will use. When the target system supports it the cacheline will be marked as modified while prefetching, saving work later on.

Parameters
addr  The cacheline containing addr will be prefetched.
void Vc::prefetchClose ( const void *  addr)

Prefetch the cacheline containing addr to L1 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters
addr  The cacheline containing addr will be prefetched.
void Vc::prefetchMid ( const void *  addr)

Prefetch the cacheline containing addr to L2 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters
addr  The cacheline containing addr will be prefetched.
void Vc::prefetchFar ( const void *  addr)

Prefetch the cacheline containing addr to L3 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters
addr  The cacheline containing addr will be prefetched.
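
Prefetch functions are typically called a fixed distance ahead of the element currently being processed. The sketch below shows that pattern with the GCC/Clang builtin __builtin_prefetch as a stand-in, because how the Vc prefetch functions are implemented is not shown on this page. prefetchRead and sumWithPrefetch are hypothetical names, and a useful prefetch distance depends on the workload and hardware.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a prefetch call, assumed for illustration: uses the GCC/Clang
// builtin when available and does nothing otherwise.
inline void prefetchRead(const void *addr) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr, 0 /* read access */, 3 /* high temporal locality */);
#else
    (void)addr;
#endif
}

// Sums `data`, requesting the element `distance` entries ahead on each step.
long sumWithPrefetch(const int *data, std::size_t n, std::size_t distance) {
    assert(data != 0 || n == 0);
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            prefetchRead(data + i + distance);  // hint only, never changes results
        }
        sum += data[i];
    }
    return sum;
}
```

A prefetch is only a hint, so the result is the same with or without it. For a simple linear scan like this one the hardware prefetcher often does the job already; explicit prefetching tends to matter more for irregular access patterns.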