Vc
0.7.5-dev
SIMD Vector Classes for C++
Additional classes, macros, and functions that help to work more easily with the main vector types.
Classes | |
class | CpuId |
This class is available for x86 / AMD64 systems to read and interpret information about the CPU's capabilities. | |
class | Allocator< T > |
An allocator that uses global new and supports over-aligned types, as per [C++11 20.6.9]. | |
class | VectorAlignedBase |
Helper class to ensure proper alignment. | |
class | VectorAlignedBaseT< V > |
Helper class to ensure proper alignment. | |
class | InterleavedMemoryWrapper< S, V > |
Wraps a pointer to memory with convenience functions to access it via vectors. | |
class | Memory< V, Size1, Size2 > |
A helper class for fixed-size two-dimensional arrays. | |
class | Memory< V, Size, 0u > |
A helper class to simplify usage of correctly aligned and padded memory, allowing both vector and scalar access. | |
class | Memory< V, 0u, 0u > |
A helper class that is very similar to Memory<V, Size> but with dynamically allocated memory and thus dynamic size. |
Macros | |
#define | VC_DECLARE_ALLOCATOR(Type) |
Convenience macro to set the default allocator for a given Type to Vc::Allocator. | |
#define | Vc_foreach_bit(iterator, mask) |
Loop over all set bits in the mask. | |
#define | foreach_bit(iterator, mask) |
Alias for Vc_foreach_bit unless VC_CLEAN_NAMESPACE is defined. |
Enumerations | |
enum | MallocAlignment { AlignOnVector, AlignOnCacheline, AlignOnPage } |
Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc. | |
enum | Implementation { ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl, SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl } |
Enum to identify a certain SIMD instruction set. | |
enum | ExtraInstructions { Float16cInstructions = 0x01000, Fma4Instructions = 0x02000, XopInstructions = 0x04000, PopcntInstructions = 0x08000, Sse4aInstructions = 0x10000, FmaInstructions = 0x20000 } |
The list of available instructions is not easily described by a linear list of instruction sets. |
Functions | |
void | forceToRegisters (const vec &,...) |
Force the vectors passed to the function into registers. | |
template<typename V , typename Parent , typename Dimension , typename RM > | |
std::ostream & | operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m) |
Prints the contents of a Memory object into a stream object. | |
const char * | versionString () |
unsigned int | versionNumber () |
template<typename T , Vc::MallocAlignment A> | |
T * | malloc (size_t n) |
Allocates memory on the Heap with alignment and padding suitable for vectorized access. | |
template<typename T > | |
void | free (T *p) |
Frees memory that was allocated with Vc::malloc. | |
void | prefetchForOneRead (const void *addr) |
Prefetch the cacheline containing addr for a single read access. | |
void | prefetchForModify (const void *addr) |
Prefetch the cacheline containing addr for modification. | |
void | prefetchClose (const void *addr) |
Prefetch the cacheline containing addr to L1 cache. | |
void | prefetchMid (const void *addr) |
Prefetch the cacheline containing addr to L2 cache. | |
void | prefetchFar (const void *addr) |
Prefetch the cacheline containing addr to L3 cache. |
Micro-Architecture Feature Tests | |
unsigned int | extraInstructionsSupported () |
Determines the extra instructions supported by the current CPU. | |
bool | isImplementationSupported (Vc::Implementation impl) |
Tests whether the given implementation is supported by the system the code is executing on. | |
Vc::Implementation | bestImplementationSupported () |
Determines the best supported implementation for the current system. | |
bool | currentImplementationSupported () |
Tests that the CPU and Operating System support the vector unit which was compiled for. |
SIMD Support Feature Macros | |
#define | VC_IMPL |
This macro is set to the value of Vc::Implementation that the current translation unit is compiled with. | |
#define | VC_IMPL_XOP |
This macro is defined if the current translation unit is compiled with XOP instruction support. | |
#define | VC_IMPL_FMA4 |
This macro is defined if the current translation unit is compiled with FMA4 instruction support. | |
#define | VC_IMPL_F16C |
This macro is defined if the current translation unit is compiled with F16C instruction support. | |
#define | VC_IMPL_POPCNT |
This macro is defined if the current translation unit is compiled with POPCNT instruction support. | |
#define | VC_IMPL_SSE4a |
This macro is defined if the current translation unit is compiled with SSE4a instruction support. | |
#define | VC_IMPL_Scalar |
This macro is defined if the current translation unit is compiled without any SIMD support. | |
#define | VC_IMPL_SSE |
This macro is defined if the current translation unit is compiled with any version of SSE (but not AVX). | |
#define | VC_IMPL_SSE2 |
This macro is defined if the current translation unit is compiled with SSE2 instruction support (excluding SSE3 and up). | |
#define | VC_IMPL_SSE3 |
This macro is defined if the current translation unit is compiled with SSE3 instruction support (excluding SSSE3 and up). | |
#define | VC_IMPL_SSSE3 |
This macro is defined if the current translation unit is compiled with SSSE3 instruction support (excluding SSE4.1 and up). | |
#define | VC_IMPL_SSE4_1 |
This macro is defined if the current translation unit is compiled with SSE4.1 instruction support (excluding SSE4.2 and up). | |
#define | VC_IMPL_SSE4_2 |
This macro is defined if the current translation unit is compiled with SSE4.2 instruction support (excluding AVX and up). | |
#define | VC_IMPL_AVX |
This macro is defined if the current translation unit is compiled with AVX instruction support (excluding AVX2 and up). |
Version Macros | |
#define | VC_VERSION_STRING |
Contains the version string of the Vc headers. | |
#define | VC_VERSION_NUMBER |
Contains the encoded version number of the Vc headers. | |
#define | VC_VERSION_CHECK(major, minor, patch) |
Helper macro to compare against an encoded version number. |
SIMD Vector Size Macros | |
#define | VC_DOUBLE_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a double_v. | |
#define | VC_FLOAT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a float_v. | |
#define | VC_SFLOAT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a sfloat_v. | |
#define | VC_INT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in an int_v. | |
#define | VC_UINT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a uint_v. | |
#define | VC_SHORT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a short_v. | |
#define | VC_USHORT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a ushort_v. |
#define VC_DECLARE_ALLOCATOR | ( | Type | ) |
Convenience macro to set the default allocator for a given Type to Vc::Allocator.
Type | Your type that you want to use with STL containers. |
#define Vc_foreach_bit | ( | iterator, | |
mask | |||
) |
Loop over all set bits in the mask.
The iterator variable will be set to the position of the set bits. A mask of e.g. 00011010 would result in the loop being called with the iterator being set to 1, 3, and 4.
For example, iterating over the mask produced by comparing a vector a against zero visits all the values in a that are negative, and only those.
iterator | The iterator variable. For example "int i". |
mask | The mask to iterate over. You can also just write a vector operation that returns a mask. |
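The iteration order described above can be illustrated without the Vc headers. The following is a minimal sketch that uses a plain unsigned integer as the mask and a std::vector in place of a SIMD vector; setBitPositions and negativeEntries are hypothetical helpers written for this illustration and are not part of Vc.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Vc: collects the positions of all set
// bits in an integer mask, lowest position first.
std::vector<int> setBitPositions(unsigned int mask)
{
    std::vector<int> positions;
    for (int i = 0; mask != 0; ++i, mask >>= 1) {
        if (mask & 1u) {
            positions.push_back(i);
        }
    }
    return positions;
}

// Emulates the documented use case with a plain array standing in for a
// vector: build a mask of the negative entries, then visit only those.
std::vector<float> negativeEntries(const std::vector<float> &a)
{
    unsigned int mask = 0;
    for (std::size_t i = 0; i < a.size() && i < 32; ++i) {
        if (a[i] < 0.f) {
            mask |= 1u << i;
        }
    }
    std::vector<float> result;
    for (int i : setBitPositions(mask)) {
        result.push_back(a[i]);
    }
    return result;
}
```

For the mask 00011010 from the description, setBitPositions yields the positions 1, 3, and 4, which are the iterator values the loop body would see.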
#define VC_VERSION_STRING |
Contains the version string of the Vc headers.
Same as Vc::versionString().
#define VC_VERSION_NUMBER |
Contains the encoded version number of the Vc headers.
Same as Vc::versionNumber().
#define VC_VERSION_CHECK | ( | major, | |
minor, | |||
patch | |||
) |
Helper macro to compare against an encoded version number.
Example:
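A minimal sketch of the usual pattern, guarding code on a minimum version at preprocessing time. The EXAMPLE_ macros are hypothetical stand-ins so that the block compiles without the Vc headers, and the bit layout they use is an assumption made for this sketch; the actual encoding is defined by the Vc headers. With Vc itself the condition would compare VC_VERSION_NUMBER against VC_VERSION_CHECK(major, minor, patch).

```cpp
// Hypothetical stand-ins for VC_VERSION_CHECK and VC_VERSION_NUMBER. The
// bit layout is an assumption for illustration only.
#define EXAMPLE_VERSION_CHECK(major, minor, patch) \
    (((major) << 16) | ((minor) << 8) | (patch))
#define EXAMPLE_VERSION_NUMBER EXAMPLE_VERSION_CHECK(0, 7, 5)

#if EXAMPLE_VERSION_NUMBER >= EXAMPLE_VERSION_CHECK(0, 7, 0)
// Code that relies on features introduced in 0.7.0 would go here.
constexpr bool hasNewFeatures = true;
#else
// Fallback for older headers.
constexpr bool hasNewFeatures = false;
#endif
```

Because the encoded numbers are ordered like the versions they represent, an ordinary integer comparison is enough to select the code path.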
enum MallocAlignment |
Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.
enum Implementation |
Enum to identify a certain SIMD instruction set.
You can use VC_IMPL for the currently active implementation.
enum ExtraInstructions |
The list of available instructions is not easily described by a linear list of instruction sets.
On x86 the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2.
But there are additional instructions that are not necessarily implied by this list. These are covered in this enum.
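Since the enumerators are distinct single-bit values, they can be combined and tested as flags. The sketch below assumes that a feature word, such as the unsigned int returned by Vc::extraInstructionsSupported(), is the bitwise OR of the supported enumerators. The enum is a local copy of the values listed in the summary so that the block compiles without Vc, and hasExtra is a hypothetical helper.

```cpp
// Local copy of the enumerator values listed in the summary above.
enum ExtraInstructions {
    Float16cInstructions = 0x01000,
    Fma4Instructions     = 0x02000,
    XopInstructions      = 0x04000,
    PopcntInstructions   = 0x08000,
    Sse4aInstructions    = 0x10000,
    FmaInstructions      = 0x20000
};

// Hypothetical helper. Assumption: `supported` is the bitwise OR of all
// enumerators the CPU supports.
inline bool hasExtra(unsigned int supported, ExtraInstructions which)
{
    return (supported & static_cast<unsigned int>(which)) != 0;
}
```

Under that assumption, a call such as hasExtra(Vc::extraInstructionsSupported(), PopcntInstructions) would tell whether POPCNT is available at run time.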
unsigned int Vc::extraInstructionsSupported | ( | ) |
Determines the extra instructions supported by the current CPU.
bool Vc::isImplementationSupported | ( | Vc::Implementation | impl | ) |
Tests whether the given implementation is supported by the system the code is executing on.
Returns true if the OS and hardware support execution of instructions defined by impl, false otherwise.
impl | The SIMD target to test for. |
Vc::Implementation Vc::bestImplementationSupported | ( | ) |
Determines the best supported implementation for the current system.
bool Vc::currentImplementationSupported | ( | ) |
Tests that the CPU and Operating System support the vector unit which was compiled for.
This function should be called before any other Vc functionality is used. It checks whether the program will work. If this function returns false, the program should exit with a useful error message before the OS has to kill it because of an invalid instruction exception.
If the program continues and makes use of any vector features not supported by the hardware or software, it will crash.
Example:
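A minimal sketch of such a startup check. To keep the block compilable without the Vc headers, the result of the support test is passed in as a bool; checkVectorSupport is a hypothetical helper, and the wording of the error message is only a suggestion.

```cpp
#include <iostream>

// Hypothetical helper: returns the exit code the program should use.
// 0 means it is safe to continue; non-zero means the vector unit the
// program was compiled for is not usable on this system.
int checkVectorSupport(bool supported, std::ostream &err)
{
    if (!supported) {
        err << "CPU or OS requirements not met for the compiled-in vector unit\n";
        return 1;
    }
    return 0;
}
```

In a real program, main would pass Vc::currentImplementationSupported() as the first argument and return immediately on a non-zero result, before any vector code has run.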
Returns true if the OS and hardware support execution of the currently selected SIMD instructions, false otherwise.
void Vc::forceToRegisters | ( | const vec & | , |
... | |||
) |
Force the vectors passed to the function into registers.
This can be useful after looking at the emitted assembly to force the compiler to optimize properly.
std::ostream& operator<< | ( | std::ostream & | s, |
const Vc::MemoryBase< V, Parent, Dimension, RM > & | m | ||
) |
Prints the contents of a Memory object into a stream object.
For example, printing a Memory object that holds the ten integers 0 through 9 will output (with SSE):
{[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}
s | Any standard C++ ostream object. For example std::cout or a std::stringstream object. |
m | Any Vc::Memory object. |
const char* Vc::versionString | ( | ) |
unsigned int Vc::versionNumber | ( | ) |
T* Vc::malloc | ( | size_t | n | ) |
Allocates memory on the Heap with alignment and padding suitable for vectorized access.
Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.
n | Specifies the number of objects the allocated memory must be able to store. |
T | The type of the allocated memory. Note that the constructor is not called. |
A | Determines the alignment of the memory. See Vc::MallocAlignment. |
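The effect of A on the size of the allocation can be sketched with a little arithmetic. The block below assumes that the requested size in Bytes is rounded up to the next multiple of the alignment; paddedBytes is a hypothetical helper for illustration and not part of Vc.

```cpp
#include <cstddef>

// Hypothetical helper, not part of Vc: rounds count * elementSize up to
// the next multiple of `alignment` (which must be non-zero).
std::size_t paddedBytes(std::size_t count, std::size_t elementSize,
                        std::size_t alignment)
{
    const std::size_t bytes = count * elementSize;
    return (bytes + alignment - 1) / alignment * alignment;
}
```

For 21 objects of 4 Bytes each and 64-Byte cachelines, the 84 requested Bytes are rounded up to 128, so a read of the full last cacheline stays inside the allocation.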
The size of the allocated memory is rounded up to a multiple of the alignment selected by A. Thus if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline until the end of the array, without generating an out-of-bounds access. For a cacheline size of 64 Bytes and an int size of 4 Bytes you would thus get an array of 128 Bytes to work with.
void Vc::free | ( | T * | p | ) |
Frees memory that was allocated with Vc::malloc.
p | The pointer to the memory to be freed. |
T | The type of the allocated memory. |
void Vc::prefetchForOneRead | ( | const void * | addr | ) |
Prefetch the cacheline containing addr for a single read access.
This prefetch completely bypasses the cache, not evicting any other data.
addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchForModify | ( | const void * | addr | ) |
Prefetch the cacheline containing addr for modification.
This prefetch evicts data from the cache, so use it only for data you will actually use. When the target system supports it, the cacheline will be marked as modified while prefetching, saving work later on.
addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchClose | ( | const void * | addr | ) |
Prefetch the cacheline containing addr to L1 cache.
This prefetch evicts data from the cache, so use it only for data you will actually use.
addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchMid | ( | const void * | addr | ) |
Prefetch the cacheline containing addr to L2 cache.
This prefetch evicts data from the cache, so use it only for data you will actually use.
addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchFar | ( | const void * | addr | ) |
Prefetch the cacheline containing addr to L3 cache.
This prefetch evicts data from the cache, so use it only for data you will actually use.
addr | The cacheline containing addr will be prefetched. |
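The prefetch functions are typically used a fixed distance ahead of the data currently being processed. The following sketch shows that pattern. It does not call Vc; prefetchHint is a hypothetical stand-in that maps to the GCC/Clang builtin __builtin_prefetch where available and does nothing elsewhere, and the distance of 64 elements is an arbitrary tuning assumption.

```cpp
#include <cstddef>

// Hypothetical stand-in for a prefetch-to-L1 hint. Prefetches are hints
// only and never change the result of the program.
inline void prefetchHint(const void *addr)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr, 0 /* read */, 3 /* high temporal locality */);
#else
    (void)addr;
#endif
}

// Sums an array while requesting data a fixed distance ahead of the
// element currently being processed.
float sumWithPrefetch(const float *data, std::size_t n)
{
    const std::size_t distance = 64; // elements ahead; a tuning assumption
    float sum = 0.f;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            prefetchHint(data + i + distance);
        }
        sum += data[i];
    }
    return sum;
}
```

Issuing one hint per element is kept here for brevity. Since a hint covers a whole cacheline, real code would usually issue one per cacheline and measure whether the prefetch helps at all.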