Detailed Description

Additional classes, macros, and functions that help to work more easily with the main vector types.

Classes
class	CpuId
	This class is available for x86 / AMD64 systems to read and interpret information about the CPU's capabilities. More...
class	Allocator< T >
	An allocator that uses global new and supports over-aligned types, as per [C++11 20.6.9]. More...
class	VectorAlignedBase
	Helper class to ensure proper alignment. More...
class	VectorAlignedBaseT< V >
	Helper class to ensure proper alignment. More...
class	InterleavedMemoryWrapper< S, V >
	Wraps a pointer to memory with convenience functions to access it via vectors. More...
class	Memory< V, Size1, Size2 >
	A helper class for fixed-size two-dimensional arrays. More...
class	Memory< V, Size, 0u >
	A helper class to simplify usage of correctly aligned and padded memory, allowing both vector and scalar access. More...
class	Memory< V, 0u, 0u >
	A helper class that is very similar to Memory<V, Size> but with dynamically allocated memory and thus dynamic size. More...

Macros
#define	VC_DECLARE_ALLOCATOR(Type)
	Convenience macro to set the default allocator for a given `Type` to Vc::Allocator.
#define	Vc_foreach_bit(iterator, mask)
	Loop over all set bits in the mask.
#define	foreach_bit(iterator, mask)
	Alias for Vc_foreach_bit unless VC_CLEAN_NAMESPACE is defined.

Enumerations
enum	MallocAlignment { AlignOnVector, AlignOnCacheline, AlignOnPage }
	Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc. More...
enum	Implementation { ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl, SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl }
	Enum to identify a certain SIMD instruction set. More...
enum	ExtraInstructions { Float16cInstructions = 0x01000, Fma4Instructions = 0x02000, XopInstructions = 0x04000, PopcntInstructions = 0x08000, Sse4aInstructions = 0x10000, FmaInstructions = 0x20000 }
	The list of available instructions is not easily described by a linear list of instruction sets. More...

Functions
void	forceToRegisters (const vec &,...)
	Force the vectors passed to the function into registers.
template<typename V , typename Parent , typename Dimension , typename RM >
std::ostream &	operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m)
	Prints the contents of a Memory object into a stream object.
const char *	versionString ()
unsigned int	versionNumber ()
template<typename T , Vc::MallocAlignment A>
T *	malloc (size_t n)
	Allocates memory on the heap with alignment and padding suitable for vectorized access.
template<typename T >
void	free (T *p)
	Frees memory that was allocated with Vc::malloc.
void	prefetchForOneRead (const void *addr)
	Prefetch the cacheline containing `addr` for a single read access.
void	prefetchForModify (const void *addr)
	Prefetch the cacheline containing `addr` for modification.
void	prefetchClose (const void *addr)
	Prefetch the cacheline containing `addr` to L1 cache.
void	prefetchMid (const void *addr)
	Prefetch the cacheline containing `addr` to L2 cache.
void	prefetchFar (const void *addr)
	Prefetch the cacheline containing `addr` to L3 cache.

Micro-Architecture Feature Tests
unsigned int	extraInstructionsSupported ()
	Determines the extra instructions supported by the current CPU.
bool	isImplementationSupported (Vc::Implementation impl)
	Tests whether the given implementation is supported by the system the code is executing on.
Vc::Implementation	bestImplementationSupported ()
	Determines the best supported implementation for the current system.
bool	currentImplementationSupported ()
	Tests that the CPU and Operating System support the vector unit which was compiled for.

SIMD Support Feature Macros
#define	VC_IMPL
	This macro is set to the value of Vc::Implementation that the current translation unit is compiled with.
#define	VC_IMPL_XOP
	This macro is defined if the current translation unit is compiled with XOP instruction support.
#define	VC_IMPL_FMA4
	This macro is defined if the current translation unit is compiled with FMA4 instruction support.
#define	VC_IMPL_F16C
	This macro is defined if the current translation unit is compiled with F16C instruction support.
#define	VC_IMPL_POPCNT
	This macro is defined if the current translation unit is compiled with POPCNT instruction support.
#define	VC_IMPL_SSE4a
	This macro is defined if the current translation unit is compiled with SSE4a instruction support.
#define	VC_IMPL_Scalar
	This macro is defined if the current translation unit is compiled without any SIMD support.
#define	VC_IMPL_SSE
	This macro is defined if the current translation unit is compiled with any version of SSE (but not AVX).
#define	VC_IMPL_SSE2
	This macro is defined if the current translation unit is compiled with SSE2 instruction support (excluding SSE3 and up).
#define	VC_IMPL_SSE3
	This macro is defined if the current translation unit is compiled with SSE3 instruction support (excluding SSSE3 and up).
#define	VC_IMPL_SSSE3
	This macro is defined if the current translation unit is compiled with SSSE3 instruction support (excluding SSE4.1 and up).
#define	VC_IMPL_SSE4_1
	This macro is defined if the current translation unit is compiled with SSE4.1 instruction support (excluding SSE4.2 and up).
#define	VC_IMPL_SSE4_2
	This macro is defined if the current translation unit is compiled with SSE4.2 instruction support (excluding AVX and up).
#define	VC_IMPL_AVX
	This macro is defined if the current translation unit is compiled with AVX instruction support (excluding AVX2 and up).

Version Macros
#define	VC_VERSION_STRING
	Contains the version string of the Vc headers.
#define	VC_VERSION_NUMBER
	Contains the encoded version number of the Vc headers.
#define	VC_VERSION_CHECK(major, minor, patch)
	Helper macro to compare against an encoded version number.

SIMD Vector Size Macros
#define	VC_DOUBLE_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a double_v.
#define	VC_FLOAT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a float_v.
#define	VC_SFLOAT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a sfloat_v.
#define	VC_INT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in an int_v.
#define	VC_UINT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a uint_v.
#define	VC_SHORT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a short_v.
#define	VC_USHORT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a ushort_v.

Macro Definition Documentation

#define VC_DECLARE_ALLOCATOR ( Type )

Value:

namespace std \
{ \
    template<> class allocator<Type> : public ::Vc::Allocator<Type> \
    { \
    public: \
        template<typename U> struct rebind { typedef ::std::allocator<U> other; }; \
    }; \
}

Convenience macro to set the default allocator for a given Type to Vc::Allocator.

Parameters

Type	Your type that you want to use with STL containers.

Note: You have to use this macro in the global namespace.

#define Vc_foreach_bit	(	iterator,
		mask
	)

Loop over all set bits in the mask.

The iterator variable will be set to the position of the set bits. A mask of e.g. 00011010 would result in the loop being called with the iterator being set to 1, 3, and 4.

This allows you to write:

float_v a = ...;
Vc_foreach_bit(int i, a < 0.f) {
  std::cout << a[i] << "\n";
}

The example prints all the values in a that are negative, and only those.

Parameters

iterator	The iterator variable. For example "int i".
mask	The mask to iterate over. You can also just write a vector operation that returns a mask.

Note: Since Vc 0.7, break and continue are supported in foreach_bit loops.
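
The bit-position semantics described above can be sketched without Vc as a plain loop over an integer mask. The helper `set_bit_positions` is a hypothetical name introduced only for illustration; it is not part of Vc and makes no claim about how the macro itself is implemented.

```cpp
#include <vector>

// Hypothetical helper (not part of Vc): returns the positions of all set
// bits in `mask`, lowest position first.
std::vector<int> set_bit_positions(unsigned int mask)
{
    std::vector<int> positions;
    for (int i = 0; mask != 0; ++i, mask >>= 1) {
        if (mask & 1u) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

For the mask 00011010 (0x1A) this yields the positions 1, 3, and 4, which are the iterator values the loop body would see.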

#define VC_VERSION_STRING

Contains the version string of the Vc headers.

Same as Vc::versionString().

#define VC_VERSION_NUMBER

Contains the encoded version number of the Vc headers.

Same as Vc::versionNumber().

#define VC_VERSION_CHECK	(	major,
		minor,
		patch
	)

Helper macro to compare against an encoded version number.

Example:

#if VC_VERSION_CHECK(0, 5, 1) >= VC_VERSION_NUMBER
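
To illustrate why an encoded version number can be compared with ordinary relational operators, the following sketch packs a (major, minor, patch) triple into one integer. The function `encode_version` and its bit layout are assumptions made for this example; the encoding actually used by the Vc headers may differ.

```cpp
// Hypothetical encoding, chosen for illustration only; the layout used by
// the Vc headers may differ. Each component gets its own bit range so that
// numeric order matches version order (minor and patch below 256).
constexpr unsigned int encode_version(unsigned int major,
                                      unsigned int minor,
                                      unsigned int patch)
{
    return (major << 16) | (minor << 8) | patch;
}

static_assert(encode_version(0, 5, 1) < encode_version(0, 7, 0),
              "0.5.1 precedes 0.7.0");
static_assert(encode_version(1, 0, 0) > encode_version(0, 255, 255),
              "the major component dominates");
```

Because the result is a plain integer constant, a macro built this way can also be used in preprocessor conditionals.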

Enumeration Type Documentation

enum MallocAlignment

Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.

Enumerator:

AlignOnVector

Align on boundary of vector sizes (e.g. 16 Bytes on SSE platforms) and pad to allow vector access to the end. Thus the allocated memory contains a multiple of VectorAlignment bytes.

AlignOnCacheline

Align on boundary of cache line sizes (e.g. 64 Bytes on x86) and pad to allow full cache line access to the end. Thus the allocated memory contains a multiple of 64 bytes.

AlignOnPage

Align on boundary of page sizes (e.g. 4096 Bytes on x86) and pad to allow full page access to the end. Thus the allocated memory contains a multiple of 4096 bytes.

enum Implementation

Enum to identify a certain SIMD instruction set.

You can use VC_IMPL for the currently active implementation.

See Also: ExtraInstructions

Enumerator:

ScalarImpl	uses only fundamental types
SSE2Impl	x86 SSE + SSE2
SSE3Impl	x86 SSE + SSE2 + SSE3
SSSE3Impl	x86 SSE + SSE2 + SSE3 + SSSE3
SSE41Impl	x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1
SSE42Impl	x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1 + SSE4.2
AVXImpl	x86 AVX
AVX2Impl	x86 AVX + AVX2

enum ExtraInstructions

The list of available instructions is not easily described by a linear list of instruction sets.

On x86 the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2

But there are additional instructions that are not necessarily required by this list. These are covered in this enum.

Enumerator:

Float16cInstructions	Support for float16 conversions in hardware.
Fma4Instructions	Support for FMA4 instructions.
XopInstructions	Support for XOP instructions.
PopcntInstructions	Support for the population count instruction.
Sse4aInstructions	Support for SSE4a instructions.
FmaInstructions	Support for FMA instructions (3 operand variant)

Function Documentation

unsigned int Vc::extraInstructionsSupported ( )

Determines the extra instructions supported by the current CPU.

Returns: A combination of flags from Vc::ExtraInstructions that the current CPU supports.
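
Since the return value is a combination of flags, individual features are tested with a bitwise AND. The sketch below reproduces the enumerator values from the ExtraInstructions listing in a local namespace so that it compiles without Vc; the namespace `sketch` and the helper `has_all` are hypothetical names.

```cpp
namespace sketch {

// Flag values copied from the ExtraInstructions listing in this section.
enum ExtraInstructions {
    Float16cInstructions = 0x01000,
    Fma4Instructions     = 0x02000,
    XopInstructions      = 0x04000,
    PopcntInstructions   = 0x08000,
    Sse4aInstructions    = 0x10000,
    FmaInstructions      = 0x20000
};

// Hypothetical helper: true if every flag in `required` is set in `supported`.
inline bool has_all(unsigned int supported, unsigned int required)
{
    return (supported & required) == required;
}

} // namespace sketch
```

In a program that links against Vc, `supported` would be the value returned by Vc::extraInstructionsSupported().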

bool Vc::isImplementationSupported ( Vc::Implementation impl )

Tests whether the given implementation is supported by the system the code is executing on.

Returns: true if the OS and hardware support execution of the instructions defined by impl; false otherwise.

Parameters

impl	The SIMD target to test for.

Vc::Implementation Vc::bestImplementationSupported ( )

Determines the best supported implementation for the current system.

Returns: The enum value for the best implementation.
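
One way to picture such a selection is a walk from the most capable entry of the enum down to the scalar fallback. The sketch below is a standalone illustration under that assumption and does not describe how Vc performs the check; the namespace `sketch`, the helper `best_supported`, and the predicate argument are hypothetical.

```cpp
#include <functional>

namespace sketch {

// Enumerators in the order of the Implementation listing in this section.
enum Implementation {
    ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl,
    SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl
};

// Hypothetical helper: tries the entries from AVX2Impl downwards and returns
// the first one that `supported` accepts. ScalarImpl is the fallback.
inline Implementation best_supported(
    const std::function<bool(Implementation)> &supported)
{
    for (int i = AVX2Impl; i > ScalarImpl; --i) {
        const Implementation impl = static_cast<Implementation>(i);
        if (supported(impl)) {
            return impl;
        }
    }
    return ScalarImpl;
}

} // namespace sketch
```

With Vc available, the predicate could simply forward to Vc::isImplementationSupported.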

bool Vc::currentImplementationSupported ( )

Tests that the CPU and Operating System support the vector unit which was compiled for.

This function should be called before any other Vc functionality is used. It checks whether the program will work. If this function returns false then the program should exit with a useful error message before the OS has to kill it because of an invalid instruction exception.

If the program continues and makes use of any vector features not supported by the hardware or the operating system, the program will crash.

Example:

int main()
{
  if (!Vc::currentImplementationSupported()) {
    std::cerr << "CPU or OS requirements not met for the compiled in vector unit!\n";
    return -1;
  }
  ...
}

Returns: true if the OS and hardware support execution of the currently selected SIMD instructions; false otherwise.

void Vc::forceToRegisters	(	const vec &	,
			...
	)

Force the vectors passed to the function into registers.

This can be useful after looking at the emitted assembly to force the compiler to optimize properly.

Note: Currently only has an effect for SSE vectors. MSVC does not support this function at all.

Warning: Be careful with this function, especially since it can render the compiler unable to compile for 32-bit systems if it forces more than 8 vectors into registers.

std::ostream& operator<<	(	std::ostream &	s,
		const Vc::MemoryBase< V, Parent, Dimension, RM > &	m
	)

Prints the contents of a Memory object into a stream object.

Vc::Memory<int_v, 10> m;
for (int i = 0; i < m.entriesCount(); ++i) {
  m[i] = i;
}
std::cout << m << std::endl;

will output (with SSE):

{[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}

Parameters

s	Any standard C++ ostream object. For example std::cout or a std::stringstream object.
m	Any Vc::Memory object.

Returns: The ostream object, so that multiple stream operations can be chained.

Note: With the GNU standard library this function will check whether the output stream is a tty. In that case it will colorize the output.

Warning: Please do not forget that printing a large memory object can take a long time.
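
The grouping visible in the sample output can be reproduced with a small standalone formatter. This is only a sketch of the output layout, assuming that unused trailing entries print as zero as in the sample; `format_chunks` is a hypothetical helper and not part of Vc.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Vc): formats `values` in bracketed groups
// of `width` entries and pads the last group with zeros. `width` must be
// nonzero.
std::string format_chunks(const std::vector<int> &values, std::size_t width)
{
    // Round the entry count up to a whole number of groups.
    const std::size_t padded = (values.size() + width - 1) / width * width;
    std::ostringstream out;
    out << '{';
    for (std::size_t i = 0; i < padded; ++i) {
        if (i % width == 0) {
            out << (i == 0 ? "[" : " [");
        } else {
            out << ", ";
        }
        out << (i < values.size() ? values[i] : 0);
        if (i % width == width - 1) {
            out << ']';
        }
    }
    out << '}';
    return out.str();
}
```

Called with the values 0 to 9 and a width of 4 (the number of int entries in an SSE vector), it returns the string shown in the sample output.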

const char* Vc::versionString ( )

Returns: the version string of the Vc headers.

Note: A built-in check ensures on application startup that the Vc version of the library (link time) and the headers (compile time) are equal. A mismatch between headers and library could lead to errors that are very hard to debug. If you need to disable the check (it costs a very small amount of application startup time) you can define VC_NO_VERSION_CHECK at compile time.

unsigned int Vc::versionNumber ( )

Returns: the version of the Vc headers encoded in an integer.

T* Vc::malloc ( size_t n )

Allocates memory on the heap with alignment and padding suitable for vectorized access.

Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.

Parameters

n	Specifies the number of objects the allocated memory must be able to store.

Template Parameters

T	The type of the allocated memory. Note that the constructor is not called.
A	Determines the alignment of the memory. See Vc::MallocAlignment.

Returns: Pointer to memory of the requested type, or 0 on error. The allocated memory is padded at the end to be a multiple of the requested alignment A. Thus if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline until the end of the array, without generating an out-of-bounds access. For a cacheline size of 64 Bytes and an int size of 4 Bytes you would thus get an array of 128 Bytes to work with.

Warning

The standard malloc function specifies the number of Bytes to allocate whereas this function specifies the number of values, thus differing in a factor of sizeof(T).
This function is mainly meant for use with builtin types. If you use a custom type with a sizeof that is not a multiple of 2 the results might not be what you expect.
The constructor of T is not called. You can make up for this:
SomeType *array = new(Vc::malloc<SomeType, Vc::AlignOnCacheline>(N)) SomeType[N];

See Also: Vc::free
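
The padding arithmetic from the return value description (21 int objects, 64-byte cachelines, 128 Bytes in total) can be checked with a short calculation. The helper `padded_bytes` is a hypothetical name used only for this sketch; it models the rounding described above and is not a Vc function.

```cpp
#include <cstddef>

// Hypothetical helper: rounds a request for `count` objects of `object_size`
// bytes up to the next multiple of `alignment` bytes. `alignment` must be
// nonzero.
constexpr std::size_t padded_bytes(std::size_t count,
                                   std::size_t object_size,
                                   std::size_t alignment)
{
    return (count * object_size + alignment - 1) / alignment * alignment;
}

// 21 * 4 = 84 bytes, rounded up to two 64-byte cachelines.
static_assert(padded_bytes(21, 4, 64) == 128,
              "21 four-byte ints pad to 128 bytes");
```

The same calculation with an alignment of 4096 models the AlignOnPage case, where even a single object occupies a full page.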

void Vc::free ( T * p )

Frees memory that was allocated with Vc::malloc.

Parameters

p	The pointer to the memory to be freed.

Template Parameters

T	The type of the allocated memory.

Warning: The destructor of T is not called. If needed, you can call the destructor before calling free:

for (int i = 0; i < N; ++i) {
  p[i].~T();
}
Vc::free(p);

See Also: Vc::malloc
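
The full lifecycle of objects placed in raw memory (construct with placement new, later destroy explicitly) can be sketched without Vc. In this example a local aligned buffer stands in for memory obtained from Vc::malloc; the type `Tracked` and the function `construct_and_destroy` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <new>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Constructs four objects in raw storage one element at a time, then
// destroys them again. Returns the live count observed after construction.
int construct_and_destroy()
{
    constexpr std::size_t N = 4;
    // Stand-in for memory from Vc::malloc: raw, suitably aligned bytes in
    // which no constructor has run yet.
    alignas(Tracked) unsigned char storage[N * sizeof(Tracked)];

    Tracked *p = reinterpret_cast<Tracked *>(storage);
    for (std::size_t i = 0; i < N; ++i) {
        new (p + i) Tracked();  // placement new runs the constructor
    }
    const int alive_after_construction = Tracked::alive;

    // Explicit destructor calls, in reverse order of construction. With
    // memory from Vc::malloc, Vc::free(p) would follow this loop.
    for (std::size_t i = N; i-- > 0;) {
        p[i].~Tracked();
    }
    return alive_after_construction;
}
```

After the function returns, the live count is back at zero, which is the state required before the raw memory is released.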

void Vc::prefetchForOneRead ( const void * addr )

Prefetch the cacheline containing addr for a single read access.

This prefetch completely bypasses the cache, not evicting any other data.

Parameters

addr	The cacheline containing `addr` will be prefetched.

void Vc::prefetchForModify ( const void * addr )

Prefetch the cacheline containing addr for modification.

This prefetch evicts data from the cache. So use it only for data you really will use. When the target system supports it the cacheline will be marked as modified while prefetching, saving work later on.

Parameters

addr	The cacheline containing `addr` will be prefetched.

void Vc::prefetchClose ( const void * addr )

Prefetch the cacheline containing addr to L1 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters

addr	The cacheline containing `addr` will be prefetched.

void Vc::prefetchMid ( const void * addr )

Prefetch the cacheline containing addr to L2 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters

addr	The cacheline containing `addr` will be prefetched.

void Vc::prefetchFar ( const void * addr )

Prefetch the cacheline containing addr to L3 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters

addr	The cacheline containing `addr` will be prefetched.
