Detailed Description

Additional classes, macros, and functions that help to work more easily with the main vector types.

Classes
class	CpuId
	This class is available for x86 / AMD64 systems to read and interpret information about the CPU's capabilities. More...

struct	ImplementationT< Features >
	This class identifies the specific implementation Vc uses in the current translation unit in terms of a type. More...

class	Allocator< T >
	An allocator that uses global new and supports over-aligned types, as per [C++11 20.6.9]. More...

struct	AlignedBase< Alignment >
	Helper class to ensure a given alignment. More...

class	InterleavedMemoryWrapper< S, V >
	Wraps a pointer to memory with convenience functions to access it via vectors. More...

class	Memory< V, Size1, Size2, InitPadding >
	A helper class for fixed-size two-dimensional arrays. More...

class	Memory< V, Size, 0u, InitPadding >
	A helper class to simplify usage of correctly aligned and padded memory, allowing both vector and scalar access. More...

class	Memory< V, 0u, 0u, true >
	A helper class that is very similar to Memory<V, Size> but with dynamically allocated memory and thus dynamic size. More...

Macros
#define	Vc_DECLARE_ALLOCATOR(Type)
	Convenience macro to set the default allocator for a given `Type` to Vc::Allocator. More...

Typedefs
using	CurrentImplementation = ImplementationT< >
	Identifies the Vc implementation used in the current translation unit. More...

template<typename T , typename Allocator = std::allocator<T>>
using	vector = Common::AdaptSubscriptOperator< std::vector< T, Allocator >>
	An adapted `std::vector` container with an additional subscript operator which implements gather and scatter operations. More...

using	VectorAlignedBase = AlignedBase< Detail::max(alignof(Vector< float >), alignof(Vector< double >), alignof(Vector< ullong >), alignof(Vector< llong >), alignof(Vector< ulong >), alignof(Vector< long >), alignof(Vector< uint >), alignof(Vector< int >), alignof(Vector< ushort >), alignof(Vector< short >), alignof(Vector< uchar >), alignof(Vector< schar >))>
	Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi). More...

template<typename V >
using	VectorAlignedBaseT = AlignedBase< alignof(V)>
	Variant of the above type ensuring suitable alignment only for the specified vector type `V`. More...

using	MemoryAlignedBase = AlignedBase< Detail::max(Vector< float >::MemoryAlignment, Vector< double >::MemoryAlignment, Vector< ullong >::MemoryAlignment, Vector< llong >::MemoryAlignment, Vector< ulong >::MemoryAlignment, Vector< long >::MemoryAlignment, Vector< uint >::MemoryAlignment, Vector< int >::MemoryAlignment, Vector< ushort >::MemoryAlignment, Vector< short >::MemoryAlignment, Vector< uchar >::MemoryAlignment, Vector< schar >::MemoryAlignment)>
	Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi). More...

template<typename V >
using	MemoryAlignedBaseT = AlignedBase< V::MemoryAlignment >
	Variant of the above type ensuring suitable alignment only for the specified vector type `V`. More...

using	llong = long long
	long long shorthand

using	ullong = unsigned long long
	unsigned long long shorthand

using	ulong = unsigned long
	unsigned long shorthand

using	uint = unsigned int
	unsigned int shorthand

using	ushort = unsigned short
	unsigned short shorthand

using	uchar = unsigned char
	unsigned char shorthand

using	schar = signed char
	signed char shorthand

Enumerations
enum	MallocAlignment { AlignOnVector, AlignOnCacheline, AlignOnPage }
	Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc. More...

enum	Implementation : std::uint_least32_t { ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl, SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl, MICImpl }
	Enum to identify a certain SIMD instruction set. More...

enum	ExtraInstructions : std::uint_least32_t { Float16cInstructions = 0x01000, Fma4Instructions = 0x02000, XopInstructions = 0x04000, PopcntInstructions = 0x08000, Sse4aInstructions = 0x10000, FmaInstructions = 0x20000, VexInstructions = 0x40000, Bmi2Instructions = 0x80000 }
	The list of available instructions is not easily described by a linear list of instruction sets. More...

Functions
const char *	versionString ()

constexpr unsigned int	versionNumber ()

template<typename V , typename Parent , typename Dimension , typename RM >
std::ostream &	operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m)
	Prints the contents of a Memory object into a stream object. More...

template<typename Mask , typename T >
enable_if< is_simd_mask< Mask >::value &&is_simd_vector< T >::value, T >	iif (const Mask &condition, const T &trueValue, const T &falseValue)
	Function to mimic the ternary operator '?:' (inline-if). More...

template<typename T >
constexpr T	iif (bool condition, const T &trueValue, const T &falseValue)
	Overload of the above for boolean conditions. More...

template<typename V , typename = enable_if<Traits::is_simd_vector<V>::value>>
std::pair< V, V >	interleave (const V &a, const V &b)
	Interleaves the entries from `a` and `b` into two vectors of the same type. More...

template<typename Container , typename T >
constexpr auto	makeContainer (std::initializer_list< T > list) -> decltype(make_container_helper< Container, T >::help(list))
	Construct a container of Vc vectors from a std::initializer_list of scalar entries. More...

template<typename T , Vc::MallocAlignment A>
T *	malloc (size_t n)
	Allocates memory on the Heap with alignment and padding suitable for vectorized access. More...

template<typename T >
void	free (T *p)
	Frees memory that was allocated with Vc::malloc. More...

void	prefetchForOneRead (const void *addr)
	Prefetch the cacheline containing `addr` for a single read access. More...

void	prefetchForModify (const void *addr)
	Prefetch the cacheline containing `addr` for modification. More...

void	prefetchClose (const void *addr)
	Prefetch the cacheline containing `addr` to L1 cache. More...

void	prefetchMid (const void *addr)
	Prefetch the cacheline containing `addr` to L2 cache. More...

void	prefetchFar (const void *addr)
	Prefetch the cacheline containing `addr` to L3 cache. More...

template<typename M >
constexpr WhereImpl::WhereMask< M >	where (const M &mask)
	Conditional assignment. More...

Variables
constexpr AlignedTag	Aligned
	Use this object for a `flags` parameter to request aligned loads and stores. More...

constexpr UnalignedTag	Unaligned
	Use this object for a `flags` parameter to request unaligned loads and stores. More...

constexpr StreamingTag	Streaming
	Use this object for a `flags` parameter to request streaming loads and stores. More...

constexpr LoadStoreFlags::LoadStoreFlags< PrefetchFlag<> >	PrefetchDefault
	Use this object for a `flags` parameter to request default software prefetches to be emitted.

constexpr VectorSpecialInitializerZero	Zero = {}
	The special object `Vc::Zero` can be used to construct Vector and Mask objects initialized to zero/`false`.

constexpr VectorSpecialInitializerOne	One = {}
	The special object `Vc::One` can be used to construct Vector and Mask objects initialized to one/`true`.

constexpr VectorSpecialInitializerIndexesFromZero	IndexesFromZero = {}
	The special object `Vc::IndexesFromZero` can be used to construct Vector objects initialized to values 0, 1, 2, 3, 4, ...

Compiler Identification Macros
#define	Vc_ICC __INTEL_COMPILER_BUILD_DATE
	This macro is defined to a number identifying the ICC version if the current translation unit is compiled with the Intel compiler. More...

#define	Vc_CLANG (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)
	This macro is defined to a number identifying the Clang version if the current translation unit is compiled with the Clang compiler. More...

#define	Vc_GCC (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__)
	This macro is defined to a number identifying the GCC version if the current translation unit is compiled with the GCC compiler. More...

#define	Vc_MSVC _MSC_FULL_VER
	This macro is defined to a number identifying the Microsoft Visual C++ version if the current translation unit is compiled with the Visual C++ (MSVC) compiler. More...

#define	Vc_PASSING_VECTOR_BY_VALUE_IS_BROKEN 1
	This macro is defined if the compiler disallows passing over-aligned types by value. More...

Micro-Architecture Feature Tests
unsigned int	extraInstructionsSupported ()
	Determines the extra instructions supported by the current CPU. More...

bool	isImplementationSupported (Vc::Implementation impl)
	Tests whether the given implementation is supported by the system the code is executing on. More...

Vc::Implementation	bestImplementationSupported ()
	Determines the best supported implementation for the current system. More...

bool	currentImplementationSupported ()
	Tests that the CPU and Operating System support the vector unit which was compiled for. More...

Version Macros
#define	Vc_VERSION_STRING "1.1.0"
	Contains the version string of the Vc headers. More...

#define	Vc_VERSION_NUMBER 0x010100
	Contains the encoded version number of the Vc headers. More...

#define	Vc_VERSION_CHECK(major, minor, patch) ((major << 16) | (minor << 8) | (patch << 1))
	Helper macro to compare against an encoded version number. More...

SIMD Support Feature Macros
#define	Vc_IMPL_XOP
	This macro is defined if the current translation unit is compiled with XOP instruction support.

#define	Vc_IMPL_FMA4
	This macro is defined if the current translation unit is compiled with FMA4 instruction support.

#define	Vc_IMPL_F16C
	This macro is defined if the current translation unit is compiled with F16C instruction support.

#define	Vc_IMPL_POPCNT
	This macro is defined if the current translation unit is compiled with POPCNT instruction support.

#define	Vc_IMPL_SSE4a
	This macro is defined if the current translation unit is compiled with SSE4a instruction support.

#define	Vc_IMPL_Scalar
	This macro is defined if the current translation unit is compiled without any SIMD support.

#define	Vc_IMPL_SSE
	This macro is defined if the current translation unit is compiled with any version of SSE (but not AVX).

#define	Vc_IMPL_SSE2
	This macro is defined if the current translation unit is compiled with SSE2 instruction support (excluding SSE3 and up).

#define	Vc_IMPL_SSE3
	This macro is defined if the current translation unit is compiled with SSE3 instruction support (excluding SSSE3 and up).

#define	Vc_IMPL_SSSE3
	This macro is defined if the current translation unit is compiled with SSSE3 instruction support (excluding SSE4.1 and up).

#define	Vc_IMPL_SSE4_1
	This macro is defined if the current translation unit is compiled with SSE4.1 instruction support (excluding SSE4.2 and up).

#define	Vc_IMPL_SSE4_2
	This macro is defined if the current translation unit is compiled with SSE4.2 instruction support (excluding AVX and up).

#define	Vc_IMPL_AVX
	This macro is defined if the current translation unit is compiled with AVX instruction support (excluding AVX2 and up).

#define	Vc_IMPL_AVX2
	This macro is defined if the current translation unit is compiled with AVX2 instruction support.

#define	Vc_IMPL_MIC
	This macro is defined if the current translation unit is compiled for the Knights Corner Xeon Phi instruction set.

SIMD Vector Size Macros
#define	Vc_DOUBLE_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a double_v.

#define	Vc_FLOAT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a float_v.

#define	Vc_INT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in an int_v.

#define	Vc_UINT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a uint_v.

#define	Vc_SHORT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a short_v.

#define	Vc_USHORT_V_SIZE
	An integer (for use with the preprocessor) that gives the number of entries in a ushort_v.

Boolean Reductions
template<typename Mask >
constexpr bool	all_of (const Mask &m)
	Returns whether all entries in the mask `m` are `true`.

constexpr bool	all_of (bool b)
	Returns `b`.

template<typename Mask >
constexpr bool	any_of (const Mask &m)
	Returns whether at least one entry in the mask `m` is `true`.

constexpr bool	any_of (bool b)
	Returns `b`.

template<typename Mask >
constexpr bool	none_of (const Mask &m)
	Returns whether all entries in the mask `m` are `false`.

constexpr bool	none_of (bool b)
	Returns `!b`.

template<typename Mask >
constexpr bool	some_of (const Mask &m)
	Returns whether at least one entry in `m` is `true` and at least one entry in `m` is `false`.

constexpr bool	some_of (bool)
	Returns `false`.
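The four reductions relate to each other in a simple way, which a lane-by-lane model can make concrete. The following is a minimal standalone sketch, assuming a four-entry mask; `Mask4Model` and the `*Model` functions are hypothetical names introduced here and are not Vc types or functions.

```cpp
// Hypothetical four-lane mask used only for illustration (not a Vc type).
struct Mask4Model { bool lane[4]; };

// True only if every lane is true.
constexpr bool allOfModel(const Mask4Model &m)
{
    return m.lane[0] && m.lane[1] && m.lane[2] && m.lane[3];
}

// True if at least one lane is true.
constexpr bool anyOfModel(const Mask4Model &m)
{
    return m.lane[0] || m.lane[1] || m.lane[2] || m.lane[3];
}

// True only if every lane is false.
constexpr bool noneOfModel(const Mask4Model &m) { return !anyOfModel(m); }

// True if the mask is mixed: some lanes true and some lanes false.
constexpr bool someOfModel(const Mask4Model &m)
{
    return anyOfModel(m) && !allOfModel(m);
}

constexpr Mask4Model mixed{{true, false, true, false}};
static_assert(anyOfModel(mixed) && someOfModel(mixed), "mixed mask");
static_assert(!allOfModel(mixed) && !noneOfModel(mixed), "mixed mask");
```

A mask with a single entry can never be mixed, which is consistent with the `bool` overload of `some_of` always returning `false`.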

Macro Definition Documentation

#define Vc_ICC __INTEL_COMPILER_BUILD_DATE

This macro is defined to a number identifying the ICC version if the current translation unit is compiled with the Intel compiler.

For any other compiler this macro is not defined.

Definition at line 48 of file global.h.

#define Vc_CLANG (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)

This macro is defined to a number identifying the Clang version if the current translation unit is compiled with the Clang compiler.

For any other compiler this macro is not defined.

Definition at line 57 of file global.h.

#define Vc_GCC (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__)

This macro is defined to a number identifying the GCC version if the current translation unit is compiled with the GCC compiler.

For any other compiler this macro is not defined.

Definition at line 66 of file global.h.
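The Vc_CLANG and Vc_GCC definitions pack major, minor, and patch level into one integer, so versions can be compared with ordinary relational operators. The sketch below reproduces that arithmetic on its own; `encodeCompilerVersion` is a hypothetical helper introduced for illustration and is not part of Vc.

```cpp
// Hypothetical helper (not part of Vc): the same arithmetic as the
// Vc_CLANG / Vc_GCC definitions, written as a constexpr function.
constexpr unsigned int encodeCompilerVersion(unsigned int major,
                                             unsigned int minor,
                                             unsigned int patch)
{
    return major * 0x10000u + minor * 0x100u + patch;
}

// Version 5.4.0 is encoded as 0x050400 under this scheme.
static_assert(encodeCompilerVersion(5, 4, 0) == 0x050400u, "encoding");
// A later release compares greater than an earlier one.
static_assert(encodeCompilerVersion(6, 1, 0) > encodeCompilerVersion(5, 4, 0),
              "ordering");
```

In preprocessor code the equivalent test would read `#if defined(Vc_GCC) && Vc_GCC >= 0x050400`.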

#define Vc_MSVC _MSC_FULL_VER

This macro is defined to a number identifying the Microsoft Visual C++ version if the current translation unit is compiled with the Visual C++ (MSVC) compiler.

For any other compiler this macro is not defined.

Definition at line 74 of file global.h.

#define Vc_PASSING_VECTOR_BY_VALUE_IS_BROKEN 1

This macro is defined if the compiler disallows passing over-aligned types by value.

If this is the case you must use parameter passing by const-ref exclusively.

Note: This is a bug in the compiler (or rather its restriction to inefficient function call conventions). You may be able to work around the issue with a better (i.e. sane) calling convention.

Definition at line 86 of file global.h.

#define Vc_VERSION_STRING "1.1.0"

Contains the version string of the Vc headers.

Same as Vc::versionString().

Definition at line 41 of file version.h.

Referenced by Vc::versionString().

#define Vc_VERSION_NUMBER 0x010100

Contains the encoded version number of the Vc headers.

Same as Vc::versionNumber().

Definition at line 47 of file version.h.

Referenced by Vc::versionNumber().

#define Vc_VERSION_CHECK	(	major,
		minor,
		patch
	)	((major << 16) | (minor << 8) | (patch << 1))

Helper macro to compare against an encoded version number.

Example:

#if Vc_VERSION_NUMBER >= Vc_VERSION_CHECK(1, 0, 0)

Definition at line 58 of file version.h.
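To see how the encoding lines up with Vc_VERSION_NUMBER, the bit layout of the macro can be written as a constexpr function. This is a standalone sketch; `encodeVcVersion` is a hypothetical name and not part of Vc. Note that the patch level is shifted by one bit, which leaves the lowest bit free; this page does not say what that bit is used for.

```cpp
// Hypothetical helper (not part of Vc): the same bit layout as the
// Vc_VERSION_CHECK macro, written as a constexpr function.
constexpr unsigned int encodeVcVersion(unsigned int major,
                                       unsigned int minor,
                                       unsigned int patch)
{
    return (major << 16) | (minor << 8) | (patch << 1);
}

// 1.1.0 encodes to the value documented for Vc_VERSION_NUMBER.
static_assert(encodeVcVersion(1, 1, 0) == 0x010100u, "matches Vc_VERSION_NUMBER");
// Headers at 1.1.0 satisfy an "at least 1.0.0" requirement.
static_assert(encodeVcVersion(1, 1, 0) >= encodeVcVersion(1, 0, 0), "at least 1.0.0");
```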

#define Vc_DECLARE_ALLOCATOR ( Type )

Value:

namespace std                                                                        \
{                                                                                    \
template <> class allocator<Type> : public ::Vc::Allocator<Type>                     \
{                                                                                    \
public:                                                                              \
    template <typename U> struct rebind {                                            \
        typedef ::std::allocator<U> other;                                           \
    };                                                                               \
};                                                                                   \
}

Convenience macro to set the default allocator for a given Type to Vc::Allocator.

Parameters

Type	Your type that you want to use with STL containers.

Note: You have to use this macro in the global namespace.

Definition at line 66 of file Allocator.

Typedef Documentation

using CurrentImplementation = ImplementationT< >

Identifies the Vc implementation used in the current translation unit.

See also: ImplementationT

Definition at line 621 of file global.h.

using vector = Common::AdaptSubscriptOperator<std::vector<T, Allocator>>

An adapted std::vector container with an additional subscript operator which implements gather and scatter operations.

Example:

struct Point {
  float x, y;
};
Vc::vector<Point> data;
data.resize(100);
// initialize values in data
float_v::IndexType indexes = ...;  // values between 0-99
float_v x = data[indexes][&Point::x];
float_v y = data[indexes][&Point::y];

Definition at line 51 of file vector.

using VectorAlignedBase = AlignedBase< Detail::max(alignof(Vector<float>), alignof(Vector<double>), alignof(Vector<ullong>), alignof(Vector<llong>), alignof(Vector<ulong>), alignof(Vector<long>), alignof(Vector<uint>), alignof(Vector<int>), alignof(Vector<ushort>), alignof(Vector<short>), alignof(Vector<uchar>), alignof(Vector<schar>))>

Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi).

This class reimplements the new and delete operators to align objects allocated on the heap suitably for objects of Vc::Vector<T> type. This is necessary since the standard new operator does not adhere to the alignment requirements of the type.

See also: Vc::VectorAlignedBaseT; Vc::MemoryAlignedBase; Vc::AlignedBase

Definition at line 91 of file alignedbase.h.
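The alignment argument is the largest alignment of all listed vector types. The sketch below models only that idea with fundamental types so that it compiles without Vc; `maxOf` stands in for Detail::max (whose implementation is not shown on this page) and `AlignedBaseModel` is a hypothetical stand-in for AlignedBase that requests alignment but does not reimplement the new and delete operators.

```cpp
#include <cstddef>

// Hypothetical stand-in for Detail::max: largest of the given values.
constexpr std::size_t maxOf(std::size_t a) { return a; }

template <typename... Rest>
constexpr std::size_t maxOf(std::size_t a, std::size_t b, Rest... rest)
{
    return maxOf(a > b ? a : b, rest...);
}

// Hypothetical stand-in for AlignedBase<Alignment>. Only the alignment
// request is modelled here, not the operator new/delete overloads.
template <std::size_t Alignment>
struct alignas(Alignment) AlignedBaseModel {};

// A class inheriting from the model is aligned to the largest listed value.
struct Particle
    : AlignedBaseModel<maxOf(alignof(float), alignof(double), alignof(long long))> {
    float x, y, z;
};

static_assert(alignof(Particle) >= alignof(double), "aligned for the widest type");
```

With Vc, a user-defined class would presumably inherit from Vc::VectorAlignedBase in the same way.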

using VectorAlignedBaseT = AlignedBase<alignof(V)>

Variant of the above type ensuring suitable alignment only for the specified vector type V.

See also: Vc::VectorAlignedBase; Vc::MemoryAlignedBaseT

Definition at line 101 of file alignedbase.h.

using MemoryAlignedBase = AlignedBase< Detail::max(Vector<float>::MemoryAlignment, Vector<double>::MemoryAlignment, Vector<ullong>::MemoryAlignment, Vector<llong>::MemoryAlignment, Vector<ulong>::MemoryAlignment, Vector<long>::MemoryAlignment, Vector<uint>::MemoryAlignment, Vector<int>::MemoryAlignment, Vector<ushort>::MemoryAlignment, Vector<short>::MemoryAlignment, Vector<uchar>::MemoryAlignment, Vector<schar>::MemoryAlignment)>

Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi).

This class reimplements the new and delete operators to align objects allocated on the heap suitably for arrays of type Vc::Vector<T>::EntryType. Subsequent load and store operations are safe to use the aligned variant.

See also: Vc::MemoryAlignedBaseT; Vc::VectorAlignedBase; Vc::AlignedBase

Definition at line 123 of file alignedbase.h.

using MemoryAlignedBaseT = AlignedBase<V::MemoryAlignment>

Variant of the above type ensuring suitable alignment only for the specified vector type V.

See also: Vc::MemoryAlignedBase; Vc::VectorAlignedBaseT

Definition at line 133 of file alignedbase.h.

Enumeration Type Documentation

enum MallocAlignment

Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.

Enumerator

AlignOnVector

Align on boundary of vector sizes (e.g. 16 Bytes on SSE platforms) and pad to allow vector access to the end. Thus the allocated memory contains a multiple of VectorAlignment bytes.

AlignOnCacheline

Align on boundary of cache line sizes (e.g. 64 Bytes on x86) and pad to allow full cache line access to the end. Thus the allocated memory contains a multiple of 64 bytes.

AlignOnPage

Align on boundary of page sizes (e.g. 4096 Bytes on x86) and pad to allow full page access to the end. Thus the allocated memory contains a multiple of 4096 bytes.

Definition at line 451 of file global.h.

enum Implementation : std::uint_least32_t

Enum to identify a certain SIMD instruction set.

You can use CurrentImplementation for the currently active implementation.

See also: ExtraInstructions

Enumerator
ScalarImpl	uses only fundamental types
SSE2Impl	x86 SSE + SSE2
SSE3Impl	x86 SSE + SSE2 + SSE3
SSSE3Impl	x86 SSE + SSE2 + SSE3 + SSSE3
SSE41Impl	x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1
SSE42Impl	x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1 + SSE4.2
AVXImpl	x86 AVX
AVX2Impl	x86 AVX + AVX2
MICImpl	Intel Xeon Phi.

Definition at line 481 of file global.h.

enum ExtraInstructions : std::uint_least32_t

The list of available instructions is not easily described by a linear list of instruction sets.

On x86 the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2

But there are additional instructions that are not necessarily required by this list. These are covered in this enum.

Enumerator
Float16cInstructions	Support for float16 conversions in hardware.
Fma4Instructions	Support for FMA4 instructions.
XopInstructions	Support for XOP instructions.
PopcntInstructions	Support for the population count instruction.
Sse4aInstructions	Support for SSE4a instructions.
FmaInstructions	Support for FMA instructions (3 operand variant)
VexInstructions	Support for ternary instruction coding (VEX)
Bmi2Instructions	Support for BMI2 instructions.

Definition at line 513 of file global.h.

Function Documentation

unsigned int Vc::extraInstructionsSupported ( )

Determines the extra instructions supported by the current CPU.

Returns: A combination of flags from Vc::ExtraInstructions that the current CPU supports.
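Because the return value is a combination of flags, individual instruction sets are tested with a bitwise AND. The following standalone sketch mirrors the enumerator values listed for Vc::ExtraInstructions in a local enum so that it compiles without the Vc headers; `ExtraInstructionsModel` and `hasAllInstructions` are hypothetical names.

```cpp
// Local mirror of the flag values documented for Vc::ExtraInstructions.
enum ExtraInstructionsModel : unsigned int {
    Float16c = 0x01000,
    Fma4     = 0x02000,
    Xop      = 0x04000,
    Popcnt   = 0x08000,
    Sse4a    = 0x10000,
    Fma      = 0x20000,
    Vex      = 0x40000,
    Bmi2     = 0x80000
};

// Hypothetical helper: true if every flag in `wanted` is set in `supported`.
constexpr bool hasAllInstructions(unsigned int supported, unsigned int wanted)
{
    return (supported & wanted) == wanted;
}

// Example value: a CPU reporting POPCNT and FMA, but not BMI2.
constexpr unsigned int exampleCpu = Popcnt | Fma;
static_assert(hasAllInstructions(exampleCpu, Popcnt), "POPCNT present");
static_assert(!hasAllInstructions(exampleCpu, Popcnt | Bmi2), "BMI2 missing");
```

With Vc itself, `supported` would be the value returned by Vc::extraInstructionsSupported() and the flags would be the Vc::ExtraInstructions enumerators.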

bool Vc::isImplementationSupported ( Vc::Implementation impl )

Tests whether the given implementation is supported by the system the code is executing on.

Returns: true if the OS and hardware support execution of instructions defined by impl; false otherwise.

Parameters

impl	The SIMD target to test for.

Vc::Implementation Vc::bestImplementationSupported ( )

Determines the best supported implementation for the current system.

Returns: The enum value for the best implementation.

bool Vc::currentImplementationSupported ( )

inline

Tests that the CPU and Operating System support the vector unit which was compiled for.

This function should be called before any other Vc functionality is used. It checks whether the program will work. If this function returns false then the program should exit with a useful error message before the OS has to kill it because of an invalid instruction exception.

If the program continues and makes use of any vector features not supported by hard- or software then the program will crash.

Example:

#include <Vc/Vc>
#include <iostream>

int main()
{
  if (!Vc::currentImplementationSupported()) {
    std::cerr << "CPU or OS requirements not met for the compiled in vector unit!\n";
    return 1;  // leave with a non-zero status before any vector code runs
  }
  ...
}

Returns: true if the OS and hardware support execution of the currently selected SIMD instructions; false otherwise.

Definition at line 147 of file support.h.

const char* Vc::versionString ( )

inline

Returns: the version string of the Vc headers.

Note: There exists a built-in check that ensures on application startup that the Vc version of the library (link time) and the headers (compile time) are equal. A mismatch between headers and library could lead to errors that are very hard to debug. If you need to disable the check (it costs a very small amount of application startup time) you can define Vc_NO_VERSION_CHECK at compile time.

Definition at line 77 of file version.h.

constexpr unsigned int Vc::versionNumber ( )

Returns: the version of the Vc headers encoded in an integer.

Definition at line 85 of file version.h.

std::ostream& Vc::Common::operator<<	(	std::ostream &	s,
		const Vc::MemoryBase< V, Parent, Dimension, RM > &	m
	)

inline

Prints the contents of a Memory object into a stream object.

Vc::Memory<int_v, 10> m;
for (int i = 0; i < m.entriesCount(); ++i) {
  m[i] = i;
}
std::cout << m << std::endl;

will output (with SSE):

{[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}

Parameters

s	Any standard C++ ostream object. For example std::cout or a std::stringstream object.
m	Any Vc::Memory object.

Returns: The ostream object, so that multiple stream operations can be chained.

Note: With the GNU standard library this function will check whether the output stream is a tty in which case it colorizes the output.

Warning: Please do not forget that printing a large memory object can take a long time.

enable_if<is_simd_mask<Mask>::value && is_simd_vector<T>::value, T> Vc::iif	(	const Mask &	condition,
		const T &	trueValue,
		const T &	falseValue
	)

inline

Function to mimic the ternary operator '?:' (inline-if).

Parameters

condition	Determines which values are returned. This is analogous to the first argument to the ternary operator.
trueValue	The values to return where `condition` is `true`.
falseValue	The values to return where `condition` is `false`.

Returns: A combination of entries from trueValue and falseValue, according to condition.

So instead of the scalar variant

float x = a > 1.f ? b : b + c;

you'd write

float_v x = Vc::iif (a > 1.f, b, b + c);

Assuming a has the values [0, 3, 5, 1], b is [1, 1, 1, 1], and c is [1, 2, 3, 4], the mask a > 1.f is [false, true, true, false] and b + c is [2, 3, 4, 5], so x will be [2, 1, 1, 5].

Definition at line 61 of file iif.h.
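The per-entry selection can be checked with a small scalar model. This is a standalone sketch that does not use Vc; `Vec4f`, `Mask4`, `greaterThan`, `add`, and `iifModel` are hypothetical names, and the four-entry width is an assumption made for illustration.

```cpp
// Hypothetical four-lane types used only for illustration (not Vc types).
struct Vec4f { float lane[4]; };
struct Mask4 { bool lane[4]; };

// Lane-wise comparison against a scalar, producing a mask.
constexpr Mask4 greaterThan(const Vec4f &a, float t)
{
    return Mask4{{a.lane[0] > t, a.lane[1] > t, a.lane[2] > t, a.lane[3] > t}};
}

// Lane-wise addition.
constexpr Vec4f add(const Vec4f &a, const Vec4f &b)
{
    return Vec4f{{a.lane[0] + b.lane[0], a.lane[1] + b.lane[1],
                  a.lane[2] + b.lane[2], a.lane[3] + b.lane[3]}};
}

// Model of the mask overload of iif: each lane picks from t or f.
constexpr Vec4f iifModel(const Mask4 &m, const Vec4f &t, const Vec4f &f)
{
    return Vec4f{{m.lane[0] ? t.lane[0] : f.lane[0], m.lane[1] ? t.lane[1] : f.lane[1],
                  m.lane[2] ? t.lane[2] : f.lane[2], m.lane[3] ? t.lane[3] : f.lane[3]}};
}

constexpr Vec4f a{{0.f, 3.f, 5.f, 1.f}};
constexpr Vec4f b{{1.f, 1.f, 1.f, 1.f}};
constexpr Vec4f c{{1.f, 2.f, 3.f, 4.f}};
constexpr Vec4f x = iifModel(greaterThan(a, 1.f), b, add(b, c));
static_assert(x.lane[0] == 2.f && x.lane[1] == 1.f &&
              x.lane[2] == 1.f && x.lane[3] == 5.f, "x is [2, 1, 1, 5]");
```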

constexpr T Vc::iif	(	bool	condition,
		const T &	trueValue,
		const T &	falseValue
	)

Overload of the above for boolean conditions.

This typically results in direct use of the ternary operator. This function makes it easier to switch from a Vc type to a builtin type.

Parameters

condition	Determines which value is returned. This is analogous to the first argument to the ternary operator.
trueValue	The value to return if `condition` is `true`.
falseValue	The value to return if `condition` is `false`.

Returns: Either trueValue or falseValue, depending on condition.

Definition at line 91 of file iif.h.

std::pair<V, V> Vc::interleave	(	const V &	a,
		const V &	b
	)

Interleaves the entries from a and b into two vectors of the same type.

The order in the returned vector contains the elements a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3], ....

Example:

Vc::SimdArray<int, 4> a = { 1, 2, 3, 4 };
Vc::SimdArray<int, 4> b = { 9, 8, 7, 6 };
std::tie(a, b) = Vc::interleave(a, b);
std::cout << a << b;
// prints:
// <1 9 2 8><3 7 4 6>

Parameters

a	input vector whose data will appear at even indexes in the output
b	input vector whose data will appear at odd indexes in the output

Returns: two vectors with data from a and b interleaved

Definition at line 56 of file interleave.h.
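The element order can be verified with a scalar model of the operation. The sketch below is standalone and assumes four entries per vector, as in the example; `Vec4i`, `Vec4iPair`, and `interleaveModel` are hypothetical names, not Vc types or functions.

```cpp
// Hypothetical four-lane integer vector and pair type (not Vc types).
struct Vec4i { int lane[4]; };
struct Vec4iPair { Vec4i first, second; };

// Model of interleave: a supplies the even positions, b the odd positions,
// and the eight results are split across two output vectors.
constexpr Vec4iPair interleaveModel(const Vec4i &a, const Vec4i &b)
{
    return Vec4iPair{Vec4i{{a.lane[0], b.lane[0], a.lane[1], b.lane[1]}},
                     Vec4i{{a.lane[2], b.lane[2], a.lane[3], b.lane[3]}}};
}

constexpr Vec4iPair r = interleaveModel(Vec4i{{1, 2, 3, 4}}, Vec4i{{9, 8, 7, 6}});
// Matches the output shown above: <1 9 2 8><3 7 4 6>
static_assert(r.first.lane[0] == 1 && r.first.lane[1] == 9 &&
              r.first.lane[2] == 2 && r.first.lane[3] == 8, "first vector");
static_assert(r.second.lane[0] == 3 && r.second.lane[1] == 7 &&
              r.second.lane[2] == 4 && r.second.lane[3] == 6, "second vector");
```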

constexpr auto Vc::makeContainer ( std::initializer_list< T > list ) -> decltype(make_container_helper<Container, T>::help(list))

Construct a container of Vc vectors from a std::initializer_list of scalar entries.

Parameters

list	An initializer list of arbitrary size. The type of the entries is important! If you pass a list of integers you will get a container filled with Vc::int_v objects. If, instead, you want to have a container of Vc::float_v objects, be sure to include a period (.) and the 'f' postfix in the literals.

Returns: A container of the requested class filled with the minimum number of SIMD vectors to hold the values in the initializer list.

Example:

auto data = Vc::makeContainer<std::vector<float_v>>({ 1.f, 2.f, 3.f, 4.f, 5.f });
// data.size() == 5 if float_v::Size == 1 (i.e. Vc_IMPL=Scalar)
// data.size() == 2 if float_v::Size == 4 (i.e. Vc_IMPL=SSE)
// data.size() == 1 if float_v::Size == 8 (i.e. Vc_IMPL=AVX)

Definition at line 122 of file makeContainer.h.
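The container sizes quoted in the example follow from a ceiling division of the number of scalar entries by the vector width. A minimal sketch of that arithmetic, assuming the "minimum number of SIMD vectors" rule stated above; `vectorsNeeded` is a hypothetical helper and not part of Vc.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Vc): minimum number of vectors with
// `lanes` entries each that can hold `entries` scalar values.
constexpr std::size_t vectorsNeeded(std::size_t entries, std::size_t lanes)
{
    return (entries + lanes - 1) / lanes;  // ceiling division
}

// Five scalar values, as in the example above.
static_assert(vectorsNeeded(5, 1) == 5, "scalar: one entry per vector");
static_assert(vectorsNeeded(5, 4) == 2, "four entries per vector");
static_assert(vectorsNeeded(5, 8) == 1, "eight entries per vector");
```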

T* Vc::malloc ( size_t n )

inline

Allocates memory on the Heap with alignment and padding suitable for vectorized access.

Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.

Parameters

n	Specifies the number of objects the allocated memory must be able to store.

Template Parameters

T	The type of the allocated memory. Note that the constructor is not called.
A	Determines the alignment of the memory. See Vc::MallocAlignment.

Returns: Pointer to memory of the requested type, or 0 on error. The allocated memory is padded at the end to be a multiple of the requested alignment A. Thus if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline until the end of the array, without generating an out-of-bounds access. For a cacheline size of 64 Bytes and an int size of 4 Bytes you would thus get an array of 128 Bytes to work with.

Warning

The standard malloc function specifies the number of Bytes to allocate whereas this function specifies the number of values, thus differing in a factor of sizeof(T).
This function is mainly meant for use with builtin types. If you use a custom type with a sizeof that is not a multiple of 2 the results might not be what you expect.
The constructor of T is not called. You can make up for this:
SomeType *array = new(Vc::malloc<SomeType, Vc::AlignOnCacheline>(N)) SomeType[N];

See also: Vc::free

Definition at line 76 of file memory.h.
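The padding rule described under Returns amounts to rounding the requested byte count up to the next multiple of the alignment. The sketch below reproduces that arithmetic without calling Vc; `paddedBytes` is a hypothetical helper, and the alignment values are the examples given for Vc::MallocAlignment.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Vc): bytes needed for `count` objects of
// `elementSize` bytes, rounded up to a multiple of `alignment`.
constexpr std::size_t paddedBytes(std::size_t count, std::size_t elementSize,
                                  std::size_t alignment)
{
    return ((count * elementSize + alignment - 1) / alignment) * alignment;
}

// 21 ints of 4 bytes are 84 bytes; with 64-byte cache lines this pads to 128.
static_assert(paddedBytes(21, 4, 64) == 128, "cache line example from the text");
// The same request with a 16-byte vector alignment pads to 96 bytes.
static_assert(paddedBytes(21, 4, 16) == 96, "vector alignment example");
```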

void Vc::free ( T * p )

inline

Frees memory that was allocated with Vc::malloc.

Parameters

p	The pointer to the memory to be freed.

Template Parameters

T	The type of the allocated memory.

Warning: The destructor of T is not called. If needed, you can call the destructor before calling free:

for (int i = 0; i < N; ++i) {
  p[i].~T();
}
Vc::free(p);

See also: Vc::malloc

Definition at line 103 of file memory.h.

Referenced by Memory< V, 0u, 0u, true >::~Memory().

void prefetchForOneRead ( const void * addr )

inline

Prefetch the cacheline containing addr for a single read access.

This prefetch completely bypasses the cache, not evicting any other data.

Parameters

addr	The cacheline containing `addr` will be prefetched.

Definition at line 582 of file memory.h.

void Vc::Common::prefetchForModify ( const void * addr )

inline

Prefetch the cacheline containing addr for modification.

This prefetch evicts data from the cache. So use it only for data you really will use. When the target system supports it the cacheline will be marked as modified while prefetching, saving work later on.

Parameters

addr	The cacheline containing `addr` will be prefetched.

Definition at line 599 of file memory.h.

void prefetchClose ( const void * addr )

inline

Prefetch the cacheline containing addr to L1 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters

addr	The cacheline containing `addr` will be prefetched.

Definition at line 614 of file memory.h.

void prefetchMid ( const void * addr )

inline

Prefetch the cacheline containing addr to L2 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters

addr	The cacheline containing `addr` will be prefetched.

Definition at line 629 of file memory.h.

void prefetchFar ( const void * addr )

inline

Prefetch the cacheline containing addr to L3 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters

addr	The cacheline containing `addr` will be prefetched.

Definition at line 644 of file memory.h.

constexpr WhereImpl::WhereMask<M> Vc::where ( const M & mask )

Conditional assignment.

Since compares between SIMD vectors do not return a single boolean, but rather a vector of booleans (mask), one often cannot use if / else statements. Instead, one needs to state that only a subset of entries of a given SIMD vector should be modified. The where function can be prepended to any assignment operation to execute a masked assignment.

Parameters

mask	The mask that selects the entries in the target vector that will be modified.

Returns: This function returns an opaque object that binds to the left operand of an assignment via the binary-or operator or the functor operator. (i.e. either where(mask) | x = y or where(mask)(x) = y)

Example:

template<typename T> void f1(T &x, T &y)
{
  if (x < 2) {
    x *= y;
    y += 2;
  }
}
template<typename T> void f2(T &x, T &y)
{
  where(x < 2) | x *= y;
  where(x < 2) | y += 2;
}

The block following the if statement in f1 will be executed if x < 2 evaluates to true. If T is a scalar type you normally get what you expect. But if T is a SIMD vector type, the comparison will use the implicit conversion from a mask to bool, meaning all_of(x < 2).

Most of the time the required operation is a masked assignment as stated in f2.

Definition at line 230 of file where.h.

Referenced by Vc::iif().
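The effect of a masked assignment can be modelled entry by entry. The following standalone sketch imitates `where(x < 2) | x *= y` for four entries; `Lanes4` and `maskedMultiply` are hypothetical names introduced for illustration and are not part of Vc.

```cpp
// Hypothetical four-lane integer vector (not a Vc type).
struct Lanes4 { int lane[4]; };

// Model of `where(x < limit) | x *= y`: lanes where the mask is true are
// multiplied, all other lanes keep their previous value.
constexpr Lanes4 maskedMultiply(const Lanes4 &x, const Lanes4 &y, int limit)
{
    return Lanes4{{x.lane[0] < limit ? x.lane[0] * y.lane[0] : x.lane[0],
                   x.lane[1] < limit ? x.lane[1] * y.lane[1] : x.lane[1],
                   x.lane[2] < limit ? x.lane[2] * y.lane[2] : x.lane[2],
                   x.lane[3] < limit ? x.lane[3] * y.lane[3] : x.lane[3]}};
}

// x = [1, 3, 0, 5], y = [4, 4, 4, 4]: the mask x < 2 selects lanes 0 and 2.
constexpr Lanes4 r = maskedMultiply(Lanes4{{1, 3, 0, 5}}, Lanes4{{4, 4, 4, 4}}, 2);
static_assert(r.lane[0] == 4 && r.lane[1] == 3 &&
              r.lane[2] == 0 && r.lane[3] == 5, "only masked lanes change");
```

Unlike the `if` statement in f1, which acts on all entries or none, each entry is decided independently here.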

Variable Documentation

constexpr AlignedTag Aligned

Use this object for a flags parameter to request aligned loads and stores.

It specifies that a load/store can expect a memory address that is aligned on the correct boundary (i.e. MemoryAlignment).

Warning: If you specify Aligned, but the memory address is not aligned the program will most likely crash.

Definition at line 184 of file loadstoreflags.h.
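Since a misaligned address combined with Aligned is likely to crash, it can be useful to check an address during debugging. A minimal sketch of such a check follows; `isAlignedAddress` and `isAligned` are hypothetical helpers that are not part of Vc, and the 16-byte value is only the SSE example mentioned for Vc::AlignOnVector, not a statement about MemoryAlignment on any particular target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Vc): true if `address` is a multiple of
// `alignment`.
constexpr bool isAlignedAddress(std::uintptr_t address, std::size_t alignment)
{
    return address % alignment == 0;
}

// Pointer variant for use at run time, e.g. inside an assert().
inline bool isAligned(const void *p, std::size_t alignment)
{
    return isAlignedAddress(reinterpret_cast<std::uintptr_t>(p), alignment);
}

// 0x1000 is a multiple of 16, 0x1004 is not.
static_assert(isAlignedAddress(0x1000, 16), "aligned address");
static_assert(!isAlignedAddress(0x1004, 16), "misaligned address");
```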

constexpr UnalignedTag Unaligned

Use this object for a flags parameter to request unaligned loads and stores.

It specifies that a load/store can not expect a memory address that is aligned on the correct boundary (i.e. alignment is less than MemoryAlignment).

Note: If you specify Unaligned, but the memory address is aligned the load/store will execute slightly slower than necessary.

Definition at line 197 of file loadstoreflags.h.

constexpr StreamingTag Streaming

Use this object for a flags parameter to request streaming loads and stores.

It specifies that the cache should be bypassed for the given load/store. Whether this will actually be done depends on the target system's capabilities.

Streaming stores can be interesting when the code calculates values that, after being written to memory, will not be used for a long time or used by a different thread.

Note: Expect that most target systems do not support unaligned streaming loads or stores. Therefore, make sure that you also specify Aligned.

Definition at line 212 of file loadstoreflags.h.
