Vc  1.3.2-dev
SIMD Vector Classes for C++
Utilities

Detailed Description

Additional classes, macros, and functions that help to work more easily with the main vector types.

Classes

class  CpuId
 This class is available for x86 / AMD64 systems to read and interpret information about the CPU's capabilities.

struct  ImplementationT< Features >
 This class identifies the specific implementation Vc uses in the current translation unit in terms of a type.

class  Allocator< T >
 An allocator that uses global new and supports over-aligned types, as per [C++11 20.6.9].

struct  AlignedBase< Alignment >
 Helper class to ensure a given alignment.

class  InterleavedMemoryWrapper< S, V >
 Wraps a pointer to memory with convenience functions to access it via vectors.

class  Memory< V, Size1, Size2, InitPadding >
 A helper class for fixed-size two-dimensional arrays.

class  Memory< V, Size, 0u, InitPadding >
 A helper class to simplify usage of correctly aligned and padded memory, allowing both vector and scalar access.

class  Memory< V, 0u, 0u, true >
 A helper class that is very similar to Memory<V, Size> but with dynamically allocated memory and thus dynamic size.
 

Macros

#define Vc_DECLARE_ALLOCATOR(Type)
 Convenience macro to set the default allocator for a given Type to Vc::Allocator.
 

Typedefs

using CurrentImplementation = ImplementationT< >
 Identifies the Vc implementation used in the current translation unit.

template<typename T , typename Allocator = std::allocator<T>>
using vector = Common::AdaptSubscriptOperator< std::vector< T, Allocator >>
 An adapted std::vector container with an additional subscript operator which implements gather and scatter operations.
 
using VectorAlignedBase = AlignedBase< Detail::max(alignof(Vector< float >), alignof(Vector< double >), alignof(Vector< ullong >), alignof(Vector< llong >), alignof(Vector< ulong >), alignof(Vector< long >), alignof(Vector< uint >), alignof(Vector< int >), alignof(Vector< ushort >), alignof(Vector< short >), alignof(Vector< uchar >), alignof(Vector< schar >))>
 Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi).
 
template<typename V >
using VectorAlignedBaseT = AlignedBase< alignof(V)>
 Variant of the above type ensuring suitable alignment only for the specified vector type V.
 
using MemoryAlignedBase = AlignedBase< Detail::max(Vector< float >::MemoryAlignment, Vector< double >::MemoryAlignment, Vector< ullong >::MemoryAlignment, Vector< llong >::MemoryAlignment, Vector< ulong >::MemoryAlignment, Vector< long >::MemoryAlignment, Vector< uint >::MemoryAlignment, Vector< int >::MemoryAlignment, Vector< ushort >::MemoryAlignment, Vector< short >::MemoryAlignment, Vector< uchar >::MemoryAlignment, Vector< schar >::MemoryAlignment)>
 Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi).
 
template<typename V >
using MemoryAlignedBaseT = AlignedBase< V::MemoryAlignment >
 Variant of the above type ensuring suitable alignment only for the specified vector type V.
 
using llong = long long
 long long shorthand
 
using ullong = unsigned long long
 unsigned long long shorthand
 
using ulong = unsigned long
 unsigned long shorthand
 
using uint = unsigned int
 unsigned int shorthand
 
using ushort = unsigned short
 unsigned short shorthand
 
using uchar = unsigned char
 unsigned char shorthand
 
using schar = signed char
 signed char shorthand
 

Enumerations

enum  MallocAlignment { AlignOnVector, AlignOnCacheline, AlignOnPage }
 Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.
 
enum  Implementation : std::uint_least32_t {
  ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl,
  SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl,
  MICImpl
}
 Enum to identify a certain SIMD instruction set.
 
enum  ExtraInstructions : std::uint_least32_t {
  Float16cInstructions = 0x01000, Fma4Instructions = 0x02000, XopInstructions = 0x04000, PopcntInstructions = 0x08000,
  Sse4aInstructions = 0x10000, FmaInstructions = 0x20000, VexInstructions = 0x40000, Bmi2Instructions = 0x80000
}
 The list of available instructions is not easily described by a linear list of instruction sets.
 

Functions

const char * versionString ()
 
constexpr unsigned int versionNumber ()
 
template<typename V , typename Parent , typename Dimension , typename RM >
std::ostream & operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m)
 Prints the contents of a Memory object into a stream object.
 
template<typename Mask , typename T >
enable_if< is_simd_mask< Mask >::value &&is_simd_vector< T >::value, T > iif (const Mask &condition, const T &trueValue, const T &falseValue)
 Function to mimic the ternary operator '?:' (inline-if).

template<typename T >
constexpr T iif (bool condition, const T &trueValue, const T &falseValue)
 Overload of the above for boolean conditions.

template<typename V , typename = enable_if<Traits::is_simd_vector<V>::value>>
std::pair< V, V > interleave (const V &a, const V &b)
 Interleaves the entries from a and b into two vectors of the same type.

template<typename Container , typename T >
constexpr auto makeContainer (std::initializer_list< T > list) -> decltype(make_container_helper< Container, T >::help(list))
 Construct a container of Vc vectors from a std::initializer_list of scalar entries.

template<typename T , Vc::MallocAlignment A>
T * malloc (size_t n)
 Allocates memory on the heap with alignment and padding suitable for vectorized access.

template<typename T >
void free (T *p)
 Frees memory that was allocated with Vc::malloc.

void prefetchForOneRead (const void *addr)
 Prefetch the cacheline containing addr for a single read access.

void prefetchForModify (const void *addr)
 Prefetch the cacheline containing addr for modification.

void prefetchClose (const void *addr)
 Prefetch the cacheline containing addr to L1 cache.

void prefetchMid (const void *addr)
 Prefetch the cacheline containing addr to L2 cache.

void prefetchFar (const void *addr)
 Prefetch the cacheline containing addr to L3 cache.

template<typename V , typename T , typename Abi >
enable_if< (V::size()==Vector< T, Abi >::size()&&sizeof(typename V::VectorEntryType)==sizeof(typename Vector< T, Abi >::VectorEntryType)&&sizeof(V)==sizeof(Vector< T, Abi >)&&alignof(V)<=alignof(Vector< T, Abi >)), V > reinterpret_components_cast (const Vector< T, Abi > &x)
 Constructs a new Vector object of type V from the Vector x, reinterpreting the bits of x for the new type V.
 
template<typename M >
constexpr WhereImpl::WhereMask< M > where (const M &mask)
 Conditional assignment.
 

Variables

constexpr AlignedTag Aligned
 Use this object for a flags parameter to request aligned loads and stores.

constexpr UnalignedTag Unaligned
 Use this object for a flags parameter to request unaligned loads and stores.

constexpr StreamingTag Streaming
 Use this object for a flags parameter to request streaming loads and stores.
 
constexpr LoadStoreFlags::LoadStoreFlags< PrefetchFlag<> > PrefetchDefault
 Use this object for a flags parameter to request default software prefetches to be emitted.
 
constexpr VectorSpecialInitializerZero Zero = {}
 The special object Vc::Zero can be used to construct Vector and Mask objects initialized to zero/false.
 
constexpr VectorSpecialInitializerOne One = {}
 The special object Vc::One can be used to construct Vector and Mask objects initialized to one/true.
 
constexpr VectorSpecialInitializerIndexesFromZero IndexesFromZero = {}
 The special object Vc::IndexesFromZero can be used to construct Vector objects initialized to values 0, 1, 2, 3, 4, ...
 

Compiler Identification Macros

#define Vc_ICC   __INTEL_COMPILER_BUILD_DATE
 This macro is defined to a number identifying the ICC version if the current translation unit is compiled with the Intel compiler.

#define Vc_CLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)
 This macro is defined to a number identifying the Clang version if the current translation unit is compiled with the Clang compiler.

#define Vc_APPLECLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)
 This macro is defined to a number identifying the Apple Clang version if the current translation unit is compiled with the Apple Clang compiler.

#define Vc_GCC   (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__)
 This macro is defined to a number identifying the GCC version if the current translation unit is compiled with the GCC compiler.

#define Vc_MSVC   _MSC_FULL_VER
 This macro is defined to a number identifying the Microsoft Visual C++ version if the current translation unit is compiled with the Visual C++ (MSVC) compiler.
 

Micro-Architecture Feature Tests

unsigned int extraInstructionsSupported ()
 Determines the extra instructions supported by the current CPU.

bool isImplementationSupported (Vc::Implementation impl)
 Tests whether the given implementation is supported by the system the code is executing on.

Vc::Implementation bestImplementationSupported ()
 Determines the best supported implementation for the current system.

bool currentImplementationSupported ()
 Tests that the CPU and Operating System support the vector unit the code was compiled for.
 

Version Macros

#define Vc_VERSION_STRING   "1.3.2-dev"
 Contains the version string of the Vc headers.

#define Vc_VERSION_NUMBER   0x010305
 Contains the encoded version number of the Vc headers.

#define Vc_VERSION_CHECK(major, minor, patch)   ((major << 16) | (minor << 8) | (patch << 1))
 Helper macro to compare against an encoded version number.
 

SIMD Support Feature Macros

#define Vc_IMPL_XOP
 This macro is defined if the current translation unit is compiled with XOP instruction support.
 
#define Vc_IMPL_FMA4
 This macro is defined if the current translation unit is compiled with FMA4 instruction support.
 
#define Vc_IMPL_F16C
 This macro is defined if the current translation unit is compiled with F16C instruction support.
 
#define Vc_IMPL_POPCNT
 This macro is defined if the current translation unit is compiled with POPCNT instruction support.
 
#define Vc_IMPL_SSE4a
 This macro is defined if the current translation unit is compiled with SSE4a instruction support.
 
#define Vc_IMPL_Scalar
 This macro is defined if the current translation unit is compiled without any SIMD support.
 
#define Vc_IMPL_SSE
 This macro is defined if the current translation unit is compiled with any version of SSE (but not AVX).
 
#define Vc_IMPL_SSE2
 This macro is defined if the current translation unit is compiled with SSE2 instruction support (excluding SSE3 and up).
 
#define Vc_IMPL_SSE3
 This macro is defined if the current translation unit is compiled with SSE3 instruction support (excluding SSSE3 and up).
 
#define Vc_IMPL_SSSE3
 This macro is defined if the current translation unit is compiled with SSSE3 instruction support (excluding SSE4.1 and up).
 
#define Vc_IMPL_SSE4_1
 This macro is defined if the current translation unit is compiled with SSE4.1 instruction support (excluding SSE4.2 and up).
 
#define Vc_IMPL_SSE4_2
 This macro is defined if the current translation unit is compiled with SSE4.2 instruction support (excluding AVX and up).
 
#define Vc_IMPL_AVX
 This macro is defined if the current translation unit is compiled with AVX instruction support (excluding AVX2 and up).
 
#define Vc_IMPL_AVX2
 This macro is defined if the current translation unit is compiled with AVX2 instruction support.
 
#define Vc_IMPL_MIC
 This macro is defined if the current translation unit is compiled for the Knights Corner Xeon Phi instruction set.
 

SIMD Vector Size Macros

#define Vc_DOUBLE_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a double_v.
 
#define Vc_FLOAT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a float_v.
 
#define Vc_INT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in an int_v.
 
#define Vc_UINT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a uint_v.
 
#define Vc_SHORT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a short_v.
 
#define Vc_USHORT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a ushort_v.
 

Boolean Reductions

template<typename Mask >
constexpr bool all_of (const Mask &m)
 Returns whether all entries in the mask m are true.
 
constexpr bool all_of (bool b)
 Returns b.
 
template<typename Mask >
constexpr bool any_of (const Mask &m)
 Returns whether at least one entry in the mask m is true.
 
constexpr bool any_of (bool b)
 Returns b.
 
template<typename Mask >
constexpr bool none_of (const Mask &m)
 Returns whether all entries in the mask m are false.
 
constexpr bool none_of (bool b)
 Returns !b.
 
template<typename Mask >
constexpr bool some_of (const Mask &m)
 Returns whether at least one entry in m is true and at least one entry in m is false.
 
constexpr bool some_of (bool)
 Returns false.
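The four reductions can be mirrored on a plain std::array<bool, N> standing in for a mask, which makes their relationships explicit: none_of is the negation of any_of, and some_of requires both a true and a false entry. A minimal sketch; `MaskSketch` and the `*_sketch` functions are hypothetical names, not part of Vc:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD mask: one bool per lane (not a Vc type).
template <std::size_t N>
using MaskSketch = std::array<bool, N>;

// Each *_sketch function mirrors the documented reduction of the same name.
template <std::size_t N>
bool all_of_sketch(const MaskSketch<N> &m) {
    for (bool b : m) {
        if (!b) return false;  // a single false lane decides
    }
    return true;
}

template <std::size_t N>
bool any_of_sketch(const MaskSketch<N> &m) {
    for (bool b : m) {
        if (b) return true;  // a single true lane decides
    }
    return false;
}

template <std::size_t N>
bool none_of_sketch(const MaskSketch<N> &m) {
    return !any_of_sketch(m);
}

template <std::size_t N>
bool some_of_sketch(const MaskSketch<N> &m) {
    // At least one true lane and at least one false lane.
    return any_of_sketch(m) && !all_of_sketch(m);
}

// Scalar overload, matching the documented some_of(bool).
inline bool some_of_sketch(bool) { return false; }
```

A single bool can never be "partly true", which is why the scalar some_of always returns false.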
 

Macro Definition Documentation

#define Vc_ICC   __INTEL_COMPILER_BUILD_DATE

This macro is defined to a number identifying the ICC version if the current translation unit is compiled with the Intel compiler.

For any other compiler this macro is not defined.

Definition at line 47 of file global.h.

#define Vc_CLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)

This macro is defined to a number identifying the Clang version if the current translation unit is compiled with the Clang compiler.

For any other compiler this macro is not defined.

Definition at line 56 of file global.h.

#define Vc_APPLECLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)

This macro is defined to a number identifying the Apple Clang version if the current translation unit is compiled with the Apple Clang compiler.

For any other compiler this macro is not defined.

Definition at line 65 of file global.h.

#define Vc_GCC   (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__)

This macro is defined to a number identifying the GCC version if the current translation unit is compiled with the GCC compiler.

For any other compiler this macro is not defined.

Definition at line 74 of file global.h.

#define Vc_MSVC   _MSC_FULL_VER

This macro is defined to a number identifying the Microsoft Visual C++ version if the current translation unit is compiled with the Visual C++ (MSVC) compiler.

For any other compiler this macro is not defined.

Definition at line 82 of file global.h.
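Vc_CLANG, Vc_APPLECLANG, and Vc_GCC pack major, minor, and patch level into one integer (major * 0x10000 + minor * 0x100 + patch), so version requirements become plain integer comparisons. A minimal sketch of that arithmetic; `encode_version` and `MY_GCC_VERSION` are hypothetical names, and the GCC test excludes Clang and ICC because both of those compilers also define `__GNUC__`:

```cpp
#include <cassert>

// Hypothetical helper using the same packing as Vc_GCC / Vc_CLANG.
constexpr unsigned encode_version(unsigned major_v, unsigned minor_v, unsigned patch_v) {
    return major_v * 0x10000 + minor_v * 0x100 + patch_v;
}

// Hypothetical macro built the same way, defined only for genuine GCC.
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#define MY_GCC_VERSION (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__)
#endif

// True when compiled with GCC 5.1 or newer; false for every other compiler.
constexpr bool built_with_gcc_5_1_or_newer() {
#if defined(MY_GCC_VERSION)
    return static_cast<unsigned>(MY_GCC_VERSION) >= encode_version(5, 1, 0);
#else
    return false;
#endif
}
```

Because each field gets its own byte, a later minor release always compares greater than any patch level of an earlier one (as long as minor and patch stay below 256).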

#define Vc_VERSION_STRING   "1.3.2-dev"

Contains the version string of the Vc headers.

Same as Vc::versionString().

Definition at line 40 of file version.h.

Referenced by Vc::versionString().

#define Vc_VERSION_NUMBER   0x010305

Contains the encoded version number of the Vc headers.

Same as Vc::versionNumber().

Definition at line 46 of file version.h.

Referenced by Vc::versionNumber().

#define Vc_VERSION_CHECK(major, minor, patch)   ((major << 16) | (minor << 8) | (patch << 1))

Helper macro to compare against an encoded version number.

Example:

#if Vc_VERSION_NUMBER >= Vc_VERSION_CHECK(1, 0, 0)
// code that requires Vc 1.0.0 or later
#endif

Definition at line 57 of file version.h.
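Because Vc_VERSION_CHECK shifts the patch level left by one bit, the macro itself never sets bit 0 of the encoded number. The sketch below mirrors the macro as a constexpr function and decodes a number back into its fields; `version_check`, `DecodedVersion`, and `decode_version` are hypothetical helpers, not part of Vc:

```cpp
#include <cassert>

// Hypothetical mirror of Vc_VERSION_CHECK: major in bits 16 and up,
// minor in bits 8-15, patch shifted left by one (bits 1-7 for patch < 128).
constexpr unsigned version_check(unsigned major_v, unsigned minor_v, unsigned patch_v) {
    return (major_v << 16) | (minor_v << 8) | (patch_v << 1);
}

// Hypothetical decoded form of an encoded version number.
struct DecodedVersion {
    unsigned major_v;
    unsigned minor_v;
    unsigned patch_v;
    bool low_bit;  // bit 0, which version_check() never sets
};

constexpr DecodedVersion decode_version(unsigned encoded) {
    return DecodedVersion{encoded >> 16, (encoded >> 8) & 0xffu, (encoded & 0xffu) >> 1,
                          (encoded & 1u) != 0};
}
```

Applied to the Vc_VERSION_NUMBER value shown above (0x010305), decode_version yields 1, 3, and 2, matching the "1.3.2-dev" version string, with bit 0 set.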

#define Vc_DECLARE_ALLOCATOR(Type)
Value:
namespace std \
{ \
    template <> class allocator<Type> : public ::Vc::Allocator<Type> \
    { \
    public: \
        template <typename U> struct rebind { \
            typedef ::std::allocator<U> other; \
        }; \
    }; \
}

Convenience macro to set the default allocator for a given Type to Vc::Allocator.

Parameters
Type  Your type that you want to use with STL containers.
Note
You have to use this macro in the global namespace.

Definition at line 65 of file Allocator.
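Vc_DECLARE_ALLOCATOR works by specializing std::allocator. An alternative that leaves namespace std untouched is to pass an over-aligning allocator to the container explicitly. The following is a minimal C++17 sketch of that alternative, assuming aligned `::operator new` with `std::align_val_t`; `AlignedAllocatorSketch` is a hypothetical name and not Vc::Allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical over-aligning allocator built on C++17 aligned operator new/delete.
template <typename T, std::size_t Alignment>
struct AlignedAllocatorSketch {
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
    static_assert(Alignment >= alignof(T), "Alignment must not be weaker than alignof(T)");
    using value_type = T;

    // The non-type template parameter prevents automatic rebinding,
    // so rebind is spelled out explicitly.
    template <typename U>
    struct rebind {
        using other = AlignedAllocatorSketch<U, Alignment>;
    };

    AlignedAllocatorSketch() noexcept = default;
    template <typename U>
    AlignedAllocatorSketch(const AlignedAllocatorSketch<U, Alignment> &) noexcept {}

    T *allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_array_new_length();
        return static_cast<T *>(
            ::operator new(n * sizeof(T), static_cast<std::align_val_t>(Alignment)));
    }
    void deallocate(T *p, std::size_t) noexcept {
        ::operator delete(p, static_cast<std::align_val_t>(Alignment));
    }
};

// Stateless allocators always compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocatorSketch<T, A> &, const AlignedAllocatorSketch<U, A> &) {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocatorSketch<T, A> &, const AlignedAllocatorSketch<U, A> &) {
    return false;
}

// Usage: a std::vector whose buffer starts on a 64-byte boundary.
using AlignedFloats = std::vector<float, AlignedAllocatorSketch<float, 64>>;
```

The trade-off is visibility: the allocator appears in every container type that uses it, whereas the macro changes the default for all containers of Type at once.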

Typedef Documentation

using CurrentImplementation = ImplementationT< >

Identifies the Vc implementation used in the current translation unit.

See also
ImplementationT

Definition at line 616 of file global.h.

using vector = Common::AdaptSubscriptOperator<std::vector<T, Allocator>>

An adapted std::vector container with an additional subscript operator which implements gather and scatter operations.

Example:

struct Point {
    float x, y;
};
Vc::vector<Point> data;
data.resize(100);
// initialize values in data
float_v::IndexType indexes = ...; // values between 0 and 99
float_v x = data[indexes][&Point::x];
float_v y = data[indexes][&Point::y];

Definition at line 51 of file vector.
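The expression `data[indexes][&Point::x]` is a gather: it reads the member x of several, possibly non-contiguous, elements in one expression. Its effect can be modelled in scalar code with a pointer-to-member. In this sketch `gather_member` is a hypothetical helper and a std::array stands in for float_v:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
    float x, y;
};

// Hypothetical scalar equivalent of data[indexes][member]: one load per lane.
template <typename S, typename M, std::size_t N>
std::array<M, N> gather_member(const std::vector<S> &data,
                               const std::array<std::size_t, N> &indexes,
                               M S::*member) {
    std::array<M, N> result{};
    for (std::size_t lane = 0; lane < N; ++lane) {
        result[lane] = data[indexes[lane]].*member;  // gather lane by lane
    }
    return result;
}
```

A SIMD implementation can replace the loop with gather instructions where the hardware provides them; the result is the same.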

using VectorAlignedBase = AlignedBase< Detail::max(alignof(Vector<float>), alignof(Vector<double>), alignof(Vector<ullong>), alignof(Vector<llong>), alignof(Vector<ulong>), alignof(Vector<long>), alignof(Vector<uint>), alignof(Vector<int>), alignof(Vector<ushort>), alignof(Vector<short>), alignof(Vector<uchar>), alignof(Vector<schar>))>

Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi).

This class reimplements the new and delete operators to align objects allocated on the heap suitably for objects of Vc::Vector<T> type. This is necessary because the standard new operator (prior to C++17) only guarantees alignment suitable for the fundamental types, not the stricter alignment that vector types require.

See also
Vc::VectorAlignedBaseT
Vc::MemoryAlignedBase
Vc::AlignedBase

Definition at line 90 of file alignedbase.h.
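The mechanism described above, a base class whose operator new and operator delete over-align heap objects for every derived class, can be sketched in standard C++17 with std::aligned_alloc. This illustration assumes a platform that provides std::aligned_alloc (MSVC does not), and `AlignedBaseSketch` and `Particle` are hypothetical names; it is not Vc's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical base class: derived classes inherit these allocation functions.
template <std::size_t Alignment>
struct AlignedBaseSketch {
    static void *operator new(std::size_t size) {
        // std::aligned_alloc expects a size that is a multiple of the alignment.
        const std::size_t rounded = (size + Alignment - 1) / Alignment * Alignment;
        if (void *p = std::aligned_alloc(Alignment, rounded)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void *p) noexcept { std::free(p); }
    static void *operator new[](std::size_t size) { return operator new(size); }
    static void operator delete[](void *p) noexcept { std::free(p); }
};

// Hypothetical user type that wants 64-byte aligned heap allocations.
struct Particle : AlignedBaseSketch<64> {
    float position[3];
    float velocity[3];
};
```

Since class-specific allocation functions are found by name lookup in the derived class, `new Particle` uses them without any change at the call site.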

using VectorAlignedBaseT = AlignedBase<alignof(V)>

Variant of the above type ensuring suitable alignment only for the specified vector type V.

See also
Vc::VectorAlignedBase
Vc::MemoryAlignedBaseT

Definition at line 100 of file alignedbase.h.

using MemoryAlignedBase = AlignedBase< Detail::max(Vector<float>::MemoryAlignment, Vector<double>::MemoryAlignment, Vector<ullong>::MemoryAlignment, Vector<llong>::MemoryAlignment, Vector<ulong>::MemoryAlignment, Vector<long>::MemoryAlignment, Vector<uint>::MemoryAlignment, Vector<int>::MemoryAlignment, Vector<ushort>::MemoryAlignment, Vector<short>::MemoryAlignment, Vector<uchar>::MemoryAlignment, Vector<schar>::MemoryAlignment)>

Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi).

This class reimplements the new and delete operators to align objects allocated on the heap suitably for arrays of type Vc::Vector<T>::EntryType. Loads and stores on such arrays can therefore safely use the aligned variant.

See also
Vc::MemoryAlignedBaseT
Vc::VectorAlignedBase
Vc::AlignedBase

Definition at line 122 of file alignedbase.h.

using MemoryAlignedBaseT = AlignedBase<V::MemoryAlignment>

Variant of the above type ensuring suitable alignment only for the specified vector type V.

See also
Vc::MemoryAlignedBase
Vc::VectorAlignedBaseT

Definition at line 132 of file alignedbase.h.

Enumeration Type Documentation

enum MallocAlignment

Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.

Enumerator
AlignOnVector 

Align on boundary of vector sizes (e.g. 16 bytes on SSE platforms) and pad to allow vector access to the end. Thus the allocated memory contains a multiple of VectorAlignment bytes.

AlignOnCacheline 

Align on boundary of cache line sizes (e.g. 64 bytes on x86) and pad to allow full cache line access to the end. Thus the allocated memory contains a multiple of 64 bytes.

AlignOnPage 

Align on boundary of page sizes (e.g. 4096 bytes on x86) and pad to allow full page access to the end. Thus the allocated memory contains a multiple of 4096 bytes.

Definition at line 446 of file global.h.
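All three modes apply the same round-up-to-a-multiple rule, only with a different granularity. A minimal sketch of that arithmetic, assuming the example granularities given above (16 bytes for an SSE vector, 64 for a cache line, 4096 for a page); `padded_bytes` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed to store n objects of elementSize bytes,
// rounded up to a multiple of granularity (vector, cache line, or page size).
constexpr std::size_t padded_bytes(std::size_t n, std::size_t elementSize,
                                   std::size_t granularity) {
    return (n * elementSize + granularity - 1) / granularity * granularity;
}
```

For 21 ints (84 bytes) this yields 96, 128, and 4096 bytes for the vector, cache-line, and page granularities respectively.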

enum Implementation : std::uint_least32_t

Enum to identify a certain SIMD instruction set.

You can use CurrentImplementation for the currently active implementation.

See also
ExtraInstructions
Enumerator
ScalarImpl 

uses only fundamental types

SSE2Impl 

x86 SSE + SSE2

SSE3Impl 

x86 SSE + SSE2 + SSE3

SSSE3Impl 

x86 SSE + SSE2 + SSE3 + SSSE3

SSE41Impl 

x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1

SSE42Impl 

x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1 + SSE4.2

AVXImpl 

x86 AVX

AVX2Impl 

x86 AVX + AVX2

MICImpl 

Intel Xeon Phi.

Definition at line 476 of file global.h.

enum ExtraInstructions : std::uint_least32_t

The list of available instructions is not easily described by a linear list of instruction sets.

On x86 the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2

But there are additional instructions that are not necessarily required by this list. These are covered in this enum.

Enumerator
Float16cInstructions 

Support for float16 conversions in hardware.

Fma4Instructions 

Support for FMA4 instructions.

XopInstructions 

Support for XOP instructions.

PopcntInstructions 

Support for the population count instruction.

Sse4aInstructions 

Support for SSE4a instructions.

FmaInstructions 

Support for FMA instructions (3-operand variant).

VexInstructions 

Support for ternary instruction coding (VEX).

Bmi2Instructions 

Support for BMI2 instructions.

Definition at line 508 of file global.h.

Function Documentation

unsigned int Vc::extraInstructionsSupported ( )

Determines the extra instructions supported by the current CPU.

Returns
A combination of flags from Vc::ExtraInstructions that the current CPU supports.
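Because the return value is a bitwise combination of Vc::ExtraInstructions values, individual features are tested with a bitwise AND. So that it compiles without Vc, the sketch below copies the enumerator values from the summary above into a hypothetical local enum (`ExtraInstructionsSketch`); `has_all` is likewise a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical local copy of the documented ExtraInstructions values.
enum ExtraInstructionsSketch : std::uint_least32_t {
    Float16cInstructions = 0x01000,
    Fma4Instructions = 0x02000,
    XopInstructions = 0x04000,
    PopcntInstructions = 0x08000,
    Sse4aInstructions = 0x10000,
    FmaInstructions = 0x20000,
    VexInstructions = 0x40000,
    Bmi2Instructions = 0x80000
};

// True if every flag in 'required' is also set in 'supported'.
constexpr bool has_all(unsigned supported, unsigned required) {
    return (supported & required) == required;
}
```

With Vc itself, `supported` would be the value returned by Vc::extraInstructionsSupported() and `required` a combination of Vc::ExtraInstructions enumerators.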
bool Vc::isImplementationSupported ( Vc::Implementation  impl)

Tests whether the given implementation is supported by the system the code is executing on.

Returns
true if the OS and hardware support execution of instructions defined by impl.
false otherwise
Parameters
impl  The SIMD target to test for.
Vc::Implementation Vc::bestImplementationSupported ( )

Determines the best supported implementation for the current system.

Returns
The enum value for the best implementation.
bool Vc::currentImplementationSupported ( )
inline

Tests that the CPU and Operating System support the vector unit the code was compiled for.

This function should be called before any other Vc functionality is used. It checks whether the program will work on the machine it is running on. If this function returns false, the program should exit with a useful error message before the OS kills it because of an invalid instruction exception.

If the program continues and makes use of any vector features not supported by hardware or software, the program will crash.

Example:

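#include <Vc/Vc>
#include <cstdlib>
#include <iostream>

int main()
{
    if (!Vc::currentImplementationSupported()) {
        std::cerr << "CPU or OS requirements not met for the compiled in vector unit!\n";
        return EXIT_FAILURE;  // leave before any unsupported instruction executes
    }
    ...
}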
Returns
true if the OS and hardware support execution of the currently selected SIMD instructions.
false otherwise

Definition at line 146 of file support.h.

const char* Vc::versionString ( )
inline
Returns
the version string of the Vc headers.
Note
A built-in check runs at application startup and verifies that the Vc version of the library (link time) matches the version of the headers (compile time). A mismatch between headers and library could lead to errors that are very hard to debug.
If you need to disable the check (it costs a very small amount of application startup time) you can define Vc_NO_VERSION_CHECK at compile time.

Definition at line 81 of file version.h.

constexpr unsigned int Vc::versionNumber ( )
Returns
the version of the Vc headers encoded in an integer.

Definition at line 89 of file version.h.

std::ostream& Vc::Common::operator<< ( std::ostream &  s,
const Vc::MemoryBase< V, Parent, Dimension, RM > &  m 
)
inline

Prints the contents of a Memory object into a stream object.

Vc::Memory<int_v, 10> m;
for (int i = 0; i < m.entriesCount(); ++i) {
m[i] = i;
}
std::cout << m << std::endl;

will output (with SSE):

{[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}
Parameters
s  Any standard C++ ostream object, for example std::cout or a std::stringstream object.
m  Any Vc::Memory object.
Returns
The ostream object, so that multiple stream operations can be chained.
Note
With the GNU standard library this function will check whether the output stream is a tty in which case it colorizes the output.
Warning
Please do not forget that printing a large memory object can take a long time.
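The output format shown above (one bracketed group per vector, padding entries included) can be reproduced for a plain std::vector<int>, for example to write expectations in tests. A minimal sketch, assuming a vector width of 4 as in the SSE example and zero-initialized padding; `format_like_memory` is a hypothetical helper and does not implement the tty colorization:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats values as "{[a, b, c, d] [e, f, g, h] ...}",
// padding the last group with zeros up to 'width' entries.
std::string format_like_memory(std::vector<int> values, std::size_t width) {
    assert(width > 0);
    while (values.size() % width != 0) values.push_back(0);
    std::ostringstream out;
    out << '{';
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i % width == 0) out << (i == 0 ? "[" : " [");  // start of a vector
        out << values[i];
        out << ((i % width == width - 1) ? "]" : ", ");     // end of a vector or separator
    }
    out << '}';
    return out.str();
}
```

With the values 0 to 9 and width 4, this produces the same string as the SSE output shown above.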
enable_if<is_simd_mask<Mask>::value && is_simd_vector<T>::value, T> Vc::iif ( const Mask &  condition,
const T &  trueValue,
const T &  falseValue 
)
inline

Function to mimic the ternary operator '?:' (inline-if).

Parameters
condition  Determines which values are returned. This is analogous to the first argument of the ternary operator.
trueValue  The values to return where condition is true.
falseValue  The values to return where condition is false.
Returns
A combination of entries from trueValue and falseValue, according to condition.

So instead of the scalar variant

float x = a > 1.f ? b : b + c;

you'd write

float_v x = Vc::iif(a > 1.f, b, b + c);

Assuming a has the values [0, 3, 5, 1], b is [1, 1, 1, 1], and c is [1, 2, 3, 4], then x will be [2, 1, 1, 5]: entries 1 and 2 satisfy a > 1 and take b, while entries 0 and 3 take b + c.

Definition at line 60 of file iif.h.
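The per-entry semantics of iif can be checked with a scalar model in which std::array stands in for float_v and its mask. This sketch (the helper `iif_sketch` is hypothetical, not Vc code) reproduces the example values used above:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of Vc::iif: entry i comes from trueValue
// where condition[i] holds, and from falseValue otherwise.
template <typename T, std::size_t N>
std::array<T, N> iif_sketch(const std::array<bool, N> &condition,
                            const std::array<T, N> &trueValue,
                            const std::array<T, N> &falseValue) {
    std::array<T, N> result = falseValue;  // start from the false branch
    for (std::size_t i = 0; i < N; ++i) {
        if (condition[i]) result[i] = trueValue[i];  // masked overwrite
    }
    return result;
}
```

Unlike the scalar ternary operator, both trueValue and falseValue are fully evaluated before the selection, so neither branch may rely on the condition to avoid, for example, a division by zero.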

constexpr T Vc::iif ( bool  condition,
const T &  trueValue,
const T &  falseValue 
)

Overload of the above for boolean conditions.

This typically results in direct use of the ternary operator. This function makes it easier to switch from a Vc type to a builtin type.

Parameters
condition  Determines which value is returned. This is analogous to the first argument of the ternary operator.
trueValue  The value to return if condition is true.
falseValue  The value to return if condition is false.
Returns
Either trueValue or falseValue, depending on condition.

Definition at line 90 of file iif.h.

std::pair<V, V> Vc::interleave ( const V &  a,
const V &  b 
)

Interleaves the entries from a and b into two vectors of the same type.

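The returned vectors contain the elements in the order a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3], ...: the first vector holds the first half of this sequence and the second vector holds the second half.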

Example:

Vc::SimdArray<int, 4> a = { 1, 2, 3, 4 };
Vc::SimdArray<int, 4> b = { 9, 8, 7, 6 };
std::tie(a, b) = Vc::interleave(a, b);
std::cout << a << b;
// prints:
// <1 9 2 8><3 7 4 6>
Parameters
a  input vector whose data will appear at even indexes in the output
b  input vector whose data will appear at odd indexes in the output
Returns
two vectors with data from a and b interleaved

Definition at line 55 of file interleave.h.
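The resulting order can be modelled with two std::array objects: merge a and b alternately, then split the merged sequence in half. A scalar sketch (the helper `interleave_sketch` is hypothetical) that reproduces the SimdArray example above:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical scalar model of Vc::interleave.
template <typename T, std::size_t N>
std::pair<std::array<T, N>, std::array<T, N>> interleave_sketch(const std::array<T, N> &a,
                                                                const std::array<T, N> &b) {
    std::array<T, 2 * N> merged{};
    for (std::size_t i = 0; i < N; ++i) {
        merged[2 * i] = a[i];      // a lands on even positions
        merged[2 * i + 1] = b[i];  // b lands on odd positions
    }
    std::array<T, N> first{}, second{};
    for (std::size_t i = 0; i < N; ++i) {
        first[i] = merged[i];       // first half of the sequence
        second[i] = merged[N + i];  // second half of the sequence
    }
    return {first, second};
}
```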

constexpr auto Vc::makeContainer ( std::initializer_list< T >  list) -> decltype(make_container_helper<Container, T>::help(list))

Construct a container of Vc vectors from a std::initializer_list of scalar entries.

Template Parameters
Container  The container type to construct.
T  The scalar type to use for the initializer_list.
Parameters
list  An initializer list of arbitrary size. The type of the entries is important! If you pass a list of integers you will get a container filled with Vc::int_v objects. If, instead, you want a container of Vc::float_v objects, be sure to include a period (.) and the 'f' suffix in the literals. Alternatively, you can pass the type as second template argument to makeContainer.
Returns
Returns a container of the requested class filled with the minimum number of SIMD vectors to hold the values in the initializer list. If the number of values in list does not match the number of values in the returned container object, the remaining values in the returned object will be zero-initialized.

Example:

auto data = Vc::makeContainer<std::vector<float_v>>({ 1.f, 2.f, 3.f, 4.f, 5.f });
// data.size() == 5 if float_v::Size == 1 (i.e. Vc_IMPL=Scalar)
// data.size() == 2 if float_v::Size == 4 (i.e. Vc_IMPL=SSE)
// data.size() == 1 if float_v::Size == 8 (i.e. Vc_IMPL=AVX)

Definition at line 138 of file makeContainer.h.
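The number of vectors follows from rounding the number of scalars up to a whole number of vectors, with zeros in the unused tail. A scalar sketch of that packing, assuming std::array<T, W> as a stand-in for a vector with W entries; `make_chunks` is hypothetical and not Vc::makeContainer:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical model: pack scalars into ceil(size / W) zero-initialized chunks of W.
template <typename T, std::size_t W>
std::vector<std::array<T, W>> make_chunks(std::initializer_list<T> list) {
    static_assert(W > 0, "vector width must be positive");
    std::vector<std::array<T, W>> result((list.size() + W - 1) / W, std::array<T, W>{});
    std::size_t i = 0;
    for (const T &value : list) {
        result[i / W][i % W] = value;  // chunk i / W, lane i % W
        ++i;
    }
    return result;
}
```

For the five floats in the example this yields 5, 2, and 1 chunks for W = 1, 4, and 8, matching the sizes listed for the Scalar, SSE, and AVX builds.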

T* Vc::malloc ( size_t  n)
inline

Allocates memory on the heap with alignment and padding suitable for vectorized access.

Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.

Parameters
n  Specifies the number of objects the allocated memory must be able to store.
Template Parameters
T  The type of the allocated memory. Note that the constructor is not called.
A  Determines the alignment of the memory. See Vc::MallocAlignment.
Returns
Pointer to memory of the requested type, or 0 on error. The allocated memory is padded at the end to a multiple of the requested alignment A. Thus if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline up to the end of the array without generating an out-of-bounds access. For a cacheline size of 64 bytes and an int size of 4 bytes, the 84 bytes requested (21 * 4) are rounded up to the next multiple of 64, so you get an array of 128 bytes to work with.
Warning
  • The standard malloc function takes the number of bytes to allocate, whereas this function takes the number of values; the two differ by a factor of sizeof(T).
  • This function is mainly meant for use with builtin types. If you use a custom type with a sizeof that is not a multiple of 2 the results might not be what you expect.
  • The constructor of T is not called. You can make up for this:
    SomeType *array = new(Vc::malloc<SomeType, Vc::AlignOnCacheline>(N)) SomeType[N];
See also
Vc::free

Definition at line 75 of file memory.h.

void Vc::free ( T *  p)
inline

Frees memory that was allocated with Vc::malloc.

Parameters
p  The pointer to the memory to be freed.
Template Parameters
T  The type of the allocated memory.
Warning
The destructor of T is not called. If needed, you can call the destructor before calling free:
for (int i = 0; i < N; ++i) {
p[i].~T();
}
See also
Vc::malloc

Definition at line 102 of file memory.h.

Referenced by Memory< V, 0u, 0u, true >::~Memory().
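Since Vc::malloc does not construct and Vc::free does not destroy, the full lifecycle for a non-trivial type has four steps: allocate, construct, destroy, free. The sketch below walks through those steps with standard C++17 facilities in place of Vc's functions, assuming a platform that provides std::aligned_alloc (MSVC does not). `alloc_padded`, `Sample`, and `lifecycle_demo` are hypothetical names; the elements are constructed one by one with std::uninitialized_default_construct_n rather than with array placement new:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical stand-in for Vc::malloc<T, A>: aligned, padded to a multiple of 'alignment'.
template <typename T>
T *alloc_padded(std::size_t n, std::size_t alignment) {
    assert(alignment >= alignof(T));
    const std::size_t bytes = (n * sizeof(T) + alignment - 1) / alignment * alignment;
    return static_cast<T *>(std::aligned_alloc(alignment, bytes));  // nullptr on error
}

// Hypothetical element type with a non-trivial member, so construction matters.
struct Sample {
    std::string label = "unset";
    float value = 0.f;
};

// Allocate, construct, use, destroy, free.
bool lifecycle_demo(std::size_t n) {
    assert(n > 0);
    Sample *p = alloc_padded<Sample>(n, 64);
    if (!p) return false;
    std::uninitialized_default_construct_n(p, n);  // runs the constructors
    const bool ok = reinterpret_cast<std::uintptr_t>(p) % 64 == 0 && p[n - 1].label == "unset";
    std::destroy_n(p, n);  // runs the destructors
    std::free(p);          // releases the raw memory
    return ok;
}
```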

void prefetchForOneRead ( const void *  addr)
inline

Prefetch the cacheline containing addr for a single read access.

This prefetch completely bypasses the cache, not evicting any other data.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 603 of file memory.h.

void Vc::Common::prefetchForModify ( const void *  addr)
inline

Prefetch the cacheline containing addr for modification.

Because this prefetch evicts other data from the cache, use it only for data you will actually use. When the target system supports it, the cacheline will be marked as modified while prefetching, saving work later on.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 620 of file memory.h.

void prefetchClose ( const void *  addr)
inline

Prefetch the cacheline containing addr to L1 cache.

Because this prefetch evicts other data from the cache, use it only for data you will actually use.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 635 of file memory.h.

void prefetchMid ( const void *  addr)
inline

Prefetch the cacheline containing addr to L2 cache.

Because this prefetch evicts other data from the cache, use it only for data you will actually use.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 650 of file memory.h.

void prefetchFar ( const void *  addr)
inline

Prefetch the cacheline containing addr to L3 cache.

Because this prefetch evicts other data from the cache, use it only for data you will actually use.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 665 of file memory.h.
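GCC and Clang expose software prefetching through `__builtin_prefetch(addr, rw, locality)`, where rw is 0 for a read and 1 for a write, and locality ranges from 0 (no temporal locality) to 3 (high temporal locality); both must be compile-time constants. The mapping of the five functions above onto those arguments in this sketch is an assumption for illustration, not a description of how Vc implements them; `PREFETCH_SKETCH`, the `*_sketch` functions, and `sum_with_prefetch` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__GNUC__)
#define PREFETCH_SKETCH(addr, rw, locality) __builtin_prefetch((addr), (rw), (locality))
#else
#define PREFETCH_SKETCH(addr, rw, locality) ((void)(addr))  // no-op on other compilers
#endif

// Hypothetical mapping (assumption): lower locality = farther from the core.
inline void prefetchForOneRead_sketch(const void *p) { PREFETCH_SKETCH(p, 0, 0); }
inline void prefetchForModify_sketch(const void *p) { PREFETCH_SKETCH(p, 1, 3); }
inline void prefetchClose_sketch(const void *p) { PREFETCH_SKETCH(p, 0, 3); }
inline void prefetchMid_sketch(const void *p) { PREFETCH_SKETCH(p, 0, 2); }
inline void prefetchFar_sketch(const void *p) { PREFETCH_SKETCH(p, 0, 1); }

// Sums data while requesting the element 'distance' positions ahead.
// Prefetches are only hints: they can change timing but never the result.
double sum_with_prefetch(const std::vector<double> &data, std::size_t distance) {
    double sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i + distance < data.size()) prefetchClose_sketch(&data[i + distance]);
        sum += data[i];
    }
    return sum;
}
```

A suitable prefetch distance depends on memory latency and the work per element, so it is usually found by measurement.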

enable_if< (V::size() == Vector<T, Abi>::size() && sizeof(typename V::VectorEntryType) == sizeof(typename Vector<T, Abi>::VectorEntryType) && sizeof(V) == sizeof(Vector<T, Abi>) && alignof(V) <= alignof(Vector<T, Abi>)), V> Vc::reinterpret_components_cast ( const Vector< T, Abi > &  x)
inline

Constructs a new Vector object of type V from the Vector x, reinterpreting the bits of x for the new type V.

This function is only applicable if:

  • the sizeof of the input and output types is equal
  • the Vector::size() of the input and output types is equal
  • the VectorEntryTypes of input and output have equal sizeof
  • the alignment of the output type does not exceed that of the input type
Template Parameters
V  The requested type to change x into.
Parameters
x  The Vector to reinterpret as an object of type V.
Returns
A new object (rvalue) of type V.
Warning
This cast is non-portable since the applicability (see above) may change depending on the default vector types of the target platform. The function is perfectly safe to use with fully specified Abi, though.

Definition at line 834 of file vector.h.
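A bitwise reinterpretation with matching entry size and total size can be modelled for plain arrays with std::memcpy, the well-defined way to reinterpret bits in standard C++. The static_asserts mirror the size conditions listed above. In this sketch `reinterpret_entries` is hypothetical, std::array stands in for Vc vectors, and the expected bit patterns assume IEEE-754 single-precision floats:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical model of reinterpret_components_cast for std::array.
template <typename To, typename From, std::size_t N>
std::array<To, N> reinterpret_entries(const std::array<From, N> &x) {
    static_assert(sizeof(To) == sizeof(From), "entry sizes must match");
    static_assert(sizeof(std::array<To, N>) == sizeof(std::array<From, N>),
                  "total sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "bitwise copy requires trivially copyable entries");
    std::array<To, N> result;
    std::memcpy(result.data(), x.data(), sizeof(result));  // copy bits, no value conversion
    return result;
}
```

Note the difference from a value conversion: 1.0f becomes the integer 0x3f800000, not 1.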

constexpr WhereImpl::WhereMask<M> Vc::where ( const M &  mask)

Conditional assignment.

Since comparisons between SIMD vectors do not return a single boolean, but rather a vector of booleans (a mask), one often cannot use if / else statements. Instead, one needs to state that only a subset of entries of a given SIMD vector should be modified. The where function can be prepended to any assignment operation to execute a masked assignment.

Parameters
mask  The mask that selects the entries in the target vector that will be modified.
Returns
This function returns an opaque object that binds to the left operand of an assignment via the binary-or operator or the function call operator (i.e. either where(mask) | x = y or where(mask)(x) = y).

Example:

template<typename T> void f1(T &x, T &y)
{
    if (x < 2) {
        x *= y;
        y += 2;
    }
}
template<typename T> void f2(T &x, T &y)
{
    const auto mask = x < 2;  // evaluate the condition once, before x is modified
    where(mask) | x *= y;
    where(mask) | y += 2;
}

The block following the if statement in f1 is executed if x < 2 evaluates to true. If T is a scalar type, you normally get what you expect. But if T is a SIMD vector type, the if statement uses the implicit conversion from a mask to bool, meaning all_of(x < 2).

Most of the time the required operation is a masked assignment as stated in f2.

Definition at line 229 of file where.h.

Referenced by Vc::iif().
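Computing the mask once, before the first masked assignment modifies x, matters: re-evaluating x < 2 afterwards selects different entries. A scalar model of f2, in which std::array stands in for the vector type, shows the difference. `Lanes`, `masked_f2`, and `masked_f2_recomputed` are hypothetical names, not Vc API:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Lanes = std::array<int, 4>;  // hypothetical stand-in for a 4-wide int vector

// Model of f2 with the mask evaluated once, matching the intent of f1.
void masked_f2(Lanes &x, Lanes &y) {
    std::array<bool, 4> mask{};
    for (std::size_t i = 0; i < 4; ++i) mask[i] = x[i] < 2;
    for (std::size_t i = 0; i < 4; ++i) if (mask[i]) x[i] *= y[i];
    for (std::size_t i = 0; i < 4; ++i) if (mask[i]) y[i] += 2;
}

// Re-evaluating x < 2 after x was modified selects different lanes.
void masked_f2_recomputed(Lanes &x, Lanes &y) {
    for (std::size_t i = 0; i < 4; ++i) if (x[i] < 2) x[i] *= y[i];
    for (std::size_t i = 0; i < 4; ++i) if (x[i] < 2) y[i] += 2;
}
```

Starting from x = [1, 3, 0, 5] and y = [4, 4, 4, 4], the first version updates y in lanes 0 and 2, while the second skips lane 0 because x[0] has already become 4.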

Variable Documentation

constexpr AlignedTag Aligned

Use this object for a flags parameter to request aligned loads and stores.

It specifies that a load/store can expect a memory address that is aligned on the correct boundary (i.e. MemoryAlignment).

Warning
If you specify Aligned, but the memory address is not aligned the program will most likely crash.

Definition at line 183 of file loadstoreflags.h.

Referenced by SimdArray< T, N, V, Wt >::reversed(), and SimdArray< T, N, V, Wt >::rotated().

constexpr UnalignedTag Unaligned

Use this object for a flags parameter to request unaligned loads and stores.

It specifies that a load/store cannot expect a memory address that is aligned on the correct boundary (i.e. the alignment may be less than MemoryAlignment).

Note
If you specify Unaligned, but the memory address is aligned the load/store will execute slightly slower than necessary.

Definition at line 196 of file loadstoreflags.h.

Referenced by SimdArray< T, N, V, Wt >::reversed(), SimdArray< T, N, V, Wt >::rotated(), and MemoryBase< V, Memory< V, Size, 0u, InitPadding >, 1, void >::vector().

constexpr StreamingTag Streaming

Use this object for a flags parameter to request streaming loads and stores.

It specifies that the cache should be bypassed for the given load/store. Whether this will actually be done depends on the target system's capabilities.

Streaming stores can be interesting when the code calculates values that, after being written to memory, will not be used for a long time or used by a different thread.

Note
Expect that most target systems do not support unaligned streaming loads or stores. Therefore, make sure that you also specify Aligned.

Definition at line 211 of file loadstoreflags.h.
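Passing an empty tag object such as Vc::Aligned selects an overload at compile time, so choosing between aligned and unaligned code paths costs nothing at run time. The pattern can be sketched with hypothetical tag types and a scalar load of four floats; `AlignedTagSketch`, `UnalignedTagSketch`, `kMemoryAlignment`, and `load4` are illustrative names, not Vc's types, and the 16-byte alignment is an assumption matching one SSE register:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical tag types and tag objects, in the style of Vc::Aligned / Vc::Unaligned.
struct AlignedTagSketch {};
struct UnalignedTagSketch {};
constexpr AlignedTagSketch AlignedSketch{};
constexpr UnalignedTagSketch UnalignedSketch{};

constexpr std::size_t kMemoryAlignment = 16;  // assumed, e.g. one SSE register

// Aligned variant: the caller promises alignment; debug builds check the promise.
inline std::array<float, 4> load4(const float *mem, AlignedTagSketch) {
    assert(reinterpret_cast<std::uintptr_t>(mem) % kMemoryAlignment == 0);
    std::array<float, 4> v;
    std::memcpy(v.data(), mem, sizeof v);
    return v;
}

// Unaligned variant: no assumption about the address.
inline std::array<float, 4> load4(const float *mem, UnalignedTagSketch) {
    std::array<float, 4> v;
    std::memcpy(v.data(), mem, sizeof v);
    return v;
}
```

In a real SIMD implementation the two overloads would issue different load instructions; in this scalar sketch both copy the same bytes and only the checked precondition differs.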