Vc  0.7.5-dev
SIMD Vector Classes for C++
sfloat_v Class Reference

Detailed Description

SIMD Vector of single precision floats that is guaranteed to have as many entries as a Vc::short_v and Vc::ushort_v.

#include <Vc/sfloat_v>

Public Types

enum  { Size }
typedef ushort_v IndexType
 The type of the vector used for indexes in gather and scatter operations.
typedef float EntryType
 The type of the entries in the vector.
typedef sfloat_m Mask
 The type of the mask used for masked operations and returned from comparisons.

Public Member Functions

 sfloat_v ()
 Construct an uninitialized vector.
 sfloat_v (Vc::Zero)
 Construct a vector with the entries initialized to zero.
 sfloat_v (Vc::One)
 Construct a vector with the entries initialized to one.
 sfloat_v (Vc::IndexesFromZero)
 Construct a vector with the entries initialized to 0, 1, 2, 3, 4, 5, ...
 sfloat_v (float *alignedMemory)
 Construct a vector loading its entries from alignedMemory.
template<typename OtherVector >
 sfloat_v (const OtherVector &)
 Convert from another vector type.
 sfloat_v (float x)
 Broadcast Constructor.
void load (const float *memory, LoadStoreFlags align=Aligned)
 Load the vector entries from memory, overwriting the previous values.
void setZero ()
 Set all entries to zero.
void setZero (const sfloat_m &mask)
 Set all entries to zero where the mask is set.
void store (EntryType *memory, LoadStoreFlags align=Aligned) const
 Store the vector data to memory.
float & operator[] (int index)
 This operator can be used to modify scalar entries of the vector.
float operator[] (int index) const
 This operator can be used to read scalar entries of the vector.
MaskedVector operator() (const sfloat_m &mask)
 Writemask the vector before an assignment.
sfloat_v sorted () const
 Return a sorted copy of the vector.
sfloat_v copySign (sfloat_v reference) const
 Copies the sign of reference.
sfloat_v exponent () const
 Extracts the exponent.
sfloat_m isNegative () const
 Check the sign bit of each vector entry.
Gather and Scatter Functions

The gather and scatter functions allow you to easily use vectors with structured data and random accesses.

There are several variants:

  • random access in arrays (a[i])
  • random access of members of structs in an array (a[i].member)
  • random access of members of members of structs in an array (a[i].member1.member2)

All gather and scatter functions optionally take a mask as last argument. In that case only the entries that are selected in the mask are read in memory and copied to the vector. This allows you to have invalid indexes in the indexes vector if those are masked off in mask.

Note
If you use a constructor for a masked gather then the unmodified entries of the vector are initialized to 0 before the gather. If you really want them uninitialized you can create an uninitialized vector object first and then call the masked gather function on it.

The index type (IndexT) can either be a pointer to integers (array) or a vector of integers.

Accessing values of a struct works like this:

struct MyData {
float a;
int b;
};
void foo(MyData *data, uint_v indexes) {
const float_v v1(data, &MyData::a, indexes);
const int_v v2(data, &MyData::b, indexes);
v1.scatter(data, &MyData::a, indexes - float_v::Size);
v2.scatter(data, &MyData::b, indexes - 1);
}
Parameters
array    A pointer into memory (without alignment restrictions).
member1  If array points to a struct, member1 determines the member in the struct to be read. Thus the offsets in indexes are relative to the array and not to the size of the gathered type (i.e. array[i].*member1 is accessed instead of (&(array->*member1))[i]).
member2  If member1 is a struct then member2 selects the member to be read from that struct (i.e. array[i].*member1.*member2 is read).
indexes  Determines the offsets into array where the values are gathered from/scattered to. The type of indexes can either be an integer vector or a type that supports operator[] access.
mask     If a mask is given only the active entries will be gathered/scattered.
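
The masked struct-member gather described above can be illustrated with a small scalar model. This is only a sketch in plain C++ and does not use Vc: kLanes (assumed to be 4), Lanes, LaneMask, LaneIndexes and maskedGather are hypothetical names introduced for illustration, and MyData mirrors the struct from the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model only: kLanes stands in for sfloat_v::Size (assumed to be 4 here).
constexpr std::size_t kLanes = 4;
using Lanes = std::array<float, kLanes>;
using LaneMask = std::array<bool, kLanes>;
using LaneIndexes = std::array<unsigned, kLanes>;

struct MyData {
    float a;
    int b;
};

// Models the masked struct-member gather constructor:
// result[i] = array[indexes[i]].*member for active lanes.
// Inactive lanes stay zero and their index is never dereferenced.
template <typename S>
Lanes maskedGather(const S *array, float S::*member,
                   const LaneIndexes &indexes, const LaneMask &mask) {
    Lanes result{};  // zero-initialized, like the masked gather constructor
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (mask[i]) {
            result[i] = array[indexes[i]].*member;
        }
    }
    return result;
}
```

Because masked-off lanes are skipped entirely, an out-of-range index in such a lane is harmless in this model, which is the property the text above relies on.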
template<typename IndexT >
 sfloat_v (const float *array, const IndexT indexes)
 gather constructor
template<typename IndexT >
 sfloat_v (const float *array, const IndexT indexes, const sfloat_m &mask)
 masked gather constructor, initialized to zero
template<typename IndexT >
void gather (const float *array, const IndexT indexes)
 gather
template<typename IndexT >
void gather (const float *array, const IndexT indexes, const sfloat_m &mask)
 masked gather
template<typename IndexT >
void scatter (float *array, const IndexT indexes) const
 scatter
template<typename IndexT >
void scatter (float *array, const IndexT indexes, const sfloat_m &mask) const
 masked scatter
template<typename S1 , typename IndexT >
 sfloat_v (const S1 *array, const float S1::*member1, const IndexT indexes)
 struct member gather constructor
template<typename S1 , typename IndexT >
 sfloat_v (const S1 *array, const float S1::*member1, const IndexT indexes, const sfloat_m &mask)
 masked struct member gather constructor, initialized to zero
template<typename S1 , typename IndexT >
void gather (const S1 *array, const float S1::*member1, const IndexT indexes)
 struct member gather
template<typename S1 , typename IndexT >
void gather (const S1 *array, const float S1::*member1, const IndexT indexes, const sfloat_m &mask)
 masked struct member gather
template<typename S1 , typename IndexT >
void scatter (S1 *array, float S1::*member1, const IndexT indexes) const
 struct member scatter
template<typename S1 , typename IndexT >
void scatter (S1 *array, float S1::*member1, const IndexT indexes, const sfloat_m &mask) const
 masked struct member scatter
template<typename S1 , typename S2 , typename IndexT >
 sfloat_v (const S1 *array, const S2 S1::*member1, const float S2::*member2, const IndexT indexes)
 struct member of struct member gather constructor
template<typename S1 , typename S2 , typename IndexT >
 sfloat_v (const S1 *array, const S2 S1::*member1, const float S2::*member2, const IndexT indexes, const sfloat_m &mask)
 masked struct member of struct member gather constructor, initialized to zero
template<typename S1 , typename S2 , typename IndexT >
void gather (const S1 *array, const S2 S1::*member1, const float S2::*member2, const IndexT indexes)
 struct member of struct member gather
template<typename S1 , typename S2 , typename IndexT >
void gather (const S1 *array, const S2 S1::*member1, const float S2::*member2, const IndexT indexes, const sfloat_m &mask)
 masked struct member of struct member gather
template<typename S1 , typename S2 , typename IndexT >
void scatter (S1 *array, S2 S1::*member1, float S2::*member2, const IndexT indexes) const
 struct member of struct member scatter
template<typename S1 , typename S2 , typename IndexT >
void scatter (S1 *array, S2 S1::*member1, float S2::*member2, const IndexT indexes, const sfloat_m &mask) const
 masked struct member of struct member scatter
Comparisons

All comparison operators return a mask object.

void foo(const float_v &a, const float_v &b) {
const float_m mask = a < b;
...
}
Parameters
x  The vector to compare with.
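
As a rough scalar picture of what a comparison produces, the following sketch compares two fixed-width arrays lane by lane. It is plain C++ rather than Vc code; kCmpLanes (assumed to be 4), CmpLanes, CmpMask and lessThan are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCmpLanes = 4;  // stand-in for sfloat_v::Size (assumption)
using CmpLanes = std::array<float, kCmpLanes>;
using CmpMask = std::array<bool, kCmpLanes>;

// Models operator<: each mask entry holds the result of comparing one lane pair.
CmpMask lessThan(const CmpLanes &a, const CmpLanes &b) {
    CmpMask mask{};
    for (std::size_t i = 0; i < kCmpLanes; ++i) {
        mask[i] = a[i] < b[i];
    }
    return mask;
}
```

The resulting mask is what the masked gather, scatter, setZero and write-mask functions on this page take as their mask argument.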
sfloat_m operator== (const sfloat_v &x) const
 Returns mask that is true where vector entries are equal and false otherwise.
sfloat_m operator!= (const sfloat_v &x) const
 Returns mask that is true where vector entries are not equal and false otherwise.
sfloat_m operator> (const sfloat_v &x) const
 Returns mask that is true where the left vector entries are greater than on the right and false otherwise.
sfloat_m operator>= (const sfloat_v &x) const
 Returns mask that is true where the left vector entries are greater than or equal to those on the right and false otherwise.
sfloat_m operator< (const sfloat_v &x) const
 Returns mask that is true where the left vector entries are less than on the right and false otherwise.
sfloat_m operator<= (const sfloat_v &x) const
 Returns mask that is true where the left vector entries are less than or equal to those on the right and false otherwise.
Arithmetic Operations

The vector classes implement all the arithmetic and (bitwise) logical operations as you know from builtin types.

void foo(const float_v &a, const float_v &b) {
const float_v product = a * b;
const float_v difference = a - b;
}
sfloat_v operator+ (sfloat_v x) const
 Returns a new vector with the sum of the respective entries of the left and right vector.
sfloat_v & operator+= (sfloat_v x)
 Adds the respective entries of x to this vector.
sfloat_v operator- (sfloat_v x) const
 Returns a new vector with the difference of the respective entries of the left and right vector.
sfloat_v & operator-= (sfloat_v x)
 Subtracts the respective entries of x from this vector.
sfloat_v operator* (sfloat_v x) const
 Returns a new vector with the product of the respective entries of the left and right vector.
sfloat_v & operator*= (sfloat_v x)
 Multiplies the respective entries of this vector by x.
sfloat_v operator/ (sfloat_v x) const
 Returns a new vector with the quotient of the respective entries of the left and right vector.
sfloat_v & operator/= (sfloat_v x)
 Divides the respective entries of this vector by x.
sfloat_v operator- () const
 Returns a new vector with all entries negated.
sfloat_v operator| (sfloat_v x) const
 Returns a new vector with the binary or of the respective entries of the left and right vector.
sfloat_v operator& (sfloat_v x) const
 Returns a new vector with the binary and of the respective entries of the left and right vector.
sfloat_v operator^ (sfloat_v x) const
 Returns a new vector with the binary xor of the respective entries of the left and right vector.
sfloat_v operator<< (int x) const
 Returns a new vector with each entry bitshifted to the left by x bits.
sfloat_v & operator<<= (int x)
 Bitshift each entry to the left by x bits.
sfloat_v operator>> (int x) const
 Returns a new vector with each entry bitshifted to the right by x bits.
sfloat_v & operator>>= (int x)
 Bitshift each entry to the right by x bits.
sfloat_v operator<< (sfloat_v x) const
 Returns a new vector with each entry bitshifted to the left by x[i] bits.
sfloat_v & operator<<= (sfloat_v x)
 Bitshift each entry to the left by x[i] bits.
sfloat_v operator>> (sfloat_v x) const
 Returns a new vector with each entry bitshifted to the right by x[i] bits.
sfloat_v & operator>>= (sfloat_v x)
 Bitshift each entry to the right by x[i] bits.
void fusedMultiplyAdd (sfloat_v factor, sfloat_v summand)
 Multiplies this vector with factor and then adds summand, without rounding between the multiplication and the addition.
Horizontal Reduction Operations

There are four horizontal operations available to reduce the values of a vector to a scalar value.

void foo(const float_v &v) {
float min = v.min(); // smallest value in v
float sum = v.sum(); // sum of all values in v
}
float min () const
 Returns the smallest entry in the vector.
float max () const
 Returns the largest entry in the vector.
float product () const
 Returns the product of all entries in the vector.
float sum () const
 Returns the sum of all entries in the vector.
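
The four reductions can be pictured with the following scalar sketch. It is plain C++, not Vc code; RedLanes (a four-entry array) and the laneMin/laneMax/laneSum/laneProduct helpers are hypothetical names. A real SIMD implementation may combine lanes in a different order, so floating-point sums and products can differ in the last bits from this sequential model.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>
#include <numeric>

using RedLanes = std::array<float, 4>;  // assumed width of four lanes

// Smallest and largest entry.
float laneMin(const RedLanes &v) { return *std::min_element(v.begin(), v.end()); }
float laneMax(const RedLanes &v) { return *std::max_element(v.begin(), v.end()); }

// Sum and product of all entries, accumulated left to right.
float laneSum(const RedLanes &v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}
float laneProduct(const RedLanes &v) {
    return std::accumulate(v.begin(), v.end(), 1.0f, std::multiplies<float>());
}
```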
Apply/Call/Fill Functions

There are still many situations where the code needs to switch from SIMD operations to scalar execution.

In this case you can, of course, rely on operator[]. But there are also a number of functions that can help with common patterns.

The apply functions expect a function that returns a scalar value, i.e. a function of the form "T f(T)". The call functions do not return a value and thus the function passed does not need a return value. The fill functions are used to serially set the entries of the vector from the return values of a function.

Example:

void foo(float_v v) {
float_v logarithm = v.apply(std::log);
float_v exponential = v.apply(std::exp);
}

Of course, with C++11, you can also use lambdas here:

float_v power = v.apply([](float f) { return std::pow(f, 0.6f); });
Parameters
f  A functor: this can either be a function or an object that implements operator().
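
To make the serial nature of these helpers concrete, here is a scalar sketch of the unmasked apply and of the indexed fill. It is plain C++ rather than Vc code; ApplyLanes (four entries), applyModel and fillModel are hypothetical names, and the index type passed to the fill functor is assumed to be int.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using ApplyLanes = std::array<float, 4>;  // assumed width of four lanes

// Models apply: every entry of the result is f called on the matching entry of v.
template <typename Functor>
ApplyLanes applyModel(const ApplyLanes &v, Functor f) {
    ApplyLanes r{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        r[i] = f(v[i]);
    }
    return r;
}

// Models the indexed fill: the entries become [f(0), f(1), f(2), ...].
template <typename Functor>
ApplyLanes fillModel(Functor f) {
    ApplyLanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = f(static_cast<int>(i));
    }
    return r;
}
```

Each call runs once per lane in sequence, which is why these functions are a fallback for operations that have no vectorized form.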
template<typename Functor >
sfloat_v apply (Functor &f) const
 Return a new vector where each entry is the return value of f called on the current value.
template<typename Functor >
sfloat_v apply (const Functor &f) const
 Const overload of the above function.
template<typename Functor >
sfloat_v apply (Functor &f, sfloat_m mask) const
 As above, but skip the entries where mask is not set.
template<typename Functor >
sfloat_v apply (const Functor &f, sfloat_m mask) const
 Const overload of the above function.
template<typename Functor >
void call (Functor &f) const
 Call f with the scalar entries of the vector.
template<typename Functor >
void call (const Functor &f) const
 Const overload of the above function.
template<typename Functor >
void call (Functor &f, sfloat_m mask) const
 As above, but skip the entries where mask is not set.
template<typename Functor >
void call (const Functor &f, sfloat_m mask) const
 Const overload of the above function.
void fill (float(&f)())
 Fill the vector with the values [f(), f(), f(), ...].
template<typename IndexT >
void fill (float(&f)(IndexT))
 Fill the vector with the values [f(0), f(1), f(2), ...].
Swizzles

Swizzles are a special form of shuffles that, depending on the target hardware and swizzle type, may be used without extra cost.

The swizzles act on every successive four entries in the vector. Thus the swizzle

[0, 1, 2, 3, 4, 5, 6, 7].dcba() 

results in

[3, 2, 1, 0, 7, 6, 5, 4]

This implies a portability issue. The swizzles can only work on vectors where Size is a multiple of four. On Vc::Scalar all swizzles are implemented as no-ops. If a swizzle is used on a vector of Size == 2 compilation will fail.
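
The group-of-four behaviour can be sketched with a scalar model. This is plain C++, not Vc code; swizzle is a hypothetical helper that takes the pattern as a four-letter string and does not validate that the letters are in the range a to d.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a swizzle. N must be a multiple of four, mirroring the
// restriction described above; other sizes fail to compile.
template <std::size_t N>
std::array<float, N> swizzle(const std::array<float, N> &v,
                             const char (&pattern)[5]) {
    static_assert(N % 4 == 0, "swizzles need Size to be a multiple of four");
    std::array<float, N> r{};
    for (std::size_t group = 0; group < N; group += 4) {
        for (std::size_t i = 0; i < 4; ++i) {
            // 'a'..'d' select entry 0..3 of the current group of four
            r[group + i] = v[group + static_cast<std::size_t>(pattern[i] - 'a')];
        }
    }
    return r;
}
```

Calling swizzle(v, "dcba") on an eight-entry array reverses each half independently, as in the example above.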

const sfloat_v abcd () const
 Identity.
const sfloat_v badc () const
 Permute pairs.
const sfloat_v cdab () const
 Permute pairs of two / Rotate twice.
const sfloat_v aaaa () const
 Broadcast a.
const sfloat_v bbbb () const
 Broadcast b.
const sfloat_v cccc () const
 Broadcast c.
const sfloat_v dddd () const
 Broadcast d.
const sfloat_v bcad () const
 Rotate three: cross-product swizzle.
const sfloat_v bcda () const
 Rotate left.
const sfloat_v dabc () const
 Rotate right.
const sfloat_v acbd () const
 Permute inner pair.
const sfloat_v dbca () const
 Permute outer pair.
const sfloat_v dcba () const
 Reverse.
Shift and Rotate

These functions allow you to shift or rotate the entries in a vector by the given amount.

Both functions support positive and negative numbers for the shift/rotate value.

Example:

using namespace Vc;
int_v foo = int_v::IndexesFromZero() + 1; // e.g. [1, 2, 3, 4] with SSE
int_v x = foo.shifted( 1); // [2, 3, 4, 0]
x = foo.shifted( 2); // [3, 4, 0, 0]
x = foo.shifted( 3); // [4, 0, 0, 0]
x = foo.shifted( 4); // [0, 0, 0, 0]
x = foo.shifted(-1); // [0, 1, 2, 3]
x = foo.shifted(-2); // [0, 0, 1, 2]
x = foo.shifted(-3); // [0, 0, 0, 1]
x = foo.shifted(-4); // [0, 0, 0, 0]
x = foo.rotated( 1); // [2, 3, 4, 1]
x = foo.rotated( 2); // [3, 4, 1, 2]
x = foo.rotated( 3); // [4, 1, 2, 3]
x = foo.rotated( 4); // [1, 2, 3, 4]
x = foo.rotated(-1); // [4, 1, 2, 3]
x = foo.rotated(-2); // [3, 4, 1, 2]
x = foo.rotated(-3); // [2, 3, 4, 1]
x = foo.rotated(-4); // [1, 2, 3, 4]

These functions are slightly related to the above swizzles. In any case, they are often useful for communication between SIMD lanes or binary decoding operations.
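
The table in the example follows from two simple index rules, sketched here as a scalar model. This is plain C++, not Vc code; kShiftLanes (assumed to be 4), ShiftLanes, shiftedModel and rotatedModel are hypothetical names.

```cpp
#include <array>
#include <cassert>

constexpr int kShiftLanes = 4;  // stand-in for int_v::Size with SSE (assumption)
using ShiftLanes = std::array<int, kShiftLanes>;

// result[i] = v[i + amount] when that index exists, otherwise zero.
ShiftLanes shiftedModel(const ShiftLanes &v, int amount) {
    ShiftLanes r{};  // zeros are shifted in
    for (int i = 0; i < kShiftLanes; ++i) {
        const int src = i + amount;
        if (src >= 0 && src < kShiftLanes) {
            r[i] = v[src];
        }
    }
    return r;
}

// result[i] = v[(i + amount) mod Size]; nothing is lost.
ShiftLanes rotatedModel(const ShiftLanes &v, int amount) {
    ShiftLanes r{};
    for (int i = 0; i < kShiftLanes; ++i) {
        // add kShiftLanes before the final modulo so negative amounts wrap correctly
        const int src = ((i + amount) % kShiftLanes + kShiftLanes) % kShiftLanes;
        r[i] = v[src];
    }
    return r;
}
```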

const sfloat_v shifted (int amount) const
 Shift vector entries to the left by amount; shifting in zeros.
const sfloat_v rotated (int amount) const
 Rotate vector entries to the left by amount.

Static Public Member Functions

static sfloat_v Zero ()
 Returns a vector with the entries initialized to zero.
static sfloat_v One ()
 Returns a vector with the entries initialized to one.
static sfloat_v IndexesFromZero ()
 Returns a vector with the entries initialized to 0, 1, 2, 3, 4, 5, ...
static sfloat_v Random ()
 Returns a vector with pseudo-random entries.

Member Enumeration Documentation

anonymous enum
Enumerator:
Size 

The size of the vector.

I.e. the number of scalar entries in the vector. Do not make any assumptions about the size of vectors. If you need a vector of float vs. integer of the same size make use of IndexType instead. Note that this still does not guarantee the same size (e.g. double_v on SSE has two entries but there exists no 64 bit integer vector type in Vc - which would have two entries; thus double_v::IndexType is uint_v).

Also you can easily use if clauses that compare sizes. The compiler can statically evaluate and fully optimize dead code away (very much like #ifdef, but with syntax checking).
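
A minimal sketch of such a size-dependent if clause follows. It is plain C++ and does not use Vc: FourWide and TwoWide are hypothetical stand-ins for vector types with different Size values, and loadsPerFourValues is an illustrative helper.

```cpp
#include <cassert>

// Hypothetical stand-ins for two vector types with different widths.
struct FourWide { enum { Size = 4 }; };
struct TwoWide  { enum { Size = 2 }; };

// Size is a compile-time constant, so the untaken branch can be removed by
// the optimizer, yet both branches are still syntax-checked.
template <typename V>
int loadsPerFourValues() {
    if (V::Size >= 4) {
        return 1;            // one vector holds all four values
    } else {
        return 4 / V::Size;  // several narrower vectors are needed
    }
}
```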

Constructor & Destructor Documentation

sfloat_v ( Vc::Zero )

Construct a vector with the entries initialized to zero.

See Also
Vc::Zero, Zero()

sfloat_v ( Vc::One )

Construct a vector with the entries initialized to one.

See Also
Vc::One

sfloat_v ( Vc::IndexesFromZero )

Construct a vector with the entries initialized to 0, 1, 2, 3, 4, 5, ...

See Also
Vc::IndexesFromZero, IndexesFromZero()
sfloat_v ( float *  alignedMemory)

Construct a vector loading its entries from alignedMemory.

Parameters
alignedMemory  A pointer to data. The pointer must be aligned on a Vc::VectorAlignment boundary.
sfloat_v ( float  x)

Broadcast Constructor.

Constructs a vector with all entries of the vector filled with the given value.

Parameters
x  The scalar value to broadcast to all entries of the constructed vector.
Note
If you want to set it to 0 or 1 use the special initializer constructors above. Calling this constructor with 0 will cause a compilation error because the compiler cannot know which constructor you meant.

Member Function Documentation

static sfloat_v Random ( )

Returns a vector with pseudo-random entries.

Currently the state of the random number generator cannot be modified and starts off with the same state. Thus you will get the same sequence of numbers for the same sequence of calls.

Returns
a new random vector. Floating-point values will be in the 0-1 range. Integers will use the full range the integer representation allows.
Note
This function may use a very small amount of state and thus will be a weak random number generator.
void load ( const float *  memory,
LoadStoreFlags  align = Aligned 
)

Load the vector entries from memory, overwriting the previous values.

Parameters
memory  A pointer to data.
align   Determines whether memory is an aligned pointer or not.
See Also
Memory
void setZero ( const sfloat_m &  mask)

Set all entries to zero where the mask is set.

I.e. a 4-vector with a mask of 0111 would set the last three entries to 0.

Parameters
mask  Selects the entries to be set to zero.
void store ( EntryType *  memory,
LoadStoreFlags  align = Aligned 
) const

Store the vector data to memory.

Parameters
memory  A pointer to the memory where the entries are stored.
align   Determines whether memory is an aligned pointer or not.
See Also
Memory
float& operator[] ( int  index)

This operator can be used to modify scalar entries of the vector.

Parameters
index  A value from 0 to Size - 1. This value is not checked internally, so you must make sure it is in range.
Returns
a reference to the vector entry at the given index.
Warning
This operator is known to miscompile with GCC 4.3.x.
The use of this function may result in suboptimal performance. Please check whether you can find a more vector-friendly way to do what you need.
float operator[] ( int  index) const

This operator can be used to read scalar entries of the vector.

Parameters
index  A value from 0 to Size - 1. This value is not checked internally, so you must make sure it is in range.
Returns
the vector entry at the given index.
MaskedVector operator() ( const sfloat_m &  mask)

Writemask the vector before an assignment.

Parameters
mask  The writemask to be used.
Returns
an object that can be used for any kind of masked assignment.

The returned object is only to be used for assignments and should not be assigned to a variable.

Examples:

float_v v = float_v::Zero(); // v = [0, 0, 0, 0]
int_v v2 = int_v::IndexesFromZero(); // v2 = [0, 1, 2, 3]
v(v2 < 2) = 1.f; // v = [1, 1, 0, 0]
v(v2 < 3) += 1.f; // v = [2, 2, 1, 0]
++v2(v < 1.f); // v2 = [0, 1, 2, 4]
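
The three write-masked statements in the example can be traced with a scalar model. This is plain C++, not Vc code; kWmLanes (assumed to be 4), WmMask, maskedUpdate and lanesLessThan are hypothetical names introduced for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWmLanes = 4;  // stand-in for the vector Size (assumption)
using WmMask = std::array<bool, kWmLanes>;

// Applies op to the lanes selected by mask and leaves all others untouched,
// which models a write-masked assignment such as v(mask) += x.
template <typename T, typename Op>
void maskedUpdate(std::array<T, kWmLanes> &v, const WmMask &mask, Op op) {
    for (std::size_t i = 0; i < kWmLanes; ++i) {
        if (mask[i]) {
            v[i] = op(v[i]);
        }
    }
}

// Models the comparison of a vector against a broadcast scalar.
template <typename T>
WmMask lanesLessThan(const std::array<T, kWmLanes> &v, T bound) {
    WmMask mask{};
    for (std::size_t i = 0; i < kWmLanes; ++i) {
        mask[i] = v[i] < bound;
    }
    return mask;
}
```

Replaying the example with these helpers (assign 1 where v2 < 2, add 1 where v2 < 3, increment v2 where v < 1) gives the same lane values as the comments in the example.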
void fusedMultiplyAdd ( sfloat_v  factor,
sfloat_v  summand 
)

Multiplies this vector with factor and then adds summand, without rounding between the multiplication and the addition.

Parameters
factor   The multiplication factor.
summand  The summand that will be added after multiplication.
Note
This operation may have explicit hardware support, in which case it is normally faster to use the FMA instead of separate multiply and add instructions.
If the target hardware does not have FMA support this function will be considerably slower than a normal a * b + c. This is due to the increased precision fusedMultiplyAdd provides.
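
The effect of skipping the intermediate rounding can be seen in scalar code with std::fma from the C++ standard library. The sketch below is not Vc code; fusedModel and unfusedModel are hypothetical names, and the volatile store is an assumption-laden trick to keep the compiler from fusing the two operations itself.

```cpp
#include <cassert>
#include <cmath>

// Fused: a * b + c is rounded once, at the end.
float fusedModel(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Unfused: the product is rounded to float before the addition.
// The volatile store is there to keep the compiler from contracting the
// multiplication and addition into an FMA on its own.
float unfusedModel(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product 1 + 2^-11 + 2^-24 does not fit in a float, so the unfused version loses the 2^-24 term while the fused version keeps it.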
sfloat_v sorted ( ) const

Return a sorted copy of the vector.

Returns
A sorted vector. The returned values are in ascending order:
v[0] <= v[1] <= v[2] <= v[3] ...

Example:

int_v v = int_v::Random();
int_v s = v.sorted();
std::cout << v << '\n' << s << '\n';

With SSE the output would be:

[1513634383, -963914658, 1763536262, -1285037745]
[-1285037745, -963914658, 1513634383, 1763536262]

With the Scalar implementation:

[1513634383]
[1513634383]
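
A scalar equivalent of the SSE case, using std::sort on a copy, might look like this. It is plain C++ rather than Vc code; SortLanes (four entries) and sortedModel are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using SortLanes = std::array<int, 4>;  // assumed width of four lanes

// Taken by value, so the caller's vector is unchanged, like sorted() const.
SortLanes sortedModel(SortLanes v) {
    std::sort(v.begin(), v.end());  // ascending: v[0] <= v[1] <= ...
    return v;
}
```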
sfloat_v copySign ( sfloat_v  reference) const

Copies the sign of reference.

Parameters
reference  The vector whose sign bits will be transferred.
Returns
a value where the sign of the value equals the sign of reference. I.e. sign(v.copySign(r)) == sign(r).
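
Per lane this corresponds to what std::copysign does for scalars, as the following sketch shows. It is plain C++, not Vc code; SignLanes (four entries) and copySignModel are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using SignLanes = std::array<float, 4>;  // assumed width of four lanes

// Each result lane keeps the magnitude of v[i] and takes the sign bit of reference[i].
SignLanes copySignModel(const SignLanes &v, const SignLanes &reference) {
    SignLanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = std::copysign(v[i], reference[i]);
    }
    return r;
}
```

Since only the sign bit is inspected, a reference of negative zero also produces a negative result.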
sfloat_v exponent ( ) const

Extracts the exponent.

Returns
the exponent to base 2.

This function provides efficient access to the exponent of the floating point number. The returned value is a fast approximation to the logarithm of base 2. The absolute error of that approximation lies in the interval [0, 1).

Examples:

 value | exponent | log2
=======|==========|=======
   1.0 |        0 | 0
   2.0 |        1 | 1
   3.0 |        1 | 1.585
   3.9 |        1 | 1.963
   4.0 |        2 | 2
   4.1 |        2 | 2.036
Warning
This function assumes a positive value (non-zero). If the value is negative the sign bit will modify the returned value. An input value of zero will return the bias of the floating-point representation. If you compile with Vc runtime checks, the function will assert values greater than or equal to zero.

You may use abs to apply this function to negative values, e.g. abs(v).exponent().
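
For positive, non-zero inputs the values in the table match what std::logb returns for scalars. The sketch below is plain C++, not Vc code; exponentModel is a hypothetical helper that takes the magnitude first, so it behaves like the abs variant and does not reproduce the behaviour for zero described in the warning.

```cpp
#include <cassert>
#include <cmath>

// Scalar model: the unbiased base-2 exponent of |x|, i.e. floor(log2(|x|))
// for non-zero finite x. Taking the magnitude first makes negative inputs safe.
float exponentModel(float x) {
    return std::logb(std::fabs(x));
}
```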

sfloat_m isNegative ( ) const

Check the sign bit of each vector entry.

Returns
whether the sign bit is set.

This function is especially useful to distinguish negative zero.

float_v z = float_v::Zero(); // z.isNegative() will be m[0000], z < float_v::Zero() will be m[0000]
float_v nz = -0.f; // nz.isNegative() will be m[1111], nz < float_v::Zero() will be m[0000]
float_v n = -1.f; // n.isNegative() will be m[1111], n < float_v::Zero() will be m[1111]
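
The difference between testing the sign bit and comparing against zero can be reproduced in scalar code with std::signbit. This sketch is plain C++, not Vc code; NegLanes, NegMask, isNegativeModel and lessThanZero are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using NegLanes = std::array<float, 4>;  // assumed width of four lanes
using NegMask = std::array<bool, 4>;

// Models isNegative(): true where the sign bit is set, including for -0.f.
NegMask isNegativeModel(const NegLanes &v) {
    NegMask mask{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        mask[i] = std::signbit(v[i]);
    }
    return mask;
}

// Models v < 0: negative zero compares equal to zero, so it is not selected.
NegMask lessThanZero(const NegLanes &v) {
    NegMask mask{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        mask[i] = v[i] < 0.0f;
    }
    return mask;
}
```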