12,245,287 members (51,615 online)
alternative version

53.4K views
58 bookmarked
Posted

# 2D Vector Class Wrapper SSE Optimized for Math Operations

, 20 May 2008 GPL3
 Rate this:
The article demonstrates a 2D vector wrapper, optimized with SSE intrinsics, for math operations with floating point precision.

## Introduction

As you all probably know, arrays are evil. Therefore, I've written a nice 1D/2D vector wrapper with matrix operations support, optimized with SSE for fast processing. It is useful in applications with matrix math where you have convolution, addition, subtraction, multiplication etc... of matrices with floating point precision. For example, I'm using it in neural networks, to support vector machine classifiers for computer vision programs. The class has been written with proper coding style, supporting read-only access, implemented with const-correctness.

## Background

You should be familiar with matrices and the math behind them. Have a look at SOS math for a quick start. Have a look at the various SSE programming articles present here in the CodeProject as well.

## Using the code

First, make sure you've included the following headers in stdafx.h:

```#include <span class="code-keyword"><mm3dnow.h></span>
#include <span class="code-keyword"><float.h></span>
#include <span class="code-keyword"><time.h></span>
#include <span class="code-keyword"><math.h></span>
#include <span class="code-keyword"><windows.h></span>
#include <span class="code-keyword"><stdio.h></span>```

The vector wrapper is presented with the `vec2D` class. You may have an N x M (N by M) 2D vector with N rows and M columns, or a simple 1D vector - either a row vector 1xM or a column vector Nx1.

• `vec2D(const wchar_t* file);`
• `vec2D(unsigned int ysize = 1, unsigned int xsize = 1, int yoffset = 0, int xoffset = 0, const float* data = 0);`

With the first constructor, you read the array from the file on the disk in text format:

```matrix.txt

2 3
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0```

The matrix.txt file provides a 2 by 3 matrix with all zero elements.

Or, you can initialize the vector with the second constructor. If you do not provide any arguments to it, you have a 1 by 1 matrix, which is a scalar. The `ysize` denotes the number of rows, and `xsize`, the number of columns. With `yoffset` and `xoffset`, you may choose the negative or positive index with which you access the first element in the matrix. Thus, you can build convolution filters like Gaussian for smoothing, or Sobel, Prewitt for edge detection, where you have a 3 by 3 matrix and the zero index element is situated exactly in the center of the matrix. You can also supply an external buffer, `data`, to initialize the matrix; otherwise, it is filled with zeros.

```vec2D v1(1, 10);    //is the 1x10 row vector
vec2D v2(10, 1);    //is the 10x1 column vector

//this one initializes gaus_filter matrix with read-only access
const float gblur[] = { 0.0625f,  0.125f,  0.0625f,
0.125f,   0.25f,   0.125f,
0.0625f,  0.125f,  0.0625f };
const vec2D gaus_filter(3, 3, -1, -1, gblur);
float val = gaus_filter(0, 0);    //val = 0.25f```

With the following functions, you can access the first and the last available indices of the row and the column for the matrix:

• `inline int xfirst() const; //return first column index into array`
• `inline int yfirst() const; //return first row index into array`
• `inline int xlast() const; //return last col index into array`
• `inline int ylast() const;//return last row index into array`

You can access all the elements of the matrix this way:

```//Let's print out our gaus_filter;
for (int y = gaus_filter.yfirst(); y <= gaus_filter.ylast(); y++) {
for (int x = gaus_filter.xfirst(); x <= gaus_filter.xlast(); x++)
wprintf(L" %g", gaus_filter(y, x));
wprintf(L"\n");
}```

Or you can do the same with:

• `void print(const wchar_t* file = 0) const;`

If the file is not `NULL`, you save the matrix to a file on the disk; otherwise, print it to stdout.

To get the sizes of the matrix, use the following functions:

• `inline unsigned int width() const; //width of array, number of columns`
• `inline unsigned int height() const; //height size of array, number of rows`
• `inline unsigned int length() const; //total size of array width*height`

To access the elements of the matrix, there are four overloaded operators present, with both read-write and read-only access:

• `inline float& operator()(int y, int x);`
• `inline float operator()(int y, int x) const;`
• `inline float& operator[](unsigned int i);`
• `inline float operator[](unsigned int i) const;`

With the `[]` operators, you can access all the elements in a Matlab like fashion. But, the first row and column indices in your matrix should start with 0.

```//In Matlab we define the array like:
// v = zeros(3, 5);
//and we can print its contents with [] operator
// v[:]

//The same applies to [] overloaded operator
vec2D v(3, 5);   //we have xfirst() and yfirst() equals to 0 as ordinary C array
for (unsigned int i = 0; i < v.length(); i++)
wprintf(L"%g\n", v[i]);```

You can access the elements of the matrix also with:

• `inline float get(int y, int x) const;`

It allows the indices to exceed the matrix dimensions. In that case, you will have a periodic boundary extension. So, if you have your first row and column indices equal to 0, and try to access `vec(-1, -1)`, you get the `vec(1, 1)` element.

With overloaded assignment operators, you can make a copy of the matrix, or initialize it with an external data buffer:

• `inline const vec2D& operator=(const vec2D& v);`
• `inline const vec2D& operator=(const float* pf);`
```const float gblur[] = { 0.0625f,  0.125f,  0.0625f,
0.125f,   0.25f,   0.125f,
0.0625f,  0.125f,  0.0625f };
vec2D v1(3, 3);
v1 = gblur;
vec2D v2(3, 3);
vec2D v3(10, 5);
v2 = v1;
v3 = v2;    //v3 will be resized to 3 by 3 matrix```

With `set` element functions, the matrix contents can be initialized to specific values:

• `void set(float scalar); //set all matrix elements to scalar`
• `void set(float scalar, RECT& r); //set matrix elements in the RECT to scalar`
• `void setrand(); //set matrix elements to random values in the range -1.0 ... 1.0`

The `RECT` is just a Windows structure.

### SSE optimized operations

The following functions deal with SSE optimized math operations:

• `void add(float scalar);`
• `void sub(float scalar);`
• `void mul(float scalar);`
• `void div(float scalar);`

You can add, subtract, multiply, or divide all the matrix elements with a scalar.

```vec2D v(3, 5);
v.print();
v.print();```

The functions for arithmetic operations on matrices are:

• `void add(const vec2D& a, const vec2D& b); //c = a.+b sse optimized`
• `void sub(const vec2D& a, const vec2D& b); //c = a.-b sse optimized`
• `void mul(const vec2D& a, const vec2D& b); //c = a*b`
• `void div(const vec2D& a, const vec2D& b); //c = a./b sse optimized`

Where `c` is the matrix whose function you are calling, and its elements will be the result of the operation. `add`, `sub`, and `div` are the element wise operations, so the `a` and `b` matrices along with `c` should be of the same size. The `mul` (multiplication) operation is not SSE optimized, as during multiplication, every row in `a` is multiplied by every column in `b`.

```vec2D a(10, 10);
vec2D b = a;
vec2D c = a;
a.setrand();
b.setrand();
c.add(a, b);      //c = a + b
a.print();
b.print();
c.print();

vec2D a(3, 5);
vec2D b(5, 2);
vec2D c(3, 2);
a.setrand();
b.setrand();
c.mul(a, b);     //c = a * b```

The SSE optimized matrices multiplication goes here:

• `void mult(const vec2D& a, const vec2D& b); //c = a*b' sse optimized`
• `void mule(const vec2D& a, const vec2D& b); //c = a.*b sse optimized`

The difference between `mult` and `mul` is that the `b` matrix is transposed, so during multiplication, we multiply every row from `a` by every row from `b`, which is easily SSE optimized.

```//arrange transposed version of b for mult
vec2D a(3, 5);
vec2D b(2, 5);
vec2D c(3, 2);
a.setrand();
b.setrand();
c.mult(a, b);     //c = a * b';   SSE optimized
a.print();
b.print();
c.print();```

And, `mule` is just element wise multiplication of equally sized `a` and `b` matrices.

### 2D convolution, normalization, and histogram equalization

The other function useful in signal processing is convolution with a filter:

• `void conv2D(const vec2D& a, const vec2D& filter);`
• `void conv2D(const vec2D& a, const vec2D& re, const vec2D& im);`

The first one is just convolution with a filter, and the second is convolution with a complex filter consisting of real and imaginary parts.

```//gaussian smoothing
vec2D v_original(100, 100);
vec2D v_smoothed = v_original;
v_smoothed.conv2D(v_original, gaus_filter);

//edge detection
const float Re[] = { 0.5f, 0.0f, -0.5f,
0.5f, 0.0f, -0.5f,
0.5f, 0.0f, -0.5f };
const float Im[] = { 0.5f,  0.5f,  0.5f,
0.0f,  0.0f,  0.0f,
-0.5f, -0.5f, -0.5f};
vec2D re(3, 3, -1, -1, Re);
vec2D im(3, 3, -1, -1, Im);
vec2D v_edges(100, 100);

//set the rectangle in the matrix to 0.0
//so there will be edges on the brim of it
RECT r(10, 10, 50, 50);
v_smoothed.set(0.0f, r);

v_edges.conv2D(v_smoothed, re, im);```

The normalization and histogram equalization are used in image processing, for example, during a face detection process:

• `void normalize(float a, float b); //normalize to a...b range`
• `void histeq(vec2D& hist); //histogram normalization`

For normalization, the matrix is just normalized so its minimum value becomes `a` and maximum value is `b`.

For histogram equalization, provide an external row vector `hist` of size 1 by 256. The matrix before equalization should contain values in the range of 0 ... 255 as the normal bitmap image. After equalization, its values are mapped to the range of 0 ... 1.0.

```vec2D hist(1, 256);
vec2D v_heq(100, 100);
v_heq.setrand();
v_heq.norm(0.0f, 255.0f);
v_heq.histeq(hist);```

## Share

 Engineer Russian Federation
Highly skilled Engineer with 14 years of experience in academia, R&D and commercial product development supporting full software life-cycle from idea to implementation and further support. During my academic career I was able to succeed in MIT Computers in Cardiology 2006 international challenge, as a R&D and SW engineer gain CodeProject MVP, find algorithmic solutions to quickly resolve tough customer problems to pass product requirements in tight deadlines. My key areas of expertise involve Object-Oriented
Analysis and Design OOAD, OOP, machine learning, natural language processing, face recognition, computer vision and image processing, wavelet analysis, digital signal processing in cardiology.

## You may also be interested in...

 First Prev Next
 [My vote of 1] What?! Squat21-Mar-09 22:59 Squat 21-Mar-09 22:59
 i dun understand why u need an offset in init function f216-Oct-08 3:04 f2 16-Oct-08 3:04
 Re: i dun understand why u need an offset in init function Chesnokov Yuriy16-Oct-08 21:53 Chesnokov Yuriy 16-Oct-08 21:53
 Re: i dun understand why u need an offset in init function f217-Oct-08 8:28 f2 17-Oct-08 8:28
 Re: i dun understand why u need an offset in init function Chesnokov Yuriy17-Oct-08 21:47 Chesnokov Yuriy 17-Oct-08 21:47
 Re: i dun understand why u need an offset in init function f217-Oct-08 22:58 f2 17-Oct-08 22:58
 Bad Kuryn30-May-08 8:38 Kuryn 30-May-08 8:38
 Re: Not Bad but Great! Chesnokov Yuriy1-Jun-08 4:05 Chesnokov Yuriy 1-Jun-08 4:05
 I found you blind looking at my code. There is a mul in overloaded operator. Taking just once a mul or div worth operator invokation, but consider taking 1000000000 of muls (convolution with a filter), preallocation is a much faster rather than allocating every time you do the mul. The code is specialized one for floats only with SSE support. I'm not going it to use in C also. Providing templates will be a bunch of work to support every operation in MMX, SSE, SSE2. You can rewrite it yourself for templates supporting every MMX, SSE, SSE2 so I will be able to put your modes here chesnokov
 Re: Bad Herbert Sauro22-Dec-08 12:21 Herbert Sauro 22-Dec-08 12:21
 A question Ali Khanlarkhani31-Oct-07 5:58 Ali Khanlarkhani 31-Oct-07 5:58
 Re: A question Chesnokov Yuriy31-Oct-07 8:12 Chesnokov Yuriy 31-Oct-07 8:12
 No SSE command was found tyjiang16-Oct-07 22:18 tyjiang 16-Oct-07 22:18
 Re: No SSE command was found Chesnokov Yuriy17-Oct-07 1:54 Chesnokov Yuriy 17-Oct-07 1:54
 Last Visit: 31-Dec-99 19:00     Last Update: 1-May-16 21:22 Refresh 1