Single precision and double precision floating point operations in SSE optimization

27 Nov 2012

Introduction

Intel's SSE intrinsics can boost the performance of floating point calculations. Both GCC and Microsoft Visual Studio support SSE intrinsics. SSE performs floating point calculations in the xmm registers: xmm0-xmm15 (16 registers) are available in 64-bit mode, and xmm0-xmm7 (8 registers) in 32-bit mode. The SSE operations for single precision and double precision floating point numbers differ slightly. My objective is to point out the differences between these two data types, using a simple summation over a floating point array as the example.

SSE Programming

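The SSE intrinsics and data types are declared in a family of headers. <xmmintrin.h> covers the original SSE single precision intrinsics and the __m128 type. <emmintrin.h> (SSE2) adds the double precision intrinsics and the __m128d type. <pmmintrin.h> (SSE3) adds the horizontal adds _mm_hadd_ps and _mm_hadd_pd, and including it also brings in the two earlier headers. With GCC, SSE3 has to be enabled, for example with -msse3. __m128 holds four single precision floating point numbers and __m128d holds two double precision numbers. _mm_load_ps loads single precision floating point numbers and _mm_load_pd loads double precision ones. Similarly, _mm_add_ps and _mm_hadd_ps add single precision floating point numbers, while _mm_add_pd and _mm_hadd_pd add double precision floating point numbers. The array passed to _mm_load_ps or _mm_load_pd has to be aligned on a 16-byte boundary, which can be done by allocating it with _mm_malloc.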
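
A minimal sketch of the alignment requirement follows. The helper names make_aligned_floats and is_aligned16 are made up for this illustration and are not part of any SSE header. Memory obtained from _mm_malloc has to be released with _mm_free, not free.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  // SSE: __m128, _mm_malloc, _mm_free

// Hypothetical helper: allocate n floats on a 16-byte boundary and
// fill them with 1, 2, 3, ... Returns NULL if the allocation fails.
static float *make_aligned_floats(size_t n)
{
    float *a = (float *)_mm_malloc(n * sizeof(float), 16);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = (float)(i + 1);
    return a;
}

// Hypothetical helper: returns 1 if p sits on a 16-byte boundary,
// which is what _mm_load_ps and _mm_load_pd require.
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

A caller would typically pair the allocation with _mm_free(a) once the array is no longer needed.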

_mm_add_ps adds the four single precision floating-point values

r0 := a0 + b0
r1 := a1 + b1
r2 := a2 + b2
r3 := a3 + b3

_mm_add_pd adds the two double precision floating-point values

r0 := a0 + b0
r1 := a1 + b1
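
The lane-by-lane behaviour of both additions can be tried out with a small sketch like the one below. The function names add4_floats and add2_doubles are illustrative only. The unaligned load and store intrinsics (_mm_loadu_ps, _mm_storeu_ps and their _pd counterparts) are used here so that the inputs can be ordinary arrays.

```c
#include <emmintrin.h>  // SSE2; also brings in the SSE header xmmintrin.h

// Illustrative helper: out[k] = a[k] + b[k] for k = 0..3,
// computed with a single _mm_add_ps.
static void add4_floats(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);            // unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); // r0..r3 written to out
}

// Illustrative helper: out[k] = a[k] + b[k] for k = 0..1,
// computed with a single _mm_add_pd.
static void add2_doubles(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);           // unaligned load of 2 doubles
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb)); // r0, r1 written to out
}
```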

Code

This is the plain C code that we wish to convert to SSE.

float sum = 0;  //for double precision: double sum = 0;
for (int i = 0; i < n; i++) {
    sum += a[i];
}

Sample code for single precision floating point addition:

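// Assumes a is 16-byte aligned and n is a multiple of 4.
float sum = 0.0f;
__m128 rsum = _mm_setzero_ps();          // four partial sums, all zero
for (int i = 0; i < n; i += 4)
{
	__m128 mr = _mm_load_ps(&a[i]);      // load a[i] .. a[i+3]
	rsum = _mm_add_ps(rsum, mr);         // add them lane by lane
}
// Two horizontal adds (SSE3) leave the total in every lane.
rsum = _mm_hadd_ps(rsum, rsum);
rsum = _mm_hadd_ps(rsum, rsum);
_mm_store_ss(&sum, rsum);                // write the lowest lane to sum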
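
The loop above reads four elements per step, so it only fits arrays whose length is a multiple of 4. One possible way to handle any length is sketched below, under the assumption that the array is still 16-byte aligned. The function name sum_floats_sse is hypothetical, and the final reduction stores the four lanes to memory instead of using _mm_hadd_ps, so this variant needs only the original SSE header.

```c
#include <stddef.h>
#include <xmmintrin.h>  // SSE only; no SSE3 needed for this variant

// Hypothetical helper: sums n floats. a must be 16-byte aligned;
// n may be any count, including one that is not a multiple of 4.
static float sum_floats_sse(const float *a, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    // Vector part: four elements per iteration.
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(&a[i]));

    // Reduce the four partial sums with scalar additions.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    // Scalar tail: the last n % 4 elements.
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

Because the additions happen in a different order than in the plain loop, the result can differ from the scalar sum in the last bits for general floating point data.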

Sample code for double precision floating point addition:

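// Assumes a is 16-byte aligned and n is a multiple of 4.
double sum = 0.0;
__m128d rsum = _mm_setzero_pd();         // partial sums of a[i], a[i+1]
__m128d rsum1 = _mm_setzero_pd();        // partial sums of a[i+2], a[i+3]
for (int i = 0; i < n; i += 4)
{
	__m128d mr = _mm_load_pd(&a[i]);
	__m128d mr1 = _mm_load_pd(&a[i+2]);
	rsum = _mm_add_pd(rsum, mr);
	rsum1 = _mm_add_pd(rsum1, mr1);
}
// Horizontal adds (SSE3): first fold the two accumulators, then the two lanes.
rsum = _mm_hadd_pd(rsum, rsum1);
rsum = _mm_hadd_pd(rsum, rsum);
_mm_store_sd(&sum, rsum);                // write the lowest lane to sum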
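
_mm_hadd_pd belongs to SSE3. Where only SSE2 can be assumed, the final reduction can be written with _mm_unpackhi_pd and _mm_add_sd instead. The sketch below shows one way to do that and also adds a scalar tail for lengths that are not a multiple of 4. The function name sum_doubles_sse2 is hypothetical, and the array is still assumed to be 16-byte aligned.

```c
#include <stddef.h>
#include <emmintrin.h>  // SSE2: __m128d, _mm_add_pd, _mm_unpackhi_pd

// Hypothetical helper: sums n doubles using SSE2 only.
// a must be 16-byte aligned; n may be any count.
static double sum_doubles_sse2(const double *a, size_t n)
{
    __m128d acc0 = _mm_setzero_pd();
    __m128d acc1 = _mm_setzero_pd();
    size_t i = 0;

    // Vector part: two 2-wide additions cover four doubles per iteration.
    for (; i + 4 <= n; i += 4) {
        acc0 = _mm_add_pd(acc0, _mm_load_pd(&a[i]));
        acc1 = _mm_add_pd(acc1, _mm_load_pd(&a[i + 2]));
    }

    // Fold the two accumulators, then add the high lane to the low lane.
    __m128d acc = _mm_add_pd(acc0, acc1);
    __m128d hi = _mm_unpackhi_pd(acc, acc);          // (high, high)
    double sum = _mm_cvtsd_f64(_mm_add_sd(acc, hi)); // low + high

    // Scalar tail: the last n % 4 elements.
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```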

The difference between the two versions is that, for single precision floating point numbers, you can add 4 values in one operation:

rsum = _mm_add_ps(rsum, mr);

For double precision floating point numbers, you can add only 2 values in one operation, and therefore you need two operations for 4 values:

rsum = _mm_add_pd(rsum, mr);
rsum1 = _mm_add_pd(rsum1, mr1);

Adding a timer shows that the SSE code is much faster than the plain code. On my PC, I observed that the SSE code is almost 4 times faster than the plain code.
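
The timer itself is not shown in this article. A simple way to set one up, sketched here with the standard clock() function, is to run each summation routine many times and compare the elapsed CPU time. The names sum_fn, sum_plain and time_sum are assumptions made for this illustration, and the measured ratio will depend on the machine, the compiler and its optimization settings.

```c
#include <stddef.h>
#include <time.h>

// Illustrative signature shared by the routines to be timed.
typedef float (*sum_fn)(const float *, size_t);

// The plain C summation, wrapped in a function so it can be timed.
static float sum_plain(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

// Calls fn(a, n) `repeats` times, stores the last result in *result
// and returns the elapsed CPU time in seconds as reported by clock().
static double time_sum(sum_fn fn, const float *a, size_t n,
                       int repeats, float *result)
{
    volatile float sink = 0.0f;  // keeps the calls from being optimized away
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        sink = fn(a, n);
    clock_t end = clock();
    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Any SSE routine with the same signature can be passed to time_sum in place of sum_plain, and the two returned times can then be divided to get a speed-up factor.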

Hence, SSE instructions can be used to build faster applications where execution time is critical.

Last of All

This is my first post on CodeProject. There may be mistakes in this article. Please let me know and give me feedback.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Article Copyright 2012 by Shahadat Hossain Mazumder