Posted 9 Oct 2004

An introduction to one of the optimization methods for Intel Dual Xeon HT technology.

## Introduction

This article demonstrates how two threads can outperform a single thread, especially on an Intel dual Xeon machine with HT technology. The demo posted here is a simple PI calculation, used as an example of a computationally expensive function. Both PI functions run at 100% CPU utilization.

## Requirement

Very simple. To see a significant improvement, you need to run the demo on a dual-CPU machine.
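A quick way to check whether a machine meets this requirement is to ask the runtime how many hardware threads it reports. The sketch below uses standard C++ (`std::thread::hardware_concurrency`) rather than the Win32 API used in the demo, and the function name `has_multiple_hardware_threads` is made up for illustration. Note that the count covers logical processors, so a single HT-enabled CPU may also report two.

```cpp
#include <thread>

// Returns true when the machine reports at least two hardware threads.
// hardware_concurrency() may return 0 when the count cannot be determined;
// in that case this conservatively returns false.
bool has_multiple_hardware_threads()
{
    return std::thread::hardware_concurrency() >= 2;
}
```

If this returns false, the dual-thread version is unlikely to show much gain over the single-thread one.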

## Code

The demo code is very simple. Here is the sample.

`SingleThreadPI` and `DualThreadPI` do exactly the same thing and produce the same result, but with different timing. The main difference is that `DualThreadPI` is written using the data parallelization method: the loop iterations are split between two threads.
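For reference, the loop in both functions appears to be a midpoint-rule approximation of a standard integral for π. Written out, with $N$ standing for `num_steps` and $h$ for `step`:

$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx \;\approx\; h \sum_{i=1}^{N} \frac{4}{1+x_i^2}, \qquad x_i = (i-0.5)\,h, \qquad h = \frac{1}{N}$$

Assuming each of the two threads strides through the loop by `NUM_THREADS`, the data-parallel version simply splits this sum into its odd and even terms and adds the two partial sums at the end:

$$\sum_{i=1}^{N} \frac{4}{1+x_i^2} \;=\; \sum_{i\ \mathrm{odd}} \frac{4}{1+x_i^2} \;+\; \sum_{i\ \mathrm{even}} \frac{4}{1+x_i^2}$$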

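```
#include "stdafx.h"

#include <windows.h>
#include <stdio.h>
#include <time.h>

// NOTE: the prototypes, function headers, thread start-up and worker loop
// header were missing from the listing and are reconstructed here; the
// function names follow the profiler output shown later in the article.
#define NUM_THREADS 2

CRITICAL_SECTION hUpdateMutex;
static long num_steps = 100000 * 1000;
double step;
double global_sum = 0.0;

int SingleThreadPI(void);
int DualThreadPI(void);
DWORD WINAPI ThreadPI(LPVOID arg);

int main(int argc, char* argv[])
{
    SingleThreadPI();
    DualThreadPI();

    getchar();   // keep the console window open
    return 0;
}

// Computes PI on one thread by summing 4/(1+x^2) at the midpoint of each step.
int SingleThreadPI(void)
{
    clock_t start, end;
    long i;
    double x, pi, sum = 0.0;

    start = clock();
    step = 1.0 / (double) num_steps;

    for (i = 1; i <= num_steps; i++)
    {
        x = (i - 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }

    pi = step * sum;
    printf(" pi is %f \n", pi);
    end = clock();
    printf("%ld\n", (long) (end - start));   // elapsed clock ticks

    return 0;
}

// Computes the same sum, split across NUM_THREADS worker threads.
int DualThreadPI(void)
{
    clock_t start, end;
    double pi;
    int i;
    HANDLE threads[NUM_THREADS];
    int threadArg[NUM_THREADS];   // one ID per thread, so no thread reads a changing i
    DWORD threadID;

    start = clock();
    step = 1.0 / (double) num_steps;   // set once here; the workers only read it
    global_sum = 0.0;
    InitializeCriticalSection(&hUpdateMutex);

    for (i = 0; i < NUM_THREADS; i++)
    {
        threadArg[i] = i + 1;
        threads[i] = CreateThread(NULL, 0, ThreadPI, &threadArg[i], 0, &threadID);
    }

    // wait until both threads finish their processing
    WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);

    for (i = 0; i < NUM_THREADS; i++)
        CloseHandle(threads[i]);
    DeleteCriticalSection(&hUpdateMutex);

    pi = global_sum * step;
    printf(" pi is %f \n", pi);
    end = clock();
    printf("%ld\n", (long) (end - start));   // elapsed clock ticks

    return 0;
}

DWORD WINAPI ThreadPI(LPVOID arg)
{
    long i;
    int start;
    double x, sum = 0.0;
    start = *(int *) arg;   // arg points to the thread ID (1 or 2)

    // NUM_THREADS in this case is 2, so the thread with ID 1 sums the
    // odd-numbered steps and the thread with ID 2 the even-numbered ones
    for (i = start; i <= num_steps; i += NUM_THREADS)
    {
        x = (i - 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }

    // the update of the shared total must be atomic, otherwise the two
    // threads could overwrite each other's contribution
    EnterCriticalSection(&hUpdateMutex);
    global_sum += sum;
    LeaveCriticalSection(&hUpdateMutex);

    return 0;
}
```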

## Test Performance

Profiling shows that, as expected, this type of optimisation brings no significant improvement on a uniprocessor machine. On a dual-processor machine, however, you will see a big difference between the single-thread and dual-thread versions. Here are the live samples.

```
Machine -> Pentium 4 (2.4GHz, 1GB Memory)

Profile: Function timing, sorted by time
Date:    Sun Oct 10 02:23:09 2004

Program Statistics
------------------
Command line at 2004 Oct 10 02:23: "G:\j2\net\code project\article -
Optimise For Intel HT Technology\InLocal\PI\Release\PI"
Total time: 5342.205 millisecond
Time outside of functions: 38.445 millisecond
Call depth: 2
Total functions: 8
Total hits: 3
Function coverage: 37.5%
Module Statistics for pi.exe
----------------------------
Time in module: 5303.760 millisecond
Percent of time in module: 100.0%
Functions in module: 8
Hits in module: 3
Module function coverage: 37.5%
Func          Func+Child           Hit
Time   %         Time      %      Count  Function
---------------------------------------------------------
2246.857  42.4     5303.760 100.0        1 _main (pi.obj)
1620.240  30.5     1620.240  30.5        1 SingleThreadPI(void) (pi.obj)
1436.662  27.1     1436.662  27.1        1 DualThreadPI(void) (pi.obj)

Contributed by Nigel
Machine -> Dell Precision 530, Pentium Xeon (2 x 2.0GHz, 512MB Memory)

Profile: Function timing, sorted by time
Date: Sat Oct 09 16:58:43 2004

Program Statistics
------------------
Command line at 2004 Oct 09 16:58: "D:\somewhere\PI_demo\PI\Debug\PI"
Total time: 10076.987 millisecond
Time outside of functions: 11.837 millisecond
Call depth: 2
Total functions: 7
Total hits: 3
Function coverage: 42.9%

Module Statistics for pi.exe
----------------------------
Time in module: 10065.150 millisecond
Percent of time in module: 100.0%
Functions in module: 7
Hits in module: 3
Module function coverage: 42.9%

Func          Func+Child           Hit
Time   %         Time      %      Count  Function
---------------------------------------------------------
7175.142  71.3    10065.150 100.0        1 _main (pi.obj)
1931.827  19.2     1931.827  19.2        1 SingleThreadPI(void) (pi.obj)
 958.182   9.5      958.182   9.5        1 DualThreadPI(void) (pi.obj)

Machine -> Pentium Xeon (2 x 2.8GHz, 512MB Memory)

(Pending... to be continued...)
This one should show a significant improvement. Can anyone volunteer?
```

## Finally

I have actually seen this at my office already: on a dual Xeon 2.8GHz machine, it doubles the performance. Since I am at home right now, I can only provide my own PC's results (a small improvement); the rest I leave for you to verify... *wink*

Learning is fun =)

## Recommendation/Tips

After reading this article, you may want to try the technique in your own application. Suppose you put in a lot of effort and, after implementing it, you don't get the result you expected. Please don't blame me. You may have targeted the wrong part of your application, or you may have tested on a single-CPU machine, where you will not see much improvement (see Requirement). This article only shows some light in the tunnel; it won't get you far on its own.

Here are some tips. To take advantage of this type of optimisation, my practice is to profile the application first, then target functions that are called frequently and are computationally expensive, such as compression, image filtering, and other morphological processing.
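When a full profiler is not at hand, a rough first pass is to time candidate functions directly. The helper below is a minimal sketch in standard C++ using `std::chrono`; the name `time_ms` is hypothetical, and it measures a single call, so results should be treated as indicative only.

```cpp
#include <chrono>

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Wrapping a suspected hot spot, for example `time_ms([] { SingleThreadPI(); })`, gives a quick figure to compare before and after splitting the work across threads.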

In large-scale server applications, the hardware probably has at least 2-4 CPUs per machine, but I have never tried this technique there. Most likely I will not try it in the near future either, because I am not in that field. But you may want to try it! If the result is good, please share your experience here. Good luck!

Intel Web Site

## About the Author

Software Developer (Senior), Singapore
He started programming in dBase, Pascal, C, then assembly. He actively works on image processing algorithms and customised vision applications. His major is actually more in control engineering, motion control, machine vision & statistics.
He likes to work on projects that require careful analytical methods.
He can be reached at albertoycc@hotmail.com.
