See more: C++
Hi,

I have a 3D array of structures, each containing a BYTE member, cCtrlVal.
I would like to set this at runtime to 255 as fast as possible.
There are 5760 instances of cCtrlVal.

Currently I am looping through the matrix with 3 nested for loops.

Is there a better, quicker way to set this variable?
Thanks in advance.

:Ron

 
struct typControlMatrix
	{
	BOOL bEnabled;
	BOOL bChanged;
	BOOL LastChange;
	BOOL AdjStripNo;
	BYTE cCtrlVal;		// the member to be set to 255
	enum enControlType eCtrlType;
	DWORD lCtrlNo;
	char sStripDesc[MAX_SCRIBBLE_STRIP_DESC];
	CString sCtrlVal;
	BOOL bDescChanged;
	int TrackPage;
	int CompRatioIndex;
	DWORD lStripNo;
	int lParamNo;
	} m_tControlMatrix[128][5][9];	// 128 * 5 * 9 = 5760 elements
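The three nested loops described in the question can be written so that both the loop bounds and the element count come from the array declaration instead of a hard-coded 5760. This is a minimal sketch, assuming a cut-down stand-in struct so it compiles without MFC or <windows.h>; the names `ControlEntry`, `ControlMatrix`, `setAllNested` and `countEqual` are hypothetical and do not come from the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for typControlMatrix: only cCtrlVal plus one other
// member. (BYTE from <windows.h> is a typedef for unsigned char.)
struct ControlEntry {
    int bEnabled;
    unsigned char cCtrlVal;
};

// Same shape as m_tControlMatrix in the question.
using ControlMatrix = ControlEntry[128][5][9];

// Element count derived from the array type rather than hard-coded.
constexpr std::size_t kElementCount = sizeof(ControlMatrix) / sizeof(ControlEntry);
static_assert(kElementCount == 5760, "128 * 5 * 9 should be 5760");

// The three nested loops, written with range-based for so the bounds
// always match the array declaration.
void setAllNested(ControlMatrix& m, unsigned char value) {
    for (auto& plane : m)        // 128 x ControlEntry[5][9]
        for (auto& row : plane)  // 5 x ControlEntry[9]
            for (auto& e : row)  // 9 x ControlEntry
                e.cCtrlVal = value;
}

// Counts the entries whose cCtrlVal equals `value`, to check the result.
std::size_t countEqual(const ControlMatrix& m, unsigned char value) {
    std::size_t n = 0;
    for (const auto& plane : m)
        for (const auto& row : plane)
            for (const auto& e : row)
                if (e.cCtrlVal == value) ++n;
    return n;
}
```

Calling `setAllNested(m, 255)` on a `ControlMatrix m` sets every `cCtrlVal` to 255 and leaves the other members untouched; it is the same work as the original triple loop, just without separately maintained bounds.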
Posted 21-Nov-12 16:33pm
Edited 21-Nov-12 18:45pm
Comments
Mohibur Rashid at 22-Nov-12 0:48am
   
I don't think you have any other choice.
Sergey Alexandrovich Kryukov at 22-Nov-12 2:13am
   
Agree; this is easy to make worse, but hard to improve.
By the way, using ++index instead of the usual index++ in a "for" statement usually makes it faster; try it...
--SA

Solution 1

You can flatten the nested loops into a single 1D loop by taking the address of the first element of the 3D array and then doing 128*5*9 = 5760 iterations.

Today's optimizing compilers might do the same, in which case this technique would not yield any speed advantage. On the other hand, it's worth a try.
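The flattening idea can be sketched as follows, again with a hypothetical stand-in struct and helper name (`ControlEntry`, `setAllFlat`). This version indexes elements through a `ControlEntry*` instead of stepping a byte pointer, so the compiler handles the stride. Strictly speaking, the C++ standard only defines pointer arithmetic within the innermost array `m[0][0]`, so walking past it relies on the contiguous row-major layout, as is common practice for this idiom.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for typControlMatrix (BYTE is unsigned char).
struct ControlEntry {
    int bEnabled;
    unsigned char cCtrlVal;
};
using ControlMatrix = ControlEntry[128][5][9];

// One flat loop instead of three: start at the first element and visit
// all 128*5*9 elements with a single index. The elements of a
// multidimensional array are stored contiguously in row-major order,
// so p[n] visits them in the same order as m[i][j][k] with k fastest.
void setAllFlat(ControlMatrix& m, unsigned char value) {
    ControlEntry* p = &m[0][0][0];
    const std::size_t count = sizeof(m) / sizeof(m[0][0][0]);  // 5760
    for (std::size_t n = 0; n < count; ++n)
        p[n].cCtrlVal = value;
}
```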

Solution 2

Using nv3's suggested 'flat approach' with pointers, I obtained a somewhat surprising result (about a 3x speed improvement):

  int i, j, k;   // 'register' dropped: deprecated in C++11, removed in C++17
  LARGE_INTEGER t[4];
 
  QueryPerformanceCounter(&t[0]);
 
  //-> 3 loops
  for (i=0; i<128; i++)
  {
    for (j=0; j<5; j++)
    {
      for (k=0; k<9; k++)
        m_tControlMatrix[i][j][k].cCtrlVal = 255;
    }
  }
  //<-
  QueryPerformanceCounter(&t[1]);
 
  QueryPerformanceCounter(&t[2]);
  //-> flat pointers: p points at the cCtrlVal byte of element [0][0][0];
  //   advancing it by sizeof(typControlMatrix) bytes reaches the cCtrlVal
  //   of the next element, for all 5760 (= 128*5*9) elements
  BYTE * p = &m_tControlMatrix[0][0][0].cCtrlVal;
  BYTE * q = p + 5760 * sizeof(typControlMatrix);
  while (p < q)
  {
    *p = 255;
    p += sizeof(typControlMatrix);
  }
  //<-
  QueryPerformanceCounter(&t[3]);
  
  CString s;
  // _T() lets the string literals compile in both MBCS and Unicode builds
  s.Format(_T("3 loops: %I64d flat pointers: %I64d speed ratio %g "),
    (t[1].QuadPart - t[0].QuadPart),
    (t[3].QuadPart - t[2].QuadPart),
    ((double)(t[1].QuadPart - t[0].QuadPart)) / (t[3].QuadPart - t[2].QuadPart));
  MessageBox(s, _T("Test"));

The output:
 
3 loops: 195327 flat pointers: 67743 speed ratio 2.88335 
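The measurement above takes a single sample of each variant, with the three-loop version always running first, so cache warm-up could account for part of the gap. A portable way to re-check the comparison is to warm up, repeat each variant several times and keep the fastest run. The sketch below uses `std::chrono::steady_clock` in place of `QueryPerformanceCounter`; the stand-in struct and the names `setNested`, `setFlat`, `bestOfNs` and `runBenchmark` are hypothetical, and its flat variant steps by element pointer rather than by byte.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for typControlMatrix (BYTE is unsigned char).
struct ControlEntry {
    int bEnabled;
    unsigned char cCtrlVal;
};
using ControlMatrix = ControlEntry[128][5][9];

// Variant 1: three nested loops, as in the question.
void setNested(ControlMatrix& m, unsigned char v) {
    for (int i = 0; i < 128; ++i)
        for (int j = 0; j < 5; ++j)
            for (int k = 0; k < 9; ++k)
                m[i][j][k].cCtrlVal = v;
}

// Variant 2: one flat loop over element pointers (stride is one element).
void setFlat(ControlMatrix& m, unsigned char v) {
    ControlEntry* p = &m[0][0][0];
    ControlEntry* const end = p + 128 * 5 * 9;
    for (; p != end; ++p)
        p->cCtrlVal = v;
}

// Runs fn once untimed (warm-up), then `reps` timed runs, and returns the
// fastest run in nanoseconds. Taking the minimum after a warm-up reduces
// the effect of cold caches and of which variant happens to run first.
template <typename Fn>
long long bestOfNs(Fn fn, int reps) {
    using Clock = std::chrono::steady_clock;
    fn();
    long long best = -1;
    for (int r = 0; r < reps; ++r) {
        const auto t0 = Clock::now();
        fn();
        const auto t1 = Clock::now();
        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (best < 0 || ns < best)
            best = ns;
    }
    return best;
}

// Times both variants on the same matrix and prints the results.
void runBenchmark(ControlMatrix& m, int reps) {
    const long long nested = bestOfNs([&] { setNested(m, 255); }, reps);
    const long long flat = bestOfNs([&] { setFlat(m, 255); }, reps);
    std::printf("3 loops: %lld ns, flat: %lld ns\n", nested, flat);
}
```

No particular numbers are implied here: results depend on compiler, optimization settings and hardware, and with full optimization the two variants may compile to similar code, as Solution 1 anticipated.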
Comments
nv3 at 22-Nov-12 8:05am
   
Thanks for implementing the idea and doing the performance measurement! A factor of almost 3 is indeed a nice result and was probably so high because the loop overhead in the 3 nested loops outweighs the loop body, which is only a single byte copy. +5
CPallini at 22-Nov-12 8:49am
   
Thank you.
Like you, I suppose the improvement is due to the loop overhead.
BTW you already got my 5 for the good original hypothesis.
nv3 at 22-Nov-12 9:06am
   
Thank you!
Ron Anders at 22-Nov-12 9:44am
   
Nice! Thank you so much.
CPallini at 22-Nov-12 11:23am
   
You are welcome.
Please note I've used #define MAX_SCRIBBLE_STRIP_DESC 6, since I had no idea of its actual value. I don't know how much this has biased the result.
VuNic at 14-Dec-12 8:00am
   
Why aren't threads considered? A flat pointer loop split across multiple threads feels like a perfect fit for his performance requirement.
CPallini at 14-Dec-12 8:20am
   
I tried with threads, but got longer execution times.
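For reference, the threaded variant discussed in these comments can be sketched by splitting the flat range into contiguous chunks, one per `std::thread`. The stand-in struct and the name `setAllThreaded` are hypothetical. Consistent with the result reported above, this is not expected to help at this size; one plausible reason is that starting and joining threads costs more than 5760 single-byte stores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for typControlMatrix (BYTE is unsigned char).
struct ControlEntry {
    int bEnabled;
    unsigned char cCtrlVal;
};
using ControlMatrix = ControlEntry[128][5][9];

// Splits the flat range of 128*5*9 elements into contiguous, non-overlapping
// chunks and gives each chunk to its own std::thread. Because no two threads
// write the same element, no locking is needed; join() waits for all of them.
void setAllThreaded(ControlMatrix& m, unsigned char value, unsigned threadCount) {
    ControlEntry* const base = &m[0][0][0];
    const std::size_t count = sizeof(m) / sizeof(m[0][0][0]);  // 5760
    if (threadCount == 0)
        threadCount = 1;
    const std::size_t chunk = (count + threadCount - 1) / threadCount;  // round up

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= count)
            break;
        const std::size_t end = std::min(count, begin + chunk);
        workers.emplace_back([=] {
            for (std::size_t n = begin; n < end; ++n)
                base[n].cCtrlVal = value;
        });
    }
    for (auto& w : workers)
        w.join();
}
```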

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Last Updated 22 Nov 2012
Copyright © CodeProject, 1999-2015
All Rights Reserved. Terms of Service

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100