Hi,

I have a 3d structure that contains a Byte variable cCtrlVal.
I would like to set this at runtime to 255 as fast as possible.
There are 5760 instances of cCtrlVal.

Currently I am looping through the matrix with 3 nested for loops.

Is there a better, quicker way to set this variable?
Thanks in advance.

:Ron

C++
struct typControlMatrix
	{
	BOOL bEnabled; 
	BOOL bChanged;
	BOOL LastChange;	
	BOOL AdjStripNo;
	BYTE cCtrlVal;
	enum enControlType eCtrlType;
	DWORD lCtrlNo;
	char sStripDesc[MAX_SCRIBBLE_STRIP_DESC]; 
	CString sCtrlVal;
	BOOL bDescChanged;
	int TrackPage; 
	int CompRatioIndex ;
	DWORD lStripNo;
	int lParamNo;
	} m_tControlMatrix[128][5][9];
Updated 21-Nov-12 18:45pm
Comments
Mohibur Rashid 22-Nov-12 0:48am    
I don't think you have any other choice.
Sergey Alexandrovich Kryukov 22-Nov-12 2:13am    
Agree; this is easy to make worse, but hard to improve.
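By the way, using ++index instead of the usual index++ in the "for" statement can sometimes be faster, although for plain integer counters an optimizing compiler usually generates identical code for both. Try it...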
--SA

You can flatten the loop into a single 1D loop by taking the address of the first element of the 3D array and then doing 128*5*9 = 5760 iterations.

Today's optimizing compilers might do the same and in that case this technique would not yield any speed advantage. On the other hand, it's worth a try.
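
As a rough sketch of what that flattening could look like: the `Cell` struct, the array `g_matrix`, and the helpers `setAllFlat` and `countEqual` below are hypothetical stand-ins, not the poster's actual `typControlMatrix` code. It relies on a multidimensional array being laid out contiguously; indexing across the inner array bounds like this works on mainstream compilers, though a strict reading of the standard does not bless it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for typControlMatrix; only the layout idea matters.
struct Cell {
    int           enabled;   // placeholder for the other members
    unsigned char ctrlVal;   // plays the role of cCtrlVal
    int           paramNo;   // placeholder for the other members
};

const std::size_t kDim0 = 128, kDim1 = 5, kDim2 = 9;
const std::size_t kCount = kDim0 * kDim1 * kDim2;   // 5760 elements

Cell g_matrix[kDim0][kDim1][kDim2];

// Treat the 3D array as one contiguous run of kCount elements.
void setAllFlat(unsigned char value)
{
    // Sanity check: the array occupies exactly kCount consecutive elements.
    assert(sizeof(g_matrix) == kCount * sizeof(Cell));

    Cell* p = &g_matrix[0][0][0];
    for (std::size_t n = 0; n < kCount; ++n)
        p[n].ctrlVal = value;
}

// Counts how many elements currently hold `value`, using ordinary 3D indexing.
std::size_t countEqual(unsigned char value)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i < kDim0; ++i)
        for (std::size_t j = 0; j < kDim1; ++j)
            for (std::size_t k = 0; k < kDim2; ++k)
                if (g_matrix[i][j][k].ctrlVal == value)
                    ++hits;
    return hits;
}
```

Calling `setAllFlat(255)` and then `countEqual(255)` is one way to confirm that every element was reached.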
 
Using nv3's suggested 'flat approach' with pointers, I obtained a somewhat surprising result (about a 3x speed improvement):

C++
  register int i, j, k;
  LARGE_INTEGER t[4];

  QueryPerformanceCounter(&t[0]);

  //-> 3 loops
  for (i=0; i<128; i++)
  {
    for (j=0; j<5; j++)
    {
      for (k=0; k<9; k++)
        m_tControlMatrix[i][j][k].cCtrlVal = 255;
    }
  }
  //<-
  QueryPerformanceCounter(&t[1]);

  QueryPerformanceCounter(&t[2]);
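  //-> flat pointers
  // p points at cCtrlVal of the first element; adding sizeof(typControlMatrix)
  // bytes moves it to cCtrlVal of the next element, since the array is contiguous.
  register BYTE * p = &m_tControlMatrix[0][0][0].cCtrlVal;
  register BYTE * q = p + 5760 * sizeof(typControlMatrix); // one stride past the last element
  while (p < q)
  {
    *p = 255;
    p += sizeof(typControlMatrix);
  }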
  //<-
  QueryPerformanceCounter(&t[3]);
  
  CString s;
  s.Format("3 loops: %I64d flat pointers: %I64d speed ratio %g ", (t[1].QuadPart-t[0].QuadPart), (t[3].QuadPart-t[2].QuadPart), ((double)(t[1].QuadPart-t[0].QuadPart))/(t[3].QuadPart-t[2].QuadPart));
  MessageBox(s, "Test");


The output:
3 loops: 195327 flat pointers: 67743 speed ratio 2.88335 
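
A possible caveat: each variant above is timed once, and the nested loops run first, so they may also pay for pulling the array into the cache. A sketch of a more symmetric measurement follows, assuming a C++11 compiler; `BenchCell`, `g_cells`, `setNested`, `setFlat` and `bestNanoseconds` are hypothetical names, not part of the original test. Each variant gets an untimed warm-up pass and the fastest of several timed runs is reported.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for typControlMatrix.
struct BenchCell { int enabled; unsigned char ctrlVal; int paramNo; };

const std::size_t kDim0 = 128, kDim1 = 5, kDim2 = 9;
BenchCell g_cells[kDim0][kDim1][kDim2];

// Variant 1: three nested loops with 3D indexing.
void setNested(unsigned char v)
{
    for (std::size_t i = 0; i < kDim0; ++i)
        for (std::size_t j = 0; j < kDim1; ++j)
            for (std::size_t k = 0; k < kDim2; ++k)
                g_cells[i][j][k].ctrlVal = v;
}

// Variant 2: one loop over the array viewed as a flat run of elements.
void setFlat(unsigned char v)
{
    BenchCell* p = &g_cells[0][0][0];
    BenchCell* end = p + kDim0 * kDim1 * kDim2;
    for (; p != end; ++p)
        p->ctrlVal = v;
}

// Runs fn once untimed (warm-up), then returns the fastest of `repeats`
// timed runs, in nanoseconds.
long long bestNanoseconds(void (*fn)(unsigned char), int repeats)
{
    assert(repeats > 0);
    typedef std::chrono::steady_clock Clock;

    fn(0);  // warm-up pass, not timed
    long long best = -1;
    for (int r = 0; r < repeats; ++r) {
        Clock::time_point start = Clock::now();
        fn(255);
        Clock::time_point stop = Clock::now();
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                           stop - start).count();
        if (best < 0 || ns < best)
            best = ns;
    }
    return best;
}
```

Comparing `bestNanoseconds(setNested, 100)` with `bestNanoseconds(setFlat, 100)` in an optimized build would show how much of the difference survives once both variants run on a warm cache.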
 
Comments
nv3 22-Nov-12 8:05am    
Thanks for implementing the idea and doing the performance measurement! A factor of almost 3 is indeed a nice result. It was probably so high because the loop overhead of the 3 nested loops outweighs the loop body, which is only a single byte store. +5
CPallini 22-Nov-12 8:49am    
Thank you.
Like you, I suppose the improvement is due to loop overhead.
BTW you already got my 5 for the good original hypothesis.
nv3 22-Nov-12 9:06am    
Thank you!
Ron Anders 22-Nov-12 9:44am    
Nice! Thank you so much.
CPallini 22-Nov-12 11:23am    
You are welcome.
Please note I used #define MAX_SCRIBBLE_STRIP_DESC 6, since I had no idea of its actual value. I don't know how much this may have biased the result.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


