See more: C++

I have a 3D array of structures, each containing a BYTE member cCtrlVal.
I would like to set this at runtime to 255 as fast as possible.
There are 5760 instances of cCtrlVal.

Currently I am looping through the matrix with 3 nested for loops.

Is there a better, quicker way to set this variable?
Thanks in advance.


struct typControlMatrix
{
	BOOL bEnabled;
	BOOL bChanged;
	BOOL LastChange;
	BOOL AdjStripNo;
	BYTE cCtrlVal;
	enum enControlType eCtrlType;
	DWORD lCtrlNo;
	CString sCtrlVal;
	BOOL bDescChanged;
	int TrackPage;
	int CompRatioIndex;
	DWORD lStripNo;
	int lParamNo;
} m_tControlMatrix[128][5][9];
Posted 21-Nov-12 17:33pm
Edited 21-Nov-12 19:45pm
Mohibur Rashid 22-Nov-12 0:48am
I don't think you have any other choice.
Agree; this is easy to make worse, but hard to improve.
By the way, using ++index instead of the usual index++ in the "for" statement usually makes it faster -- try it...

Solution 1

You can flatten the loop to a 1D affair by taking the address of the first element of the 3D array and then do 128*5*9 iterations.

Today's optimizing compilers might do the same and in that case this technique would not yield any speed advantage. On the other hand, it's worth a try.
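
A minimal sketch of the flattening idea, not the poster's actual code. Cell is a hypothetical, cut-down stand-in for typControlMatrix (the real struct has more members, including a CString), and g_matrix and setAllFlat are illustrative names. It relies on the elements of a multidimensional array being laid out contiguously, which the static_assert checks; walking across the inner array bounds through one pointer works on mainstream compilers, though the language standard is stricter about it than practice is.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for typControlMatrix.
struct Cell {
    int           bEnabled;
    unsigned char cCtrlVal;   // the member we want to set
    unsigned long lCtrlNo;
};

const std::size_t kDim0 = 128, kDim1 = 5, kDim2 = 9;
const std::size_t kCount = kDim0 * kDim1 * kDim2;   // 5760 elements

static Cell g_matrix[kDim0][kDim1][kDim2];

// The 3D array occupies one contiguous block of kCount elements.
static_assert(sizeof(g_matrix) == kCount * sizeof(Cell),
              "array is expected to be contiguous");

// One loop over all elements instead of three nested loops.
void setAllFlat(unsigned char value) {
    Cell* p = &g_matrix[0][0][0];        // address of the first element
    for (std::size_t n = 0; n < kCount; ++n)
        p[n].cCtrlVal = value;
}
```

Calling setAllFlat(255) then sets cCtrlVal in every element. Whether this beats the nested loops depends on the compiler and its optimization settings, so it is worth measuring both.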

Solution 2

With nv3's suggested 'flat approach', using pointers I obtained a somewhat surprising result (about 3x speed improvement):

  // t[0]..t[3] are timestamps with a QuadPart member, taken before and after
  // each variant; the timing calls themselves are not shown in this post.
  int i, j, k;
  //-> 3 loops
  for (i=0; i<128; i++)
    for (j=0; j<5; j++)
      for (k=0; k<9; k++)
        m_tControlMatrix[i][j][k].cCtrlVal = 255;
  //-> flat pointers
  BYTE * p = &m_tControlMatrix[0][0][0].cCtrlVal;
  BYTE * q = p + 5760 * sizeof(typControlMatrix);  // end marker: 5760 strides past the first cCtrlVal
  while (p < q)
  {
    *p = 255;
    p += sizeof(typControlMatrix);  // step to cCtrlVal of the next element
  }
  CString s;
  s.Format("3 loops: %I64d flat pointers: %I64d speed ratio %g ",
    (t[1].QuadPart-t[0].QuadPart), (t[3].QuadPart-t[2].QuadPart),
    ((double)(t[1].QuadPart-t[0].QuadPart))/(t[3].QuadPart-t[2].QuadPart));
  MessageBox(s, "Test");

The output:
3 loops: 195327 flat pointers: 67743 speed ratio 2.88335 
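
For anyone who wants to repeat the comparison without the Windows-specific timing, here is a rough portable sketch using std::chrono. It is not the code that produced the numbers above: Elem is a hypothetical, simplified stand-in for typControlMatrix, and setNested, setFlat and averageNs are illustrative names. An optimizer may merge or drop repeated identical stores, so treat any ratio it reports with care.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical, simplified stand-in for typControlMatrix.
struct Elem {
    int           bEnabled;
    unsigned char cCtrlVal;
    unsigned long lCtrlNo;
};

static Elem g_elems[128][5][9];

// Variant 1: three nested loops, as in the question.
void setNested() {
    for (int i = 0; i < 128; ++i)
        for (int j = 0; j < 5; ++j)
            for (int k = 0; k < 9; ++k)
                g_elems[i][j][k].cCtrlVal = 255;
}

// Variant 2: one flat pass over the same memory.
void setFlat() {
    Elem* p = &g_elems[0][0][0];
    Elem* const end = p + 128 * 5 * 9;
    for (; p != end; ++p)
        p->cCtrlVal = 255;
}

// Runs fn `repeats` times and returns the average duration in nanoseconds.
double averageNs(void (*fn)(), int repeats) {
    using clock = std::chrono::steady_clock;
    const clock::time_point start = clock::now();
    for (int r = 0; r < repeats; ++r)
        fn();
    const clock::time_point stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

A speed ratio comparable to the one reported above would be averageNs(setNested, 1000) / averageNs(setFlat, 1000); averaging over many repeats smooths out timer resolution and cache warm-up.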
nv3 22-Nov-12 8:05am
Thanks for implementing the idea and doing the performance measurement! A factor of almost 3 is indeed a nice result and was probably so high because the loop overhead in the 3 nested loops outweighs the loop body, which is only a single byte copy. +5
CPallini 22-Nov-12 8:49am
Thank you.
Like you, I suppose the improvement is due to loop overhead.
BTW you already got my 5 for the good original hypothesis.
nv3 22-Nov-12 9:06am
Thank you!
Ron Anders 22-Nov-12 9:44am
Nice! Thank you so much.
CPallini 22-Nov-12 11:23am
You are welcome.
Please note I've used #define MAX_SCRIBBLE_STRIP_DESC 6, since I had no idea of its actual value. I don't know how much this biased the result.
VuNic 14-Dec-12 8:00am
Why aren't threads considered? A flat pointer on multiple threads feels like a perfect fit for his performance requirement.
CPallini 14-Dec-12 8:20am
I tried with threads, but got longer execution times.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 22 Nov 2012