Click here to Skip to main content
15,891,725 members
Articles / Programming Languages / C++
Article

Multi Dimensional Arrays using Template Meta Programming

Rate me:
Please Sign up or sign in to vote.
3.05/5 (8 votes)
20 Mar 2004CPOL3 min read 87.1K   465   17   20
A Real Multi-Dimensional Array

Introduction

Multi-dimensional arrays provide a nice mathematical data structure for mathematical and soft computing projects. Tensor is usually the data structure used to store data in multi-dimensional arrays. Helper classes helps in design subscripting operators to select rows, cols, and plates. Helper classes may be implemented by adding constant template arguments describing the state of the array, or sub-array. Refer to "Tensor templates", " 2D Matrix Container with [][] indexing", and "Template-Based Helper Classes: An application to Multidimensional Arrays" for more details. A tensor is a uni-dimensional array with indexing facility that allows manipulation as N-dimensional matrix. Computational cost for accessing a specific element grows exponentially as the number of dimensions increases. Helper classes force many constructors/destructors calls for temporary objects. Actually helper classes provide a nice solution for little projects with short loops.

In mathematics, finding a solution in some space costs a lot. One way to find fast solution is transforming the problem (actually the search space) into a larger space where more dimensions may lead to faster solution.

Now, consider the following piece of code:

int* pi1=new int[20];
int** pi2=new int*[4];
for(int i=0; i<4; i++)
    pi2[i]=new int[5];

Now, what do you think? How shall I access the element located in 2nd row and 3rd col? How much does it cost for each of them?

Obviously, pi2[2][3] is faster than pi1[2*4+3].

How about very large N-dimensional arrays with complex objects!? Can you imagine this!?

"Template-Based Helper Classes: An application to Multidimensional Arrays" provides a nice solution, but subscripting operators cost a lot for computations, costs much if repeated in a very long loop, uses lots of helper temporary classes, and fragments the memory.

As described in linear algebra, adding more dimensions fastens the access to data elements.

Problem Statement

How shall we develop a template class that generates multi-dimensional pointers!!?

int i; 
int* pi; 
int** ppi; 
int*** pppi; 
...

How shall we add number of dimensions as a template argument!?

Basic Solution

The basic idea makes use of Template Meta-Programming, implementing a template class with argument M indicating the number of nested pointers.

Consider the following class:

template<_sizetype M>
class multi_ptr
{
public:
    typedef double T;
    typedef multi_ptr<M-1>::pointer* pointer; 
};

class multi_ptr<0>
{
public:
    typedef double T;
    typedef T pointer;
};

Now, consider how to declare a multi-dimensional pointer:

multi_ptr<3>::pointer p3;

Now, more facilities are needed; we need to add class template argument T indicating the type of this array.

More Enhancement - Adding Class Template Argument

Template Meta-Programming works only for single argument template classes. Adding a class template argument cannot be achieved directly.

An outer class should be used to pass the template argument through. Consider the following multi_array class:

template<class T, _sizetype N>
class multi_array
{
public:
    template<_sizetype M>
    class multi_ptr
    {
    public:
        typedef multi_ptr<M-1>::pointer* pointer; 
    };
    
    class multi_ptr<0>
    {
    public:
        typedef T pointer;
    };

    typedef multi_ptr<N>::pointer pointer;
    typedef const pointer const_pointer;
};

Class template argument T is passed through the outer multi_array class with the number of dimensions N. The inner multi_ptr class makes use of the outer template argument T for type specialization.

Now the very important question is:

How shall we allocate this multi dimensional array!? We need to specialize nested allocating loops!!

How shall we build a varying number of nested allocating loops!?

We need a little addition for allocation!!

Little Enhancement - Adding Allocators

A simple compile-time loop provides a nice solution for generating a variable number of nested loops. An inner class called multi_allocator is responsible for allocating multi-dimensional pointers.

template<_sizetype M>
class multi_allocator
{
public:
    static multi_ptr<M>::pointer alloc(_sizetype* dims)
    {
        _sizetype i;
        multi_ptr<M>::pointer rtn;
        rtn=new multi_ptr<M-1>::pointer[dims[0]];
        for(i=0; i<dims[0]; i++)
            rtn[i]=multi_allocator<M-1>::alloc(dims+1);
        return rtn;
    }
};
    
class multi_allocator<1>
{
public:
    static multi_ptr<1>::pointer alloc(_sizetype* dims)
    {
        multi_ptr<1>::pointer rtn;
        rtn=new T[dims[0]];
        return rtn;
    }
};

Now, fine touches is needed to finalize the multi_array class. Constructors, casting operators, and data members should be added.

Finalizing multi_array - Adding constructors and casting operators

typedef unsigned int _sizetype;
    
template<class T, _sizetype N>
class multi_array
{
public:
    template<_sizetype M>
    class multi_ptr
    {
    public:
        typedef multi_ptr<M-1>::pointer* pointer; 
    };
    
    class multi_ptr<0>
    {
    public:
        typedef T pointer;
    };


    template<_sizetype M>
    class multi_allocator
    {
    public:
        static multi_ptr<M>::pointer alloc(_sizetype* dims)
        {
            _sizetype i;
            multi_ptr<M>::pointer rtn;
            rtn=new multi_ptr<M-1>::pointer[dims[0]];
            for(i=0; i<dims[0]; i++)
                rtn[i]=multi_allocator<M-1>::alloc(dims+1);
            return rtn;
        }
    };
    
    class multi_allocator<1>
    {
    public:
        static multi_ptr<1>::pointer alloc(_sizetype* dims)
        {
            multi_ptr<1>::pointer rtn;
            rtn=new T[dims[0]];
            return rtn;
        }
    };
    
    typedef multi_ptr<N>::pointer pointer;
    typedef const pointer const_pointer;
    typedef multi_allocator<N> allocator;

    
    operator pointer(){return _pdata;}
    
    multi_array(const _sizetype d1, ...)
    {
        memcpy(_pdims, &d1, N*sizeof(_sizetype));
        _pdata=allocator::alloc(_pdims);
    }
    
    multi_array(const _sizetype* pdims)
    {
        memcpy(_pdims, pdims, N*sizeof(_sizetype));
        _pdata=allocator::alloc(_pdims);
    }
    
protected:
    pointer _pdata;
    _sizetype _pdims[N+1];

};

Thank You!!

Thanx a lot for your time. I hope you enjoyed this idea!! Hear from you soon:).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Engineer
Australia Australia
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
Generalerror using g++ 4.4.5 Pin
axllaruse10-May-11 2:55
axllaruse10-May-11 2:55 
QuestionDeallocate? Pin
GregoryBook29-Sep-06 15:50
GregoryBook29-Sep-06 15:50 
AnswerRe: Deallocate? Pin
Mo Hossny23-Oct-06 18:27
Mo Hossny23-Oct-06 18:27 
Generalinvalid template argument 'N' Pin
SMJT17-Sep-04 7:31
SMJT17-Sep-04 7:31 
GeneralRe: invalid template argument 'N' Pin
SMJT17-Sep-04 7:38
SMJT17-Sep-04 7:38 
GeneralRe: invalid template argument 'N' Pin
Mo Hossny17-Sep-04 12:24
Mo Hossny17-Sep-04 12:24 
GeneralRe: invalid template argument 'N' Pin
SMJT17-Sep-04 21:35
SMJT17-Sep-04 21:35 
GeneralRe: invalid template argument 'N' Pin
Mo Hossny19-Sep-04 9:12
Mo Hossny19-Sep-04 9:12 
GeneralCompiler error on VC++.NET 2003 Pin
Rui A. Rebelo8-Apr-04 4:11
Rui A. Rebelo8-Apr-04 4:11 
GeneralRe: Compiler error on VC++.NET 2003 Pin
Mo Hossny10-Apr-04 22:01
Mo Hossny10-Apr-04 22:01 
GeneralRe: Compiler error on VC++.NET 2003 Pin
Mo Hossny3-Jun-04 1:56
Mo Hossny3-Jun-04 1:56 
Questionpi2[2][3] is faster than pi1[2*4+3] ?? Pin
sahn021-Mar-04 3:20
sahn021-Mar-04 3:20 
AnswerRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Mo Hossny21-Mar-04 3:23
Mo Hossny21-Mar-04 3:23 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Tim Smith21-Mar-04 6:14
Tim Smith21-Mar-04 6:14 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Mo Hossny21-Mar-04 6:26
Mo Hossny21-Mar-04 6:26 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Tim Smith21-Mar-04 7:16
Tim Smith21-Mar-04 7:16 
Lets look at it again with routines called GetAt (array, i, j)

For pi1 [20], here is what we have:

int GetAt (int pi1 [20], int i, int j)
{
	return pi1 [i * 5 + j];
}

  00000	8b 44 24 08	 mov	 eax, DWORD PTR _i$[esp-4]
  00004	8b 4c 24 0c	 mov	 ecx, DWORD PTR _j$[esp-4]
  00008	8d 14 81	 lea	 edx, DWORD PTR [ecx+eax*4]
  0000b	8b 4c 24 04	 mov	 ecx, DWORD PTR _pi1$[esp-4]
  0000f	03 c2		 add	 eax, edx
  00011	8b 04 81	 mov	 eax, DWORD PTR [ecx+eax*4]

(without the arglist access instructions)

  00008	8d 14 81	 lea	 edx, DWORD PTR [ecx+eax*4]
  0000f	03 c2		 add	 eax, edx
  00011	8b 04 81	 mov	 eax, DWORD PTR [ecx+eax*4]


Here is pi2 [4][5]

int GetAt (int pi2 [4][5], int i, int j)
{
	return pi2 [i][j];
}

  00000	8b 44 24 08	 mov	 eax, DWORD PTR _i$[esp-4]
  00004	8b 4c 24 0c	 mov	 ecx, DWORD PTR _j$[esp-4]
  00008	8d 14 81	 lea	 edx, DWORD PTR [ecx+eax*4]
  0000b	8b 4c 24 04	 mov	 ecx, DWORD PTR _pi2$[esp-4]
  0000f	03 c2		 add	 eax, edx
  00011	8b 04 81	 mov	 eax, DWORD PTR [ecx+eax*4]

(without the arglist access instructions)

  00008	8d 14 81	 lea	 edx, DWORD PTR [ecx+eax*4]
  0000f	03 c2		 add	 eax, edx
  00011	8b 04 81	 mov	 eax, DWORD PTR [ecx+eax*4]


Here is int **pi4;

int GetAt (int **pi4, int i, int j)
{
	return pi4 [i] [j];
}

  00000	8b 44 24 08	 mov	 eax, DWORD PTR _i$[esp-4]
  00004	8b 4c 24 04	 mov	 ecx, DWORD PTR _pi4$[esp-4]
  00008	8b 14 81	 mov	 edx, DWORD PTR [ecx+eax*4]
  0000b	8b 44 24 0c	 mov	 eax, DWORD PTR _j$[esp-4]
  0000f	8b 04 82	 mov	 eax, DWORD PTR [edx+eax*4]

(without the arglist access instructions)

  00008	8b 14 81	 mov	 edx, DWORD PTR [ecx+eax*4]
  0000f	8b 04 82	 mov	 eax, DWORD PTR [edx+eax*4]


Here is the multi dim

int GetAt (multi_array <int, 2> &pi3, int i, int j)
{
	return pi3 [i][j];
}

  00000	8b 44 24 04	 mov	 eax, DWORD PTR _pi3$[esp-4]
  00004	8b 08		 mov	 ecx, DWORD PTR [eax]
  00006	8b 54 24 08	 mov	 edx, DWORD PTR _i$[esp-4]
  0000a	8b 04 91	 mov	 eax, DWORD PTR [ecx+edx*4]
  0000d	8b 4c 24 0c	 mov	 ecx, DWORD PTR _j$[esp-4]
  00011	8b 04 88	 mov	 eax, DWORD PTR [eax+ecx*4]

(without the arglist access instructions)
(note, the first instruction is an artifact of the 
calling method and really shouldn't be held against the implementation.)
  00004	8b 08		 mov	 ecx, DWORD PTR [eax]
  0000a	8b 04 91	 mov	 eax, DWORD PTR [ecx+edx*4]
  00011	8b 04 88	 mov	 eax, DWORD PTR [eax+ecx*4]


As to the first question of is p[20] slower or faster than p[4][5] where we are using normal C arrays (as stated in the article), they are the same thing EXACTLY.

Now let us take a look at your multidim. What we are doing is trading off an LEA and an addition for an extra memory access. If we have an array of something that isn't nicely sized, such as a 54 byte class, then everyone ends up doing one IMUL and multidim has an extra memory access. Now when you consider that the multidim system has to allocate and deallocate memory, it doesn't make it very good when you have transient objects in a tight loop. Also, in a tight loop accessing a fair amount of memory, the extra memory access of multidim can blow your L1 or L2 cache and thus slow down your program.

I'm not saying multidim is bad. Not in the least. However, when you start talking about performance, things get very tricky, very fast. I have no idea which of the four is the fastest without running the program through a series of different array sizes and types using a tool such as VTune (Intel). However, the memory allocations and deallocations would kill a program with transient arrays in a tight loop. I would NEVER use this solution in places where I just need a simple multidim array that can be allocated on the stack. In the case of a multithreaded application, memory allocations run the risk of stalling threads on a multiprocessor box. (malloc/free is a LOT slower than "ADD SP, 48").

Tim Smith

I'm going to patent thought. I have yet to see any prior art.
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Don Clugston21-Mar-04 15:28
Don Clugston21-Mar-04 15:28 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Nitron22-Mar-04 3:16
Nitron22-Mar-04 3:16 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
John M. Drescher22-Mar-04 3:58
John M. Drescher22-Mar-04 3:58 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Mo Hossny22-Mar-04 22:58
Mo Hossny22-Mar-04 22:58 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.