Click here to Skip to main content
Email Password   helpLost your password?

Introduction

64 bit is wonderful - no longer we are limited to a just 2GB (per Win32 process) - you can now address a full 8TB of memory (per Win64 process)... which is great if your application really needs 8TB of memory.

If you don't need the full address space, 64-bit can be wasteful: pointers take twice the memory, which may actually result in a slower application, especially when using pointer-laden data structure such as a binary tree (larger structures take more cache space).

A simple scheme is presented that allows using 32-bit "pointers" to access up to 64GB of memory on a 64-bit machine

Encoding 64-bit pointers in 32-bits

The main idea of sptr is data alignment - modern processors feel much more comfortable when accessing "aligned" addresses: accessing a char at address 0xC0001 means more work than accessing an int at 0xC0000.

malloc()(which does the heavy lifting for new) only returns aligned addresses (see http://msdn.microsoft.com/en-us/library/ycsb6wwf.aspx) - a pointer returned by malloc() will always be aligned to a 16-byte boundary on Visual Studio - the 4 lowest bits will always be 0.

Therefore, at least for pointers we allocated on the heap, we can just "delete" the 4 lowest bits (by shifting right) without losing information.

If all memory addresses were allocated from the "bottom" (the highest bits are also 0), we could just encode a pointer as follows:

#include <stddef.h>  // for uintptr_t
typedef unsigned __int32 uint32_t;

uint32_t encode(void* p)
{
  uintptr_t i = reinterpret_cast<uintptr_t>(p) >> 4;
  return static_cast<uint32_t>(i);
}   

(uintptr_t is an integral data type that is guaranteed to be the same size as a pointer)

This will encode any (16-byte aligned) address in the range 0x00 0000 0000 - 0x10 0000 0000 into a 32-bit integer.

However, the operating system is free to map physical memory into other virtual memory segments - for example, mapping kernel memory "at the top", so it is preferable to keep such a map ourselves:

#include <boost/thread/mutex.hpp>
#include <boost/thread/locks.hpp>

class sptr_base
{   
protected:
  static const uint32_t ALIGNMENT_BITS = 4;
  static const uint32_t ALIGNMENT = (1<<ALIGNMENT_BITS);
  static const uintptr_t ALIGNMENT_MASK = ALIGNMENT - 1;

protected:
  static uintptr_t     _seg_map[ALIGNMENT];
  static uintptr_t     _segs;
  static boost::mutex  _m;
  inline static uintptr_t ptr2seg(uintptr_t p)
  {
    p &= ~0xFFFFFFFFULL; // Keep only high part
    uintptr_t s = _segs;
    uintptr_t i = 0;
    for (; i < s; ++i)
      if (_seg_map[i] == p)
    return i;

    // Not found - now we do it the "right" way (mutex and all)
    boost::lock_guard<boost::mutex> lock(_m);
    for (i = 0; i < s; ++i)
      if (_seg_map[i] == p)
        return i;

    i = _segs++;
    if (_segs > ALIGNMENT)
      throw bad_segment("Segment out of range");

    _seg_map[i] = p;
    return i;
  }
};

ptr2seg() maps the high bits of a pointer to one of a few segments. The code refrains from using a mutex when reading the map based on the atomicity of integer assignment.

The actual pointer encode/decode is then as follows:

#include <boost/pool/detail/singleton.hpp>
typedef boost::details::pool::singleton_default<segment_mapper> the_segment_mapper;

uint32_t encode(const_native_pointer_type ptr)
{
  uintptr_t p = reinterpret_cast<uintptr_t>(ptr);
  if ((p & ALIGNMENT_MASK) != 0)
    throw bad_alignment("Pointer is not aligned");
  return (uint32_t)(ptr2seg(p) + p);
}

inline native_pointer_type decode(uint32_t e)
{
   uintptr_t el = e;
   uintptr_t ptr = (_seg_map[el & ALIGNMENT_MASK] + el) & ~ALIGNMENT_MASK;
   return reinterpret_cast<native_pointer_type>(ptr);
} 

sptr

sptr gives a poitner-like interface to the "compressed" pointer. It allows accessing our "pointer" similarly to a real pointer:

sptr<int> p = new int;
*p = 5;
delete p;

// Conversions from/to pointers and pointer arithmetics work
int* q = new int;
p = q;
p++;
q = p;

Limitations

sptr has several limitations (which may make it inappropriate in many places).

First, sptr can't access the whole 64-bit address space (duh!). It is appropriate only for application requiring just a bit more than 32-bit address space (server machines are usually limited to 16GB-32GB of memory. Desktops usually can't deal with more than 8GB).

Second, although an address allocated on the heap will be aligned properly, pointer arithmetics may complicate things:

sptr<char> buf = new char[100]; 

will work just fine as new returns an aligned address.

However, something like:

for (sptr<char> p = buf; p != buf + 100; ++p)
   *p = '0';

will not work as p will now point to an unaligned address (the sptr implementation will assert at compile time).

However, in most cases a standard pointer can be used:

for (char* p = buf; p != buf + 100; ++p)
   *p = '0';   

Third, mix sptr and multiple-inheritence with caution.

For example, given:

struct A { char i; };
struct B { int j; };
struct C : public A, public B { }; 

This may or may not work (depending on structure alignement):

sptr<C> c = new C;
sptr<B> b = c;

This is because b does not point to the same address as c (this is not the place for a discussion on multiple-inheritence, but you can try the following code to convince yourself:

C* c = new C;
B* b = c; 
std::cout << ((uintptr_t)b - (uintptr_t)c) << std::endl; 

)

There is no problem accessing the object via a standard pointer:

sptr<C> c = new C;
B* b = c;  

Performance

A small benchmark is provided with the code, implementing a simple linked-list which is repeatedly traversed.

The list's node is:

struct node
{
   int val;
   node_ptr prev;
   node_ptr next;
};

The size of the node is therefore 20 bytes with standard pointers vs. 12 bytes with sptr.

However, this comes at a tradeoff - the extra bit shuffling impacts performance - the standard pointer version will perform 1.5x faster on the linked-list benchmark.

Accessing the map accounts for much of this performance hit. If it could be guaranteed that all addresses are allocated from the region 0x00 0000 0000 - 0x10 0000 0000 we could discard the map significantly improving performance.

You must Sign In to use this message board.
 
 
Per page   
 FirstPrevNext
GeneralUse with caution
TF_Productions
21:18 2 Sep '08  
Nice idea, but I'd advise caution using this technique since it's a prime candidate for causing hard-to-trace errors, especially if you're working in a team.

Also, 64 bit isn't that wasteful. Computers are fast enough and memory is inexpensive. But with 64 bit, your program is ready (or more ready, put carefully) for future challenges. So, reverting "back" to 32 seems short-sighted.

It's a nice idea, but not very useful - if I may be so bold. Nevertheless, a very good article!
GeneralRe: Use with caution
Stefan63
5:00 3 Sep '08  
I agree - using such a concept is not for the faint of heart. It may indeed save some storage - and as a result, some performance. But never even close to a factor of 2.

I remember the hassle about memory management when we still had to deal with the concepts of extended memory, with 24 bit pointers that indirectly pointed to 'real' addresses via a list of 'memory pages'. It was a total PITA and a constant source of errors. But *that* was what the system provided, and we had to deal with it.

Of course, with some compile time checking your solution will be less error prone, but those errors that the compiler does miss will be hard to find nevertheless.
AnswerRe: Use with caution
cppnow
5:04 10 Oct '08  
There is one important case where such a solution is worthwhile: when implementing an abstract data-structure (ADT).
A data-structure such as a binary-tree or a trie will contain many pointers which can amount to lots of memory.

Memory allocations for the data-structures can easily meet the alignment requirements (and the ADT author can easily enforce these requirements).

As for memory saved - the actual figure will of course be lower than 50%. As an actual example, let's look at a binary tree. If an element stored in the tree has a size S and we keep 3 pointers of size P per element (parent, left, right), then we need S+3*P per element. If we can halve the size of a pointer, we get: S+1.5*P, so we save: (S+1.5*P)/(S+3*P). For S=12 bytes and P=8 bytes (64 bits) we get a saving of 1/3. If S is smaller, this will of course get closer to the 1/2 limit.

Note that we save 12 bytes per element. Assuming the tree is very large (we only optimize when we have to - if the footprint of the application is small, don't optimize) - let's say 100 million entries, the saving amounts to 1.2GB (for the 12-byte elements above, this means the memory footprint drops from 3.6GB to 2.4GB).

We are only really interested in 64 bits and optimizing memory footprint for memory-hungry applications - e.g. when we really need those GBs.
GeneralRe: Use with caution
Stefan63
5:14 10 Oct '08  
Your memory analysis doesn't take into account the actual data stored in the tree! If it's only 4 bytes per node, then you'll be saving 25% instead of 33 in the best case. If it's more the reduction might be a lot less.

If your main reason for introducing such a pointer type is being able to run an application on a lower footprint you have to look at the total amount of memory used, not just the pointers that are part of your data structures!
GeneralRe: Use with caution
cppnow
7:16 10 Oct '08  
Stefan63 wrote:
our memory analysis doesn't take into account the actual data stored in the tree! If it's only 4 bytes per node, then you'll be saving 25% instead of 33 in the best case. If it's more the reduction might be a lot less.



It does take account - that's "S".
The smaller the data element you actually want to store in the tree - the greater the saving.

If you only need 4 bytes per node, then with 64 bit pointers, a tree node will take 28 bytes (3x8 + 4) while with 32 bit pointers a node is only 16 bytes (3x4 + 4). In addition, alignment will also play a part - in this case, alignment will probably mean the 64 bit node will actually take 32 bytes, so the saving (in this particular case) is in fact 50%. If the data stored is larger, the saving will be lower. For example, if the data element is 12 bytes long (let's say an int + "standard" pointer), then we have for 64 bit: 3x8+12 = 36 bytes, while with 32-bit pointers: 3x4+12 = 24, or 33% saving (8 and 16 byte alignment will give 40 vs 24 and 48 vs 32 respectively)

Stefan63 wrote:
If your main reason for introducing such a pointer type is being able to run an application on a lower footprint you have to look at the total amount of memory used, not just the pointers that are part of your data structures!


More often than not, there are only few big data-structures that take obscene amount of memory - optimize those structures and you save memory for the entire application - same as with optimizing performance. And of course - the percentage memory saving will be lower for the whole application - but it doesn't mean it's not worth the effort: if you have one piece of code which the profiler shows takes 50% of the CPU's cycles, and you can improve its performance by 50%, unless the effort is prohibitive, the 25% performance gain for the entire application is probably worth it. If the performance is still not satisfactory, you will just have to find other places to optimize.
GeneralRe: Use with caution
Stefan63
7:30 10 Oct '08  
Sorry, got the 'S' part at the start but dropped it's meaning half way through reading Wink

There's something else however that came to my mind: If I had a huge structure like the one you describe I'd probably do it quite differently, using objekt pools for the tree nodes and 4 byte index values to access the correct nodes within those pools. This would speed up memory allocation and deallocation by several orders of magnitude while maintaining the storage savings.

As a matter of fact I think just about any structure that requires boatloades of pointers in a 64 bit environment should probably be modelled using object pools. You won't even have to write your own, the boost library has it. (Although I believe at the sizes indicated above boost::pool doesn't scale very well. It might need some modification after all)
GeneralRe: Use with caution
cppnow
7:58 10 Oct '08  
From my (limited) experience, boost pool doesn't scale that well.
Of course, if you app's memory allocations have a specific pattern you may do better than the general purpose memory manager. That said, I'm not sure about MSVC's memory manager, but at least GNU's optimizes small allocations by creating special pools for each allocation size, so allocating 16 or 32 bytes is requires not much more than removing a block from the "free-list" and returning it. Not a lot of room for improvement.

As for index-based memory allocation - this is a good solution when applicable - e.g. if you know in advance the memory footprint. Otherwise, you may find yourself reallocating a huge array (not to mention loose pointers that may be pointing to it).

Note that it is quite easy to write an allocator wrapper that will allow you to use 32-bit pointers with standard containers (assuming those are well written, and use typedefs provided by allocator). You can then enforce the alignment requirement (actually, given malloc() returns aligned addresses anyway, this should be trivial).
GeneralRe: Use with caution
Stefan63
23:52 12 Oct '08  
cppnow wrote:
From my (limited) experience, boost pool doesn't scale that well.


That's what I said Wink IIRC the pool implementation is meant to allocate memory blocks at ever increasing sizes unto a certain limit - however, the limitation part appears to be broken and to my knowledge was never fixed so far. Someone else on the boost list should fix it, as the original implementer seems to have dropped maintenance.

To me it appeared quite easy to modify it so it doesn't indefinitely double the allocated blocks' sizes. Unfortunately I never had the time to properly test that modification although it appeared to work quite well for the tests I ran. I did it for only one type of pool only (fast_pool) and my knowledge isn't quite at the level of the orignal author, so I'm reluctant to jump in that gap.

cppnow wrote:
Otherwise, you may find yourself reallocating a huge array (not to mention loose pointers that may be pointing to it).


Erm, pools don't reallocate memory. Not at all! They just add new blocks of memory to the available pool, or release blocks that are no longer used. So you won't ever have to worry about pointer issues. Basically what you have is a list of memory blocks with a wrapper that simulates continuous memory.

Effectively a pool is a memory manager that specializes in allocating chunks of memory of a given size, instead of just byte size. The specialization consists of two parts:

1. minimizing the effort of allocating (and releasing) huge numbers of objects and

2. optimizing memory allocation to seamlessly fill memory block sizes.

The latter aspect is exactly what the above article appears to focus on, this is what gave me the idea to just go for pools instead.

Regarding your example - if you have large size data blocks in each node then your gain in memory for using 32 bit pointers will be negligible. If you have small data blocks however, then you'll run into another problem, which is the performance of allocating (and possibly deallocating) billions of minute chunks of memory one at a time. Try allocating, say, one million blocks of 32 byte each using new. Get yourself a coffee while your program is busy. Now do the same with one billion blocks... In such a case you absolutely will need something like pools!
GeneralRe: Use with caution
cppnow
6:20 13 Oct '08  
Stefan63 wrote:
Erm, pools don't reallocate memory. Not at all! They just add new blocks of memory to the available pool, or release blocks that are no longer used. So you won't ever have to worry about pointer issues. Basically what you have is a list of memory blocks with a wrapper that simulates continuous memory.


I know what pools are - and as I said in an earlier post - standard memory allocators implement pools for small objects.
What I meant was is that if you want to use index-based pointers (not pools), you are essentially stuck with an array you need to reallocate.


Stefan63 wrote:
2. optimizing memory allocation to seamlessly fill memory block sizes.

The latter aspect is exactly what the above article appears to focus on, this is what gave me the idea to just go for pools instead.


Pools are great when you need to allocate lots of objects of the same size - but once you extend the size of the pool by allocated another block from the heap, you can't use indexes as pointers.


Stefan63 wrote:
Regarding your example - if you have large size data blocks in each node then your gain in memory for using 32 bit pointers will be negligible.


Sure.

Stefan63 wrote:
If you have small data blocks however, then you'll run into another problem, which is the performance of allocating (and possibly deallocating) billions of minute chunks of memory one at a time. Try allocating, say, one million blocks of 32 byte each using new. Get yourself a coffee while your program is busy. Now do the same with one billion blocks... In such a case you absolutely will need something like pools!


Also true - which is why general purpose memory allocators do implement pools for allocating small objects (off topic: if you know the size of the allocated block, you can also avoid the overhead of a header at for the memory block - that's what the standard STL allocator does - at least in the GNU implementation).

However, regardless of the memory allocator you use - you still are left with a pointer that in some circumstances may bloat your app. Of course - if your applications doesn't use pointer-heavy structures and/or the actual elements are large compared to the pointers, the suggested solution doesn't apply.
GeneralBenchmarks?
Leo Davidson
14:10 25 Aug '08  
Have you done any testing or benchmarks to prove that doing this is better than using 64-bit pointers?

For example, the extra arithmetic on every pointer access could offset the time spent on cache misses due to the increased data size of 64-bit pointers.

I imagine your code will be faster for some types of usage and much slower for others, so benchmarks examining the pros and cons in different situations would seem essential, else how does anyone know when to use it and whether or not it really makes things better (or worse) in return for the added complexity?
GeneralRe: Benchmarks?
sunnyim4
5:20 26 Aug '08  
This depends on the specific application.
In general, you are right - the added arithmetic juggling will cost more than using a native 64-bit pointer.
However, in some applications the reduced memory footprint will make it worthwhile.
GeneralError
steveb
10:24 25 Aug '08  
Error on win32 process space. It is 4GB instead of 2GB. 2 in power of 32.
GeneralRe: Error
JeanLuc_
11:30 25 Aug '08  
It is not an error: the higher 2GB are reserved by/for the OS. You can increase to 3GB for processes with a boot setting. So in effect, WIN32 processes are limited to 2GB of memory space.
GeneralRe: Error
sunnyim4
11:49 25 Aug '08  
Not so:
http://msdn.microsoft.com/en-us/library/aa366912(VS.85).aspx[^]

Windows splits the virtual address space into 2 parts - user and kernel memory. The default split is 2GB for the user process and 2GB for the kernel (you can change boot.ini to give the kernel only 1GB of address space, leaving 3GB for the application)

Similarly, although 64-bit allows addressing up to 16 Exabyte (2^64 = 16,777,216 TB), Win64 will allow you to address only 8TB (maybe rename it to Win43?)


Last Updated 25 Aug 2008 | Advertise | Privacy | Terms of Use | Copyright © CodeProject, 1999-2010