Click here to Skip to main content
Click here to Skip to main content

32-bit pointers in a 64-bit world

, 24 Aug 2008 CPOL
Rate this:
Please Sign up or sign in to vote.
64 bit pointers are wasteful if you don't need to access TBs of RAM

Introduction

64 bit is wonderful - we are no longer limited to just 2GB (per Win32 process) - you can now address a full 8TB of memory (per Win64 process)... which is great if your application really needs 8TB of memory.

If you don't need the full address space, 64-bit can be wasteful: pointers take twice the memory, which may actually result in a slower application, especially when using a pointer-laden data structure such as a binary tree (larger structures take more cache space).

A simple scheme is presented that allows using 32-bit "pointers" to access up to 64GB of memory on a 64-bit machine.

Encoding 64-bit pointers in 32-bits

The main idea of sptr is data alignment - modern processors feel much more comfortable when accessing "aligned" addresses: accessing a char at address 0xC0001 means more work than accessing an int at 0xC0000.

malloc() (which does the heavy lifting for new) only returns aligned addresses (see http://msdn.microsoft.com/en-us/library/ycsb6wwf.aspx) - a pointer returned by malloc() will always be aligned to a 16-byte boundary on Visual Studio - the 4 lowest bits will always be 0.

Therefore, at least for pointers we allocate on the heap, we can just "delete" the 4 lowest bits (by shifting right) without losing information.

If all memory addresses were allocated from the "bottom" (the highest bits are also 0), we could just encode a pointer as follows:

#include <stddef.h>  // for uintptr_t
typedef unsigned __int32 uint32_t;

uint32_t encode(void* p)
{
  uintptr_t i = reinterpret_cast<uintptr_t>(p) >> 4;
  return static_cast<uint32_t>(i);
}

(uintptr_t is an integral data type that is guaranteed to be the same size as a pointer.)

This will encode any (16-byte aligned) address in the range 0x00 0000 0000 - 0x10 0000 0000 into a 32-bit integer.

However, the Operating System is free to map physical memory into other virtual memory segments - for example, mapping kernel memory "at the top", so it is preferable to keep such a map ourselves:

#include <boost/thread/mutex.hpp>
#include <boost/thread/locks.hpp>

class sptr_base
{   
protected:
  static const uint32_t ALIGNMENT_BITS = 4;
  static const uint32_t ALIGNMENT = (1<<ALIGNMENT_BITS);
  static const uintptr_t ALIGNMENT_MASK = ALIGNMENT - 1;

protected:
  static uintptr_t     _seg_map[ALIGNMENT];
  static uintptr_t     _segs;
  static boost::mutex  _m;
  inline static uintptr_t ptr2seg(uintptr_t p)
  {
    p &= ~0xFFFFFFFFULL; // Keep only high part
    uintptr_t s = _segs;
    uintptr_t i = 0;
    for (; i < s; ++i)
      if (_seg_map[i] == p)
    return i;

    // Not found - now we do it the "right" way (mutex and all)
    boost::lock_guard<boost::mutex> lock(_m);
    for (i = 0; i < s; ++i)
      if (_seg_map[i] == p)
        return i;

    i = _segs++;
    if (_segs > ALIGNMENT)
      throw bad_segment("Segment out of range");

    _seg_map[i] = p;
    return i;
  }
};

ptr2seg() maps the high bits of a pointer to one of a few segments. The code refrains from using a mutex when reading the map, based on the atomicity of integer assignment.

The actual pointer encode/decode is then as follows:

#include <boost/pool/detail/singleton.hpp>
typedef boost::details::pool::singleton_default<segment_mapper> the_segment_mapper;

uint32_t encode(const_native_pointer_type ptr)
{
  uintptr_t p = reinterpret_cast<uintptr_t>(ptr);
  if ((p & ALIGNMENT_MASK) != 0)
    throw bad_alignment("Pointer is not aligned");
  return (uint32_t)(ptr2seg(p) + p);
}

inline native_pointer_type decode(uint32_t e)
{
   uintptr_t el = e;
   uintptr_t ptr = (_seg_map[el & ALIGNMENT_MASK] + el) & ~ALIGNMENT_MASK;
   return reinterpret_cast<native_pointer_type>(ptr);
}

sptr

sptr gives a pointer-like interface to the "compressed" pointer. It allows accessing our "pointer" similar to a real pointer:

sptr<int> p = new int;
*p = 5;
delete p;

// Conversions from/to pointers and pointer arithmetics work
int* q = new int;
p = q;
p++;
q = p;

Limitations

sptr has several limitations (which may make it inappropriate in many places).

First, sptr can't access the whole 64-bit address space (duh!). It is appropriate only for applications requiring just a bit more than the 32-bit address space (server machines are usually limited to 16GB-32GB of memory; Desktops usually can't deal with more than 8GB).

Second, although an address allocated on the heap will be aligned properly, pointer arithmetic may complicate things:

sptr<char> buf = new char[100];

will work just fine as new returns an aligned address.

However, something like:

for (sptr<char> p = buf; p != buf + 100; ++p)
   *p = '0';

will not work as p will now point to an unaligned address (the sptr implementation will assert at compile time).

However, in most cases, a standard pointer can be used:

for (char* p = buf; p != buf + 100; ++p)
   *p = '0';

Third, mix sptr and multiple-inheritence with caution.

For example, given:

struct A { char i; };
struct B { int j; };
struct C : public A, public B { };

This may or may not work (depending on structure alignment):

sptr<C> c = new C;
sptr<B> b = c;

This is because b does not point to the same address as c (this is not the place for a discussion on multiple-inheritance, but you can try the following code to convince yourself):

C* c = new C;
B* b = c; 
std::cout << ((uintptr_t)b - (uintptr_t)c) << std::endl;

There is no problem accessing the object via a standard pointer:

sptr<C> c = new C;
B* b = c;

Performance

A small benchmark is provided with the code, implementing a simple linked-list which is repeatedly traversed.

The list's node is:

struct node
{
   int val;
   node_ptr prev;
   node_ptr next;
};

The size of the node is therefore 20 bytes with standard pointers vs. 12 bytes with sptr.

However, this comes at a tradeoff - the extra bit shuffling impacts performance - the standard pointer version will perform 1.5x faster on the linked-list benchmark.

Accessing the map accounts for much of this performance hit. If it could be guaranteed that all addresses are allocated from the region 0x00 0000 0000 - 0x10 0000 0000, we could discard the map, significantly improving performance.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

cppnow

United States United States
No Biography provided

Comments and Discussions

 
QuestionGood solution, would add just1 improvement PinmemberJCems16-Jan-15 8:04 
GeneralUse with caution PinmemberTF_Productions2-Sep-08 21:18 
GeneralRe: Use with caution PinmemberStefan633-Sep-08 5:00 
AnswerRe: Use with caution Pinmembercppnow10-Oct-08 5:04 
GeneralRe: Use with caution PinmemberStefan6310-Oct-08 5:14 
GeneralRe: Use with caution Pinmembercppnow10-Oct-08 7:16 
GeneralRe: Use with caution PinmemberStefan6310-Oct-08 7:30 
GeneralRe: Use with caution Pinmembercppnow10-Oct-08 7:58 
GeneralRe: Use with caution PinmemberStefan6312-Oct-08 23:52 
GeneralRe: Use with caution Pinmembercppnow13-Oct-08 6:20 
QuestionBenchmarks? PinmemberLeo Davidson25-Aug-08 14:10 
AnswerRe: Benchmarks? Pinmembersunnyim426-Aug-08 5:20 
GeneralError Pinmembersteveb25-Aug-08 10:24 
GeneralRe: Error PinmemberJeanLuc_25-Aug-08 11:30 
GeneralRe: Error Pinmembersunnyim425-Aug-08 11:49 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.150224.1 | Last Updated 25 Aug 2008
Article Copyright 2008 by cppnow
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid