C++/CLI in Action - Using interior and pinning pointers

Nish Nishant

Feb 28, 2007

Ms-PL

Excerpt from Chapter 4 on interior and pinning pointers

Title	C++/CLI in Action
Author	Nishant Sivakumar
Publisher	Manning
Published	March 2007
ISBN-10	1-932394-81-8
ISBN-13	978-1932394818
Price	USD 44.99

This is a chapter excerpt from C++/CLI in Action authored by Nishant Sivakumar and published by Manning Publications. The content has been reformatted for CodeProject and may differ in layout from the printed book and the e-book.

4.1 Using interior and pinning pointers

You can't use native pointers with CLI objects on the managed heap. That is like trying to write Hindi text using the English alphabet—they're two different languages with entirely different alphabets. Native pointers are essentially variables that hold memory address locations. They point to a memory location rather than to a specific object. When we say a pointer points to an object, we essentially mean that a specific object is at that particular memory location.

This approach won't work with CLI objects because managed objects in the CLR heap don't remain at the same location for the entire period of their lifetime. Figure 4.1 shows a diagrammatic view of this problem. The Garbage Collector (GC) moves objects around during garbage-collection and heap-compaction cycles. A native pointer that points to a CLI object becomes garbage once the object has been relocated. By then, it's pointing to random memory. If an attempt is made to write to that memory, and that memory is now used by some other object, you end up corrupting the heap and possibly crashing your application.

C++/CLI provides two kinds of pointers that work around this problem. The first kind is called an interior pointer, which is updated by the runtime to reflect the new location of the object that's pointed to every time the object is relocated. The physical address held by the interior pointer may change over time, but it always points to the same object. The other kind is called a pinning pointer, which prevents the GC from relocating the object; in other words, it pins the object to a specific physical location in the CLR heap. With some restrictions, conversions are possible between interior, pinning, and native pointers.
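
The relocation problem can be modelled in plain native C++, outside the CLR. The following is only a rough sketch under stated assumptions: ToyHeap, allocate, release, compact, deref, and address_of are hypothetical names invented for illustration, and the compaction shown is far simpler than what the CLR does. A handle here plays the role of an interior pointer (the heap fixes it up after every move), while the integer returned by address_of plays the role of a raw native address captured at one moment in time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for a handle whose object has been reclaimed.
constexpr std::size_t kDead = static_cast<std::size_t>(-1);

// Toy model of a compacting heap. This is NOT the CLR's algorithm; every
// name here is hypothetical and exists only for illustration.
class ToyHeap {
public:
    using Handle = std::size_t;

    // Reserve up front so push_back never reallocates in a small demo;
    // the only relocation is then the one performed by compact().
    ToyHeap() { cells_.reserve(64); }

    Handle allocate(int value) {
        cells_.push_back(value);
        live_.push_back(true);
        table_.push_back(cells_.size() - 1);
        return table_.size() - 1;
    }

    // Mark the object as an orphan; compact() reclaims it later.
    void release(Handle h) {
        assert(table_[h] != kDead);
        live_[table_[h]] = false;
    }

    // Slide live objects toward the start, then fix up every handle,
    // loosely analogous to the runtime updating interior pointers.
    void compact() {
        std::vector<std::size_t> new_index(cells_.size(), kDead);
        std::size_t next = 0;
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            if (live_[i]) {
                cells_[next] = cells_[i];
                new_index[i] = next++;
            }
        }
        cells_.resize(next);
        live_.assign(next, true);
        for (std::size_t& slot : table_)
            if (slot != kDead) slot = new_index[slot];
    }

    int& deref(Handle h) {
        assert(table_[h] != kDead);
        return cells_[table_[h]];
    }

    // Current raw address of the object, as an integer for comparison only.
    std::uintptr_t address_of(Handle h) {
        return reinterpret_cast<std::uintptr_t>(&deref(h));
    }

private:
    std::vector<int> cells_;          // object storage
    std::vector<bool> live_;          // one flag per cell
    std::vector<std::size_t> table_;  // handle -> current cell index
};
```

If you allocate two objects, release the first, and call compact(), the second handle still dereferences to the same value, but address_of reports a different address than it did before the compaction. An address saved before the move is exactly the stale native pointer described above.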

Pointers by nature aren't safe, because they allow you to directly manipulate memory. For that reason, using pointers affects the type-safety and verifiability of your code. I strongly urge you to refrain from using CLI pointers in pure-managed applications (those compiled with /clr:safe or /clr:pure) and to use them strictly to make interop calls more convenient.

4.1.1 Interior pointers

An interior pointer is a pointer to a managed object or a member of a managed object that is updated automatically to accommodate garbage-collection cycles that may result in the pointed-to object being relocated on the CLR heap. You may wonder how that's different from a managed handle or a tracking reference; the difference is that the interior pointer exhibits pointer semantics, and you can perform pointer operations such as pointer arithmetic on it. Although this isn't an exact analogy, think of it like a cell phone. People can call you on your cell phone (which is analogous to an interior pointer) wherever you are, because your number goes with you: the mobile network is constantly updated so that your location is always known. They wouldn't be able to do that with a landline (which is analogous to a native pointer), because a landline's physical location is fixed.

Interior pointer declarations use the same template-like syntax that is used for CLI arrays, as shown here:

interior_ptr< type > var = [address];

Listing 4.1 shows how an interior pointer gets updated when the object it points to is relocated.

ref struct CData
{
    int age;
};

int main()
{
    for(int i=0; i<100000; i++) // ((1))
        gcnew CData();
        
    CData^ d = gcnew CData();
    d->age = 100;
    
    interior_ptr<int> pint = &d->age; // ((2))
    
    printf("%p %d\r\n",pint,*pint);
    
    for(int i=0; i<100000; i++) // ((3))
        gcnew CData();
        
    printf("%p %d\r\n",pint,*pint); // ((4))
    return 0;
}

Listing 4.1 Code that shows how an interior pointer is updated by the CLR

In the sample code, you create 100,000 orphan CData objects ((1)) so that you can fill up a good portion of the CLR heap. You then create a CData object that's stored in a variable and ((2)) an interior pointer to the int member age of this CData object. You then print out the pointer address as well as the int value that is pointed to. Now, ((3)) you create another 100,000 orphan CData objects; somewhere along the line, a garbage-collection cycle occurs (the orphan objects created earlier ((1)) get collected because they aren't referenced anywhere). Note that you don't use a GC::Collect call because that's not guaranteed to force a garbage-collection cycle. As you've already seen in the discussion of the garbage-collection algorithm in the previous chapter, the GC frees up space by removing the orphan objects so that it can do further allocations. At the end of the code (by which time a garbage collection has occurred), you again ((4)) print out the pointer address and the value of age. This is the output I got on my machine (note that the addresses will vary from machine to machine, so your output values won't be the same):

012CB4C8 100
012A13D0 100

As you can see, the address pointed to by the interior pointer has changed. Had this been a native pointer, it would have continued to point to the old address, which may now belong to some other data variable or may contain random data. Thus, using a native pointer to point to a managed object is a disastrous thing to attempt. The compiler won't let you do that: You can't assign the address of a CLI object to a native pointer, and you also can't convert from an interior pointer to a native pointer.

Passing by reference

Assume that you need to write a function that accepts an integer (by reference) and changes that integer using some predefined rule. Here's what such a function looks like when you use an interior pointer as the pass-by-reference argument:

void ChangeNumber(interior_ptr<int> num, int constant)
{
    *num += constant * *num;
}

And here's how you call the function:

CData^ d = gcnew CData();
d->age = 7;
interior_ptr<int> pint = &d->age;
ChangeNumber(pint, 3);
Console::WriteLine(d->age); // outputs 28

Because you pass an interior pointer, the original variable (the age member of the CData object) gets changed. Of course, for this specific scenario, you may as well have used a tracking reference as the first argument of the ChangeNumber function; but one advantage of using an interior pointer is that you can also pass a native pointer to the function, because a native pointer implicitly converts to an interior pointer (although the reverse isn't allowed). The following code works:

int number = 8;
ChangeNumber(&number, 3); // ((1)) Pass native pointer to function
Console::WriteLine(number); // outputs 32

It's imperative that you remember this. You can pass a native pointer to a function that expects an interior pointer, as you do here ((1)), because there is an implicit conversion from the native pointer to the interior pointer. But you can't pass an interior pointer where a native pointer is expected; if you try that, you'll get a compiler error. Because native pointers convert to interior pointers, you should be aware that an interior pointer need not always point to the CLR heap: If it contains a converted native pointer, it's pointing to native memory (the native C++ heap or, as with number here, the stack). Next, you'll see how interior pointers can be used in pointer arithmetic (something that can't be done with a tracking reference).

Pointer arithmetic

Interior pointers (like native pointers) support pointer arithmetic; thus, you may want to optimize a performance-sensitive piece of code by using direct pointer arithmetic on some data. Here's an example of a function that uses pointer arithmetic on an interior pointer to quickly sum the contents of an array of ints:

int SumArray(array<int>^% intarr)
{
    int sum = 0;
    interior_ptr<int> p = &intarr[0]; // ((1)) Get interior pointer to array

    while(p != &intarr[0]+ intarr->Length) // ((2)) Iterate through array 
        sum += *p++;                       

    return sum;
}

In this code, p is an interior pointer to the array ((1)) (the address of the first element of the array is also the address of the array). You don't need to worry about the GC relocating the array in the CLR heap. You iterate through the array by using the ++ operator on the interior pointer ((2)), and you add each element to the variable sum as you do so. This way, you avoid the overhead of going through the System::Array interface to access each array element.

It's not just arrays that can be manipulated using an interior pointer. Here's another example of using an interior pointer to manipulate the contents of a System::String object:

String^ str = "Nish wrote this book for Manning Publishing";
interior_ptr<Char> ptxt = const_cast< interior_ptr<Char> >(
    PtrToStringChars(str)); // ((1))
interior_ptr<Char> ptxtorig = ptxt; // ((2))
while(*ptxt) // ((3)) Stop at the terminator without modifying it
    (*ptxt++)++;
Console::WriteLine(str); // ((4))
while(*ptxtorig) // ((5)) Same test, so the loop never runs past the buffer
    (*ptxtorig++)--;
Console::WriteLine(str); // ((6))

You use the PtrToStringChars helper function ((1)) to get an interior pointer to the underlying string buffer of a System::String object. The PtrToStringChars function is a helper function declared in <vcclr.h> that returns a const interior pointer to the first character of a System::String. Because it returns a const interior pointer, you have to use const_cast to convert it to a non-const pointer. You go through the string using a while-loop ((3)) that increments the pointer as well as each character until the terminating null character is encountered, because the underlying buffer of a String object is always null-terminated. Next, when you use Console::WriteLine on the String object ((4)), you can see that the string has changed to:

Ojti!xspuf!uijt!cppl!gps!Nboojoh!Qvcmjtijoh

You've achieved encryption! (Just kidding.) Because you saved the original pointer in ptxtorig ((2)), you can use it to convert the string back to its original form using another while loop. The second while loop ((5)) increments the pointer but decrements each character until it reaches the end of the string (determined by the terminating null character). Now, ((6)) when you do a Console::WriteLine, you get the original string:

Nish wrote this book for Manning Publishing

A dangerous side-effect of using interior pointers to manipulate String objects

The CLR performs something called string interning on managed strings, so that multiple variables or literal occurrences of the same textual string always refer to a single instance of the System::String object. This is possible because System::String is immutable: the moment you change one of those variables, you change the reference, which now refers to a new String object (quite possibly another interned string). All this is fine as long as the strings are immutable. But when you use an interior or pinning pointer to directly access and change the underlying character array, you break the immutability of String objects. Here's some code that demonstrates what can go wrong:

String^ s1 = "Nishant Sivakumar";
String^ s2 = "Nishant Sivakumar";

interior_ptr<Char> p1 = const_cast<interior_ptr<Char> >(
    PtrToStringChars(s1)); // Get a pointer to s1
while(*p1) // Change s1 through pointer p1
    (*p1++) = 'X';

Console::WriteLine("s1 = {0}\r\ns2 = {1}",s1,s2);

The output of that is as follows:

s1 = XXXXXXXXXXXXXXXXX
s2 = XXXXXXXXXXXXXXXXX

You only changed one string, but both strings are changed. If you don't understand what's happening, this can be incredibly puzzling. You have two String handle variables, s1 and s2, both containing the same string literal. You get an interior pointer p1 to the string s1 and change each character in s1 to X (basically blanking out the string with the character X). Common logic would say that you have changed the string s1, and that's that. But because of string interning, s1 and s2 were both handles to the same String object on the CLR heap. When you change the underlying buffer of the string s1 through the interior pointer, you change the interned string. This means any string handle to that String object now points to an entirely different string (the X-string in this case). The output of the Console::WriteLine should now make sense to you.

In this case, figuring out the problem was easy, because both string handles were in the same block of code, but the CLR performs string interning across application domains. This means changing an interned string can result in extremely hard-to-debug errors in totally disconnected parts of your application. My recommendation is to try to avoid directly changing a string through a pointer, except when you're sure you won't cause havoc in other parts of the code. Note that it's safe to read a string through a pointer; it's only dangerous when you change it, because you break the "strings are immutable" rule of the CLR. Alternatively, you can use the String::IsInterned function to determine if a specific string is interned, and change it only if it isn't an interned string.
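
To make the aliasing behind this surprise concrete, here is a toy model in native C++. It is only a sketch under stated assumptions: ToyInternPool, StringHandle, intern, and blank_out are hypothetical names, and the CLR's real intern table works differently. The point it illustrates is simply that when equal strings share one instance, an in-place write through any one handle is seen through all of them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a String^ handle.
using StringHandle = std::shared_ptr<std::string>;

// Toy intern pool: equal text always yields the same shared instance.
class ToyInternPool {
public:
    StringHandle intern(const std::string& text) {
        auto it = pool_.find(text);
        if (it != pool_.end()) return it->second;  // reuse existing instance
        StringHandle fresh = std::make_shared<std::string>(text);
        pool_.emplace(text, fresh);
        return fresh;
    }

private:
    std::unordered_map<std::string, StringHandle> pool_;
};

// Overwrite the characters in place, bypassing the "strings are immutable"
// contract, as the interior-pointer loop in the text does.
void blank_out(const StringHandle& s) {
    assert(s != nullptr);
    for (char& c : *s) c = 'X';
}
```

Interning the same text twice returns two handles to one object, so calling blank_out on the first handle also changes what the second handle reads. Note that the toy pool's lookup key still holds the old text afterwards, a small-scale version of the inconsistent state that in-place mutation leaves behind.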

Whenever you use an interior pointer, it's represented as a managed pointer in the generated MSIL. To distinguish it from a reference (which is also represented as a managed pointer in IL), a modopt of type IsExplicitlyDereferenced is emitted by the compiler. A modopt is an optional modifier that can be applied to a type's signature. Another interesting point in connection with interior pointers is that the this pointer of an instance of a value type is a non-const interior pointer to the type. Look at the value class shown here, which obtains an interior pointer to the current instance by assigning the this pointer to it:

value class V
{    
    void Func()
    {
        interior_ptr<V> pV1 = this;
        //V* pV2 = this; <-- this won't compile
    }
};

As is obvious, in a value class, if you need to get a pointer to this, you should use an interior pointer, because the compiler won't allow you to use a native pointer. If you specifically need a native pointer to a value object that's on the managed heap, you have to pin the object using a pinning pointer and then assign it to the native pointer. We haven't discussed pinning pointers yet, but that's what we'll talk about in the next section.

4.1.2 Pinning pointers

As we discussed in the previous section, the GC moves CLI objects around the CLR heap during garbage-collection cycles and during heap-compaction operations. Native pointers don't work with CLI objects, for reasons previously mentioned. This is why we have interior pointers, which are self-adjusting pointers that update themselves to always refer to the same object, irrespective of where the object is located in the CLR heap. Although this is convenient when you need pointer access to CLI objects, it only works from managed code. If you need to pass a pointer to a CLI object to a native function (which runs outside the CLR), you can't pass an interior pointer, because the native function doesn't know what an interior pointer is, and an interior pointer can't convert to a native pointer. That's where pinning pointers come into play.

A pinning pointer pins a CLI object on the CLR heap; as long as the pinning pointer is alive (meaning it hasn't gone out of scope), the object remains pinned. The GC knows about pinned objects and won't relocate them. To continue the phone analogy, imagine a pinning pointer as being similar to your being forced to remain stationary (analogous to being pinned). Although you have a cell phone, your location is fixed; it's almost as if you had a fixed landline.

Because pinned objects don't move around, it's legal to convert a pinning pointer to a native pointer that can be passed to native code that's running outside the control of the CLR. The word pinning or pinned is a good choice; try to visualize an object that's pinned to a memory address, just like you pin a sticky note to your cubicle's side-board.

The syntax used for a pinning pointer is similar to that used for an interior pointer:

pin_ptr< type > var = [address];

The duration of pinning is the lifetime of the pinning pointer. As long as the pinning pointer is in scope and pointing to an object, that object remains pinned. If the pinning pointer is set to nullptr, then the object isn't pinned any longer; or if the pinning pointer is set to another object, the new object becomes pinned and the previous object isn't pinned any more.
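
The scope-bound behaviour just described resembles the RAII idiom in native C++. The following sketch models it with hypothetical types (ToyObject, ToyPin, and reset are invented names, not part of C++/CLI): the object counts how many pins refer to it, and the guard adjusts that count when it is constructed, re-targeted, or destroyed.

```cpp
#include <cassert>

// Hypothetical stand-in for a managed object; the counter models
// "how many pinning pointers currently refer to me".
struct ToyObject {
    int pin_count = 0;
    bool is_pinned() const { return pin_count > 0; }
};

// Hypothetical RAII guard modelling pin_ptr's scope-bound behaviour.
class ToyPin {
public:
    explicit ToyPin(ToyObject* target = nullptr) { reset(target); }
    ~ToyPin() { reset(nullptr); }  // leaving scope ends the pinning

    ToyPin(const ToyPin&) = delete;
    ToyPin& operator=(const ToyPin&) = delete;

    // Re-target the pin: the old object is unpinned, the new one pinned.
    // Passing nullptr only unpins, like assigning nullptr to a pin_ptr.
    void reset(ToyObject* target) {
        if (target_) {
            assert(target_->pin_count > 0);
            --target_->pin_count;
        }
        target_ = target;
        if (target_) ++target_->pin_count;
    }

private:
    ToyObject* target_ = nullptr;
};
```

Declaring a ToyPin inside the narrowest block that needs it keeps the modelled pin as short-lived as possible, which mirrors the advice later in this section about limiting the lifetime of a pinning pointer.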

Listing 4.2 demonstrates the difference between interior and pinning pointers. To simulate a real-world scenario within a short code snippet, I used for loops to create a large number of objects to bring the GC into play.

for(int i=0; i<100000; i++)
    gcnew CData(); // Fill portion of CLR Heap

CData^ d1 = gcnew CData(); // ((1))
for(int i=0; i<1000; i++)
    gcnew CData();
CData^ d2 = gcnew CData();

interior_ptr<int> intptr = &d1->age; // ((2))
pin_ptr<int> pinptr = &d2->age; // ((3))

printf("intptr=%p pinptr=%p\r\n", // Display pointer addresses before GC
    intptr, pinptr);

for(int i=0; i<100000; i++) // ((4))
    gcnew CData();

printf("intptr=%p pinptr=%p\r\n",
    intptr, pinptr); // Display pointer addresses after GC

Listing 4.2 Code that compares an interior pointer with a pinning pointer

In the code, you create two CData objects with a gap in between them ((1)). You associate an interior pointer with the age member of the first object ((2)) and a pinning pointer with the age member of the second object ((3)). By creating a large number of orphan objects, you force a garbage-collection cycle ((4)) (again, note that calling GC::Collect may not always force a garbage-collection cycle; you need to fill up a generation before a garbage-collection cycle will occur). The output I got was

intptr=012CB4C8 pinptr=012CE3B4
intptr=012A13D0 pinptr=012CE3B4

Your pointer addresses will be different, but after the garbage-collection cycle, you'll find that the address held by the pinned pointer (pinptr) has not changed, although the interior pointer (intptr) has changed. This is because the CLR and the GC see that the object is pinned and leave it alone (meaning it doesn't get relocated on the CLR heap). This is why you can pass a pinned pointer to native code (because you know that it won't be moved around).

Passing to native code

The fact that a pinning pointer always points to the same object (because the object is in a pinned state) allows the compiler to provide an implicit conversion from a pinning pointer to a native pointer. Thus, you can pass a pinning pointer to any native function that expects a native pointer, provided the pointers are of the same type. Obviously, you can't pass a pinning pointer to a float to a function expecting a native pointer to a char. Look at the following native function that accepts a wchar_t* and returns the number of vowels in the string pointed to by the wchar_t*:

#include <wchar.h> // declares wcschr

#pragma unmanaged
int NativeCountVowels(wchar_t* pString)
{
    int count = 0;
    const wchar_t* vowarr = L"aeiouAEIOU";
    while(*pString)
        if(wcschr(vowarr,*pString++))
            count++;
    return count;
}
#pragma managed

#pragma managed/unmanaged

These are #pragma compiler directives that give you function-level control for compiling functions as managed or unmanaged. If you specify that a function is to be compiled as unmanaged, native code is generated, and the code is executed outside the CLR. If you specify a function as managed (which is the default), MSIL is generated, and the code executes within the CLR. Note that if you've marked a function as unmanaged, you should remember to re-enable managed compilation at the end of the function.

Here's how you pass a pointer to a CLI object, after first pinning it, to the native function just defined:

String^ s = "Most people don't know that the CLR is written in C++";
pin_ptr<Char> p = const_cast< interior_ptr<Char> >(
    PtrToStringChars(s));
Console::WriteLine(NativeCountVowels(p));

PtrToStringChars returns a const interior pointer, which you cast to a non-const interior pointer; this is implicitly converted to a pinning pointer. You pass this pinning pointer, which implicitly converts to a native pointer, to the NativeCountVowels function. The ability to pass a pinning pointer to a function that expects a native pointer is extremely handy in mixed-mode programming, because it gives you an easy mechanism to pass pointers to objects on the CLR heap to native functions. Figure 4.2 illustrates the various pointer conversions that are available.
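
The const_cast in the call above is needed only because NativeCountVowels takes a non-const wchar_t*. If you control the native function and it only reads the string, a const-correct signature avoids the cast. The sketch below is a hypothetical variant (CountVowelsConst is an invented name); the managed call site is shown only as a comment and is an untested assumption about how you might use it.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical const-correct variant of NativeCountVowels: it promises
// not to modify the characters it is given.
int CountVowelsConst(const wchar_t* text)
{
    assert(text != nullptr);
    int count = 0;
    const wchar_t* vowels = L"aeiouAEIOU";
    for (; *text; ++text)
        if (std::wcschr(vowels, *text))
            ++count;
    return count;
}

// In a C++/CLI translation unit that includes <vcclr.h>, the call site
// could then pin the const pointer directly (sketch, not compiled here):
//     pin_ptr<const wchar_t> p = PtrToStringChars(s);
//     Console::WriteLine(CountVowelsConst(p));
```

Keeping the pointer const also guards against accidentally writing to an interned string through the pinned pointer.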

As you can see in the figure, the only pointer conversion that is illegal is that from an interior pointer to a native pointer; every other conversion is allowed and implicitly done. You have seen how pinning pointers make it convenient for you to pass pointers to CLI objects to unmanaged code. I now have to warn you that pinning pointers should be used only when they're necessary, because indiscriminate use of pinning pointers results in what is called the heap fragmentation problem.

The heap fragmentation problem

Objects are always allocated sequentially in the CLR heap. Whenever a garbage collection occurs, orphan objects are removed, and the heap is compacted so it won't remain in a fragmented condition. (We covered this in the previous chapter when we discussed the multigenerational garbage-collection algorithm used by the CLR.) Let's assume that memory is allocated from a simple heap that looks like figures 4.3 through 4.6. Of course, this is a simplistic representation of the CLR's GC-based memory model, which involves a more complex algorithm. But the basic principle behind the heap fragmentation issue remains the same, and thus this simpler model will suffice for the present discussion. Figure 4.3 depicts the status of the heap before a garbage-collection cycle occurs.

There are presently three objects in the heap. Assume that Obj2 (with the gray shaded background) is an orphan object, which means it will be cleaned up during the next garbage-collection cycle. Figure 4.4 shows what the heap looks like after the garbage-collection cycle.

The orphan object has been removed and a heap compaction has been performed, so Obj1 and Obj3 are now next to each other. The idea is to maximize the free space available in the heap and to put that free space in a single contiguous block of memory. Figure 4.5 shows what the heap would look like if there was a pinned object during the garbage-collection cycle.

Assume that Obj3 is a pinned object (the circle represents the pinning). Because the GC won't move pinned objects, Obj3 remains where it was. This results in fragmentation because the space between Obj1 and Obj3 cannot be added to the large contiguous free block of memory. In this particular case, it's just a small gap that would have contained only a single object, and thus isn't a major issue. Now, assume that several pinned objects exist on the CLR heap when the garbage-collection cycle occurs. Figure 4.6 shows what happens in such a situation.

None of those pinned objects can be relocated. This means the compaction process can't be effectively implemented. When there are several such pinned objects, the heap is severely fragmented, resulting in slower and less efficient memory allocation for new objects. This is the case because the GC has to try that much harder to find a block that's large enough to fit the requested object. Sometimes, although the total free space is bigger than the requested memory, the fact that there is no single contiguous block of memory large enough to hold that object results in an unnecessary garbage-collection cycle or an out-of-memory exception. Obviously, this isn't an efficient scenario, and it's why you have to be extremely cautious when you use pinning pointers.
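
The simple heap model used in figures 4.3 through 4.5 can be sketched in native C++ to put numbers on the effect. Everything here is a hypothetical illustration (Block, compact, and largest_free_run are invented names, and sizes are in arbitrary units); it is not how the CLR's generational collector is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One object in the toy heap. Sizes and offsets are in arbitrary units.
struct Block {
    std::size_t offset;
    std::size_t size;
    bool pinned;
    bool live;   // false = orphan, reclaimed by the next compaction
};

// Simulate one collection. 'heap' must be sorted by offset with no overlap.
// Orphans are dropped, unpinned blocks slide down, pinned blocks stay put.
std::vector<Block> compact(const std::vector<Block>& heap)
{
    std::vector<Block> result;
    std::size_t cursor = 0;              // lowest address not yet occupied
    for (const Block& b : heap) {
        if (!b.live) continue;
        Block placed = b;
        if (!b.pinned) placed.offset = cursor;
        cursor = placed.offset + placed.size;
        result.push_back(placed);
    }
    return result;
}

// Largest contiguous free region, treating every block in 'heap' as
// occupied, so call it on the result of compact().
std::size_t largest_free_run(const std::vector<Block>& heap,
                             std::size_t heap_size)
{
    std::size_t best = 0;
    std::size_t cursor = 0;
    for (const Block& b : heap) {
        assert(b.offset >= cursor);      // sorted, non-overlapping
        best = std::max(best, b.offset - cursor);
        cursor = b.offset + b.size;
    }
    assert(cursor <= heap_size);
    return std::max(best, heap_size - cursor);
}
```

Take a 100-unit heap holding three 10-unit objects at offsets 0, 10, and 20, with the middle one an orphan. With nothing pinned, compaction leaves a single 80-unit free run. With the third object pinned, the same 80 units of free space are split into a 10-unit gap and a 70-unit run, so a request for 75 units could not be satisfied from one contiguous block even though enough total space is free.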

Recommendations for using pinning pointers

Now that you've seen where pinning pointers can be handy and where they can be a little dodgy, I'm going to give you some general tips on effectively using pinning pointers.

Unless you absolutely have to, don't use a pinning pointer! Whenever you think you need to use a pinning pointer, see if an interior pointer or a tracking reference may be a better option. If an interior pointer is acceptable as an alternative, chances are good that this is an improper place for using a pinning pointer.
If you need to pin multiple objects, try to allocate those objects together so that they're in an adjacent area in the CLR heap. That way, when you pin them, those pinned objects will be in a contiguous area of the heap. This reduces fragmentation compared to their being spread around the heap.
When making a call into native code, check to see if the CLR marshalling layer (or the target native code) does any pinning for you. If it does, you don't need to pin your object before passing it, because you'd be writing unnecessary (though harmless) code by adding an extra pinning pointer to the pinned object (which doesn't do anything to the pinned state of the object).
Newly allocated objects are put into Generation-0 of the CLR heap. You know that garbage-collection cycles happen most frequently in the Generation-0 heap. Consequently, you should try to avoid pinning recently allocated objects; chances are that a garbage-collection cycle will occur while the object is still pinned.
Reduce the lifetime of a pinning pointer. The longer it stays in scope, the longer the object it points to remains pinned and the greater the chances of heap fragmentation. For instance, if you need a pinning pointer inside an if block, declare it inside the if block so the pinning ends when the if block exits.
Whenever you convert a pinning pointer to a native pointer, you have to ensure that the native pointer is used only while the pinning pointer is still alive. If the pinning pointer goes out of scope, the object becomes unpinned. Now it can be moved around by the GC. Once that happens, the native pointer is pointing to some random location on the CLR heap. I've heard the term GC hole used to refer to such a scenario, and it can be a tough debugging problem. Although it may sound like an unlikely contingency, think of what may happen if a native function that accepts a native pointer stores this pointer for later use. The caller code may have passed a pinning pointer to this function. Once the function has returned, the pinning will quickly stop, because the original pinning pointer won't be alive much longer. However, the saved pointer may be used later by some other function in the native code, which may result in some disastrous conditions (because the location the pointer points to may contain some other object now or even be free space). The best you can do is to know what the native code is going to do with a pointer before you pass a pinning pointer to it. That way, if you see that there is the risk of a GC hole, you avoid calling that function and try to find an alternate solution.
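
As one possible alternate solution for the GC-hole scenario in the last tip, native code that needs the data after the call returns can copy it while the caller's pin is still in effect, instead of keeping the pointer. This is a suggestion of mine rather than something the guidelines above prescribe, and NativeLog, Remember, and Saved are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical native component that needs the text after the call returns.
class NativeLog {
public:
    // Storing 'text' itself would be the risky pattern: once the caller's
    // pinning pointer dies, the address may no longer hold the string.
    // Copying during the call avoids depending on the pin afterwards.
    void Remember(const wchar_t* text)
    {
        assert(text != nullptr);
        saved_ = text;   // std::wstring makes its own copy of the characters
    }

    const std::wstring& Saved() const { return saved_; }

private:
    std::wstring saved_;
};
```

The copy costs an allocation, but the saved data then lives in native memory that the GC never moves, so later native calls can use it regardless of what has happened to the original managed object.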

Note that these are general guidelines and not hard rules to be blindly followed at all times. It's good to have some basic strategies and to understand the exact consequences of what happens when you inappropriately use pinning pointers. Eventually, you have to evaluate your coding scenario and use your judgment to decide on the best course.