C++/CLI in Action - Using interior and pinning pointers






Excerpt from Chapter 4 on interior and pinning pointers
This is a chapter excerpt from C++/CLI in Action authored by Nishant Sivakumar and published by Manning Publications. The content has been reformatted for CodeProject and may differ in layout from the printed book and the e-book.
4.1 Using interior and pinning pointers
You can't use native pointers with CLI objects on the managed heap. That is like trying to write Hindi text using the English alphabet—they're two different languages with entirely different alphabets. Native pointers are essentially variables that hold memory address locations. They point to a memory location rather than to a specific object. When we say a pointer points to an object, we essentially mean that a specific object is at that particular memory location.
This approach won't work with CLI objects because managed objects in the CLR heap don't remain at the same location for the entire period of their lifetime. Figure 4.1 shows a diagrammatic view of this problem. The Garbage Collector (GC) moves objects around during garbage-collection and heap-compaction cycles. A native pointer that points to a CLI object becomes garbage once the object has been relocated. By then, it's pointing to random memory. If an attempt is made to write to that memory, and that memory is now used by some other object, you end up corrupting the heap and possibly crashing your application.
C++/CLI provides two kinds of pointers that work around this problem. The first kind is called an interior pointer, which is updated by the runtime to reflect the new location of the object that's pointed to every time the object is relocated. The physical address held by the interior pointer may change over time, but it always points to the same object. The other kind is called a pinning pointer, which prevents the GC from relocating the object; in other words, it pins the object to a specific physical location in the CLR heap. With some restrictions, conversions are possible between interior, pinning, and native pointers.
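To make the relocation problem concrete, here is a minimal sketch in plain native C++ of a toy compacting heap. It is not how the CLR works internally; ToyHeap, Allocate, Compact, AddressOf, and ValueOf are hypothetical names invented for this illustration, and a cell's position in a vector stands in for a memory address. The point is only that something has to record the new position when a live object moves, which is the bookkeeping the runtime does for an interior pointer and which a saved native address never receives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not the CLR: a cell's position in `cells` stands in for its address.
struct ToyHeap {
    struct Cell {
        int value;
        bool live;           // false = orphan, reclaimed by Compact()
        std::size_t handle;  // which handle owns this cell
    };

    std::vector<Cell> cells;
    std::vector<std::size_t> location;  // handle -> current position

    // Places a new cell at the end of the heap and returns its handle.
    std::size_t Allocate(int value, bool live) {
        const std::size_t handle = location.size();
        location.push_back(cells.size());
        cells.push_back(Cell{value, live, handle});
        return handle;
    }

    // Drops orphan cells, slides live cells toward the start, and records
    // each live cell's new position (the update a saved raw address never gets).
    void Compact() {
        std::size_t next = 0;
        for (std::size_t i = 0; i < cells.size(); ++i) {
            if (!cells[i].live) continue;
            cells[next] = cells[i];
            location[cells[next].handle] = next;
            ++next;
        }
        cells.resize(next);
    }

    // Current position of the cell owned by a live handle.
    std::size_t AddressOf(std::size_t handle) const {
        assert(handle < location.size());
        return location[handle];
    }

    // Value stored in the cell owned by a live handle.
    int ValueOf(std::size_t handle) const {
        return cells[AddressOf(handle)].value;
    }
};
```

If you allocate a few orphan cells followed by one live cell, save the result of AddressOf as a plain number, and then call Compact, the saved number no longer matches AddressOf for the same handle, while ValueOf still returns the original value.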
Pointers by nature aren't safe, because they allow you to directly manipulate memory. For that reason, using pointers affects the type-safety and verifiability of your code. I strongly urge you to refrain from using CLI pointers in pure-managed applications (those compiled with /clr:safe or /clr:pure) and to use them strictly to make interop calls more convenient.
4.1.1 Interior pointers
An interior pointer is a pointer to a managed object or a member of a managed object that is updated automatically to account for garbage-collection cycles that may relocate the pointed-to object on the CLR heap. You may wonder how that's different from a managed handle or a tracking reference; the difference is that the interior pointer exhibits pointer semantics, and you can perform pointer operations such as pointer arithmetic on it. Although this isn't an exact analogy, think of it like a cell phone. People can call you on your cell phone (which is analogous to an interior pointer) wherever you are, because your number goes with you; the mobile network is constantly updated so that your location is always known. They wouldn't be able to do that with a landline (which is analogous to a native pointer), because a landline's physical location is fixed.
Interior pointer declarations use the same template-like syntax that is used for CLI arrays, as shown here:
interior_ptr< type > var = [address];
Listing 4.1 shows how an interior pointer gets updated when the object it points to is relocated.
#include <stdio.h>

ref struct CData
{
    int age;
};

int main()
{
    for(int i=0; i<100000; i++) // ((1))
        gcnew CData();
    CData^ d = gcnew CData();
    d->age = 100;
    interior_ptr<int> pint = &d->age; // ((2))
    printf("%p %d\r\n",pint,*pint);
    for(int i=0; i<100000; i++) // ((3))
        gcnew CData();
    printf("%p %d\r\n",pint,*pint); // ((4))
    return 0;
}
Listing 4.1 Code that shows how an interior pointer is updated by the CLR
In the sample code, you create 100,000 orphan CData objects ((1)) so that you can fill up a good portion of the CLR heap. You then create a CData object that's stored in a variable and ((2)) an interior pointer to the int member age of this CData object. You then print out the pointer address as well as the int value that is pointed to. Now, ((3)) you create another 100,000 orphan CData objects; somewhere along the line, a garbage-collection cycle occurs (the orphan objects created earlier ((1)) get collected because they aren't referenced anywhere). Note that you don't use a GC::Collect call because that's not guaranteed to force a garbage-collection cycle. As you've already seen in the discussion of the garbage-collection algorithm in the previous chapter, the GC frees up space by removing the orphan objects so that it can do further allocations. At the end of the code (by which time a garbage collection has occurred), you again ((4)) print out the pointer address and the value of age. This is the output I got on my machine (note that the addresses will vary from machine to machine, so your output values won't be the same):
012CB4C8 100
012A13D0 100
As you can see, the address pointed to by the interior pointer has changed. Had this been a native pointer, it would have continued to point to the old address, which may now belong to some other data variable or may contain random data. Thus, using a native pointer to point to a managed object is a disastrous thing to attempt. The compiler won't let you do that: You can't assign the address of a CLI object to a native pointer, and you also can't convert from an interior pointer to a native pointer.
Passing by reference
Assume that you need to write a function that accepts an integer (by reference) and changes that integer using some predefined rule. Here's what such a function looks like when you use an interior pointer as the pass-by-reference argument:
void ChangeNumber(interior_ptr<int> num, int constant)
{
    *num += constant * *num;
}
And here's how you call the function:
CData^ d = gcnew CData();
d->age = 7;
interior_ptr<int> pint = &d->age;
ChangeNumber(pint, 3);
Console::WriteLine(d->age); // outputs 28
Because you pass an interior pointer, the original variable (the age member of the CData object) gets changed. Of course, for this specific scenario, you may as well have used a tracking reference as the first argument of the ChangeNumber function; but one advantage of using an interior pointer is that you can also pass a native pointer to the function, because a native pointer implicitly converts to an interior pointer (although the reverse isn't allowed). The following code works:
int number = 8;
ChangeNumber(&number, 3); // ((1)) Pass native pointer to function
Console::WriteLine(number); // outputs 32
It's imperative that you remember this. You can pass a native pointer to a function that expects an interior pointer, as you do here ((1)), because there is an implicit conversion from the native pointer to the interior pointer. But you can't pass an interior pointer where a native pointer is expected; if you try that, you'll get a compiler error. Because native pointers convert to interior pointers, you should be aware that an interior pointer need not necessarily always point to the CLR heap: If it contains a converted native pointer, it's then pointing to native memory (the native C++ heap or, as in this example, the stack). Next, you'll see how interior pointers can be used in pointer arithmetic (something that can't be done with a tracking reference).
Pointer arithmetic
Interior pointers (like native pointers) support pointer arithmetic; thus, you may want to optimize a performance-sensitive piece of code by using direct pointer arithmetic on some data. Here's an example of a function that uses pointer arithmetic on an interior pointer to quickly sum the contents of an array of ints:
int SumArray(array<int>^% intarr)
{
    int sum = 0;
    interior_ptr<int> p = &intarr[0]; // ((1)) Get interior pointer to array
    while(p != &intarr[0]+ intarr->Length) // ((2)) Iterate through array
        sum += *p++;
    return sum;
}
In this code, p is an interior pointer to the array ((1)) (the address of the first element of the array is also the address of the array). You don't need to worry about the GC relocating the array in the CLR heap. You iterate through the array by using the ++ operator on the interior pointer ((2)), and you add each element to the variable sum as you do so. This way, you avoid the overhead of going through the System::Array interface to access each array element.
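For comparison, here is a sketch of the native-pointer loop that SumArray mirrors, written in plain C++ with no CLR features. SumRange is a hypothetical name used only for this illustration. It computes the one-past-the-end pointer once, before the loop, rather than re-evaluating it on every pass; whether that makes a measurable difference in the interior-pointer version is something you'd have to profile.

```cpp
#include <cassert>
#include <cstddef>

// Sums `length` ints starting at `first` by walking a native pointer.
// Assumes `first` points to at least `length` valid elements.
int SumRange(const int* first, std::size_t length) {
    assert(first != nullptr || length == 0);
    int sum = 0;
    const int* const end = first + length;  // one past the last element
    for (const int* p = first; p != end; ++p)
        sum += *p;
    return sum;
}
```

The loop has the same shape as the interior-pointer version: dereference, accumulate, advance. The difference is that nothing keeps a native pointer valid if the data moves, which is why this form is safe only for native memory or for pinned managed memory.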
It's not just arrays that can be manipulated using an interior pointer. Here's another example of using an interior pointer to manipulate the contents of a System::String object:
String^ str = "Nish wrote this book for Manning Publishing";
interior_ptr<Char> ptxt = const_cast< interior_ptr<Char> >(
    PtrToStringChars(str)); // ((1))
interior_ptr<Char> ptxtorig = ptxt; // ((2))
while(*ptxt) (*ptxt++)++; // ((3)) Stop at the terminator without modifying it
Console::WriteLine(str); // ((4))
while(*ptxtorig) (*ptxtorig++)--; // ((5))
Console::WriteLine(str); // ((6))
You use the PtrToStringChars helper function ((1)) to get an interior pointer to the underlying string buffer of a System::String object. The PtrToStringChars function is a helper function declared in <vcclr.h> that returns a const interior pointer to the first character of a System::String. Because it returns a const interior pointer, you have to use const_cast to convert it to a non-const pointer. You go through the string using a while loop ((3)) that increments the pointer as well as each character until the null terminator is encountered, because the underlying buffer of a String object is always null-terminated. Next, when you use Console::WriteLine on the String object ((4)), you can see that the string has changed to:
Ojti!xspuf!uijt!cppl!gps!Nboojoh!Qvcmjtijoh
You've achieved encryption! (Just kidding.) Because you saved the original pointer in ptxtorig ((2)), you can use it to convert the string back to its original form using another while loop. The second while loop ((5)) increments the pointer but decrements each character until it reaches the end of the string (determined by the null terminator). Now, ((6)) when you do a Console::WriteLine, you get the original string:
Nish wrote this book for Manning Publishing
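The same shift-and-restore idea can be tried on a plain native buffer, where no CLR is involved. The sketch below uses hypothetical helper names (ShiftUp and ShiftDown) and assumes that no character in the buffer already holds the largest wchar_t value, which would overflow when incremented. Both loops test for the terminator before touching a character, so the terminator itself is never modified.

```cpp
#include <cassert>

// Adds 1 to every character before the terminator; the terminator is left alone.
// Assumes no character already holds the maximum wchar_t value.
void ShiftUp(wchar_t* text) {
    assert(text != nullptr);
    for (; *text != L'\0'; ++text)
        ++*text;
}

// Reverses ShiftUp by subtracting 1 from every character before the terminator.
void ShiftDown(wchar_t* text) {
    assert(text != nullptr);
    for (; *text != L'\0'; ++text)
        --*text;
}
```

Calling ShiftUp on a writable buffer holding L"Nish" turns it into L"Ojti", and calling ShiftDown on the result restores the original text.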
A dangerous side-effect of using interior pointers to manipulate
Whenever you use an interior pointer, it's represented as a managed pointer in the generated MSIL. To distinguish it from a reference (which is also represented as a managed pointer in IL), a modopt of type IsExplicitlyDereferenced is emitted by the compiler. A modopt is an optional modifier that can be applied to a type's signature. Another interesting point in connection with interior pointers is that the this pointer of an instance of a value type is a non-const interior pointer to the type. Look at the value class shown here, which obtains an interior pointer to the class by assigning the this pointer to it:
value class V
{
    void Func()
    {
        interior_ptr<V> pV1 = this;
        //V* pV2 = this; <-- this won't compile
    }
};
As is obvious, in a value class, if you need to get a pointer to this, you should use an interior pointer, because the compiler won't allow you to use a native pointer. If you specifically need a native pointer to a value object that's on the managed heap, you have to pin the object using a pinning pointer and then assign it to the native pointer. We haven't discussed pinning pointers yet, but that's what we'll talk about in the next section.
4.1.2 Pinning pointers
As we discussed in the previous section, the GC moves CLI objects around the CLR heap during garbage-collection cycles and during heap-compaction operations. Native pointers don't work with CLI objects, for reasons previously mentioned. This is why we have interior pointers, which are self-adjusting pointers that update themselves to always refer to the same object, irrespective of where the object is located in the CLR heap. Although this is convenient when you need pointer access to CLI objects, it only works from managed code. If you need to pass a pointer to a CLI object to a native function (which runs outside the CLR), you can't pass an interior pointer, because the native function doesn't know what an interior pointer is, and an interior pointer can't convert to a native pointer. That's where pinning pointers come into play.
A pinning pointer pins a CLI object on the CLR heap; as long as the pinning pointer is alive (meaning it hasn't gone out of scope), the object remains pinned. The GC knows about pinned objects and won't relocate pinned objects. To continue the phone analogy, imagine a pinned pointer as being similar to your being forced to remain stationary (analogous to being pinned). Although you have a cell phone, your location is fixed; it's almost as if you had a fixed landline.
Because pinned objects don't move around, it's legal to convert a pinned pointer to a native pointer that can be passed to a native function that's running outside the control of the CLR. The word pinning or pinned is a good choice; try to visualize an object that's pinned to a memory address, just like you pin a sticky note to your cubicle's side-board.
The syntax used for a pinning pointer is similar to that used for an interior pointer:
pin_ptr< type > var = [address];
The duration of pinning is the lifetime of the pinning pointer. As long as the pinning pointer is in scope and pointing to an object, that object remains pinned. If the pinning pointer is set to nullptr, then the object isn't pinned any longer; or if the pinning pointer is set to another object, the new object becomes pinned and the previous object isn't pinned any more.
Listing 4.2 demonstrates the difference between interior and pinning pointers. To simulate a real-world scenario within a short code snippet, I used for loops to create a large number of objects to bring the GC into play.
for(int i=0; i<100000; i++)
    gcnew CData(); // Fill portion of CLR Heap
CData^ d1 = gcnew CData(); // ((1))
for(int i=0; i<1000; i++)
    gcnew CData();
CData^ d2 = gcnew CData();
interior_ptr<int> intptr = &d1->age; // ((2))
pin_ptr<int> pinptr = &d2->age; // ((3))
printf("intptr=%p pinptr=%p\r\n", // Display pointer addresses before GC
    intptr, pinptr);
for(int i=0; i<100000; i++) // ((4))
    gcnew CData();
printf("intptr=%p pinptr=%p\r\n",
    intptr, pinptr); // Display pointer addresses after GC
Listing 4.2 Code that compares an interior pointer with a pinning pointer
In the code, you create two CData objects with a gap in between them ((1)). You then take an interior pointer to the age member of the first object ((2)) and a pinning pointer to the age member of the second object ((3)). By creating a large number of orphan objects, you force a garbage-collection cycle ((4)) (again, note that calling GC::Collect may not always force a garbage-collection cycle; you need to fill up a generation before a garbage-collection cycle will occur). The output I got was
intptr=012CB4C8 pinptr=012CE3B4
intptr=012A13D0 pinptr=012CE3B4
Your pointer addresses will be different, but after the garbage-collection cycle, you'll find that the address held by the pinned pointer (pinptr) has not changed, although the interior pointer (intptr) has changed. This is because the CLR and the GC see that the object is pinned and leave it alone (meaning it doesn't get relocated on the CLR heap). This is why you can pass a pinned pointer to native code (because you know that it won't be moved around).
Passing to native code
The fact that a pinning pointer always points to the same object (because the object is in a pinned state) allows the compiler to provide an implicit conversion from a pinning pointer to a native pointer. Thus, you can pass a pinning pointer to any native function that expects a native pointer, provided the pointers are of the same type. Obviously, you can't pass a pinning pointer to a float to a function expecting a native pointer to a char. Look at the following native function that accepts a wchar_t* and returns the number of vowels in the string pointed to by the wchar_t*:
#include <wchar.h> // for wcschr

#pragma unmanaged
int NativeCountVowels(wchar_t* pString)
{
    int count = 0;
    const wchar_t* vowarr = L"aeiouAEIOU";
    while(*pString)
        if(wcschr(vowarr,*pString++))
            count++;
    return count;
}
#pragma managed
Here's how you pass a pointer to a CLI object, after first pinning it, to the native function just defined:
String^ s = "Most people don't know that the CLR is written in C++";
pin_ptr<Char> p = const_cast< interior_ptr<Char> >(
    PtrToStringChars(s));
Console::WriteLine(NativeCountVowels(p));
PtrToStringChars returns a const interior pointer, which you cast to a non-const interior pointer; this is implicitly converted to a pinning pointer. You pass this pinning pointer, which implicitly converts to a native pointer, to the NativeCountVowels function.
The ability to pass a pinning pointer to a function that expects a native pointer is extremely handy in mixed-mode programming, because it gives you an easy mechanism to pass pointers to objects on the CLR heap to native functions.
Figure 4.2 illustrates the various pointer conversions that are available.
As you can see in the figure, the only pointer conversion that is illegal is that from an interior pointer to a native pointer; every other conversion is allowed and implicitly done. You have seen how pinning pointers make it convenient for you to pass pointers to CLI objects to unmanaged code. I now have to warn you that pinning pointers should be used only when they're necessary, because indiscriminate use of pinning pointers results in what is called the heap fragmentation problem.
The heap fragmentation problem
Objects are always allocated sequentially in the CLR heap. Whenever a garbage collection occurs, orphan objects are removed, and the heap is compacted so it won't remain in a fragmented condition. (We covered this in the previous chapter when we discussed the multigenerational garbage-collection algorithm used by the CLR.) Let's assume that memory is allocated from a simple heap that looks like figures 4.3 through 4.6. Of course, this is a simplistic representation of the CLR's GC-based memory model, which involves a more complex algorithm. But the basic principle behind the heap fragmentation issue remains the same, and thus this simpler model will suffice for the present discussion. Figure 4.3 depicts the status of the heap before a garbage-collection cycle occurs.
There are presently three objects in the heap. Assume that Obj2 (with the gray shaded background) is an orphan object, which means it will be cleaned up during the next garbage-collection cycle. Figure 4.4 shows what the heap looks like after the garbage-collection cycle.
The orphan object has been removed and a heap compaction has been performed, so Obj1 and Obj3 are now next to each other. The idea is to maximize the free space available in the heap and to put that free space in a single contiguous block of memory. Figure 4.5 shows what the heap would look like if there was a pinned object during the garbage-collection cycle.
Assume that Obj3 is a pinned object (the circle represents the pinning). Because the GC won't move pinned objects, Obj3 remains where it was. This results in fragmentation because the space between Obj1 and Obj3 (where the orphan Obj2 used to be) cannot be added to the large contiguous free block of memory. In this particular case, it's just a small gap that would have contained only a single object, and thus isn't a major issue. Now, assume that several pinned objects exist on the CLR heap when the garbage-collection cycle occurs. Figure 4.6 shows what happens in such a situation.
None of those pinned objects can be relocated. This means the compaction process can't be effectively implemented. When there are several such pinned objects, the heap is severely fragmented, resulting in slower and less efficient memory allocation for new objects. This is the case because the GC has to try that much harder to find a block that's large enough to fit the requested object. Sometimes, although the total free space is bigger than the requested memory, the fact that there is no single contiguous block of memory large enough to hold that object results in an unnecessary garbage-collection cycle or an out-of-memory exception. Obviously, this isn't an efficient scenario, and it's why you have to be extremely cautious when you use pinning pointers.
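The effect is easy to reproduce in a toy model. The following sketch, in plain native C++, treats the heap as a row of equal-sized slots; Slot, Compact, and LargestFreeRun are hypothetical names invented for this illustration, and the real GC is far more sophisticated than this. Orphans are reclaimed, live objects slide into the lowest available slots, and pinned objects stay where they are. Comparing the largest run of free slots with and without pinned objects shows the gap between total free space and usable contiguous space.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Slot { Free, Orphan, Live, Pinned };

// Toy compaction: orphans are reclaimed, live slots slide toward the start
// of the heap, and pinned slots keep their position.
std::vector<Slot> Compact(const std::vector<Slot>& heap) {
    std::vector<Slot> result(heap.size(), Slot::Free);
    std::size_t movable = 0;
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (heap[i] == Slot::Pinned)
            result[i] = Slot::Pinned;  // cannot be moved
        else if (heap[i] == Slot::Live)
            ++movable;                 // will be repacked below
    }
    // Repack live objects into the lowest slots not held by pinned objects.
    for (std::size_t i = 0; i < result.size() && movable > 0; ++i) {
        if (result[i] == Slot::Free) {
            result[i] = Slot::Live;
            --movable;
        }
    }
    assert(movable == 0);  // every live object found a slot
    return result;
}

// Length of the longest run of free slots: the largest request that still fits.
std::size_t LargestFreeRun(const std::vector<Slot>& heap) {
    std::size_t best = 0;
    std::size_t run = 0;
    for (Slot s : heap) {
        run = (s == Slot::Free) ? run + 1 : 0;
        if (run > best) best = run;
    }
    return best;
}
```

With an eight-slot heap holding one live object, three orphans, and three pinned objects at alternating positions, compaction leaves four free slots in total but no run longer than one slot. Replace the pinned objects with ordinary live ones and the same heap compacts to four contiguous free slots.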
Recommendations for using pinning pointers
Now that you've seen where pinning pointers can be handy and where they can be a little dodgy, I'm going to give you some general tips on effectively using pinning pointers.
Unless you absolutely have to, don't use a pinning pointer! Whenever you think you need to use a pinning pointer, see if an interior pointer or a tracking reference may be a better option. If an interior pointer is acceptable as an alternative, chances are good that this is an improper place for using a pinning pointer.
If you need to pin multiple objects, try to allocate those objects together so that they're in an adjacent area in the CLR heap. That way, when you pin them, those pinned objects will be in a contiguous area of the heap. This reduces fragmentation compared to their being spread around the heap.
When making a call into native code, check to see if the CLR marshalling layer (or the target native code) does any pinning for you. If it does, you don't need to pin your object before passing it, because you'd be writing unnecessary (though harmless) code by adding an extra pinning pointer to the pinned object (which doesn't do anything to the pinned state of the object).
Newly allocated objects are put into Generation-0 of the CLR heap. You know that garbage-collection cycles happen most frequently in the Generation-0 heap. Consequently, you should try to avoid pinning recently allocated objects; chances are that a garbage-collection cycle will occur while the object is still pinned.
Reduce the lifetime of a pinning pointer. The longer it stays in scope, the longer the object it points to remains pinned and the greater the chances of heap fragmentation. For instance, if you need a pinning pointer inside an if block, declare it inside the if block so the pinning ends when the if block exits.
Whenever you pass a pinning pointer to a native pointer, you have to ensure that the native pointer is used only if the pinning pointer is still alive. If the pinning pointer goes out of scope, the object becomes unpinned. Now it can be moved around by the GC. Once that happens, the native pointer is pointing to some random location on the CLR heap. I've heard the term GC hole used to refer to such a scenario, and it can be a tough debugging problem. Although it may sound like an unlikely contingency, think of what may happen if a native function that accepts a native pointer stores this pointer for later use. The caller code may have passed a pinning pointer to this function. Once the function has returned, the pinning will quickly stop, because the original pinning pointer won't be alive much longer. However, the saved pointer may be used later by some other function in the native code, which may result in some disastrous conditions (because the location the pointer points to may contain some other object now or even be free space). The best you can do is to know what the native code is going to do with a pointer before you pass a pinning pointer to it. That way, if you see that there is the risk of a GC hole, you avoid calling that function and try to find an alternate solution.
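The last recommendation is easier to apply when you can recognize the two shapes native code tends to take. The sketch below is plain native C++ with hypothetical class names (RetainsPointer and RetainsCopy) invented for this illustration. The first class keeps the caller's pointer after the call returns, which is the pattern that opens a GC hole if the pointer came from a pinning pointer. The second copies the characters into native memory during the call, so nothing refers to the caller's buffer afterward.

```cpp
#include <cassert>
#include <string>

// Hazard pattern: keeps the caller's pointer beyond the call. If that pointer
// came from a pin_ptr, it is only trustworthy while the pin_ptr is in scope.
class RetainsPointer {
public:
    void Remember(const wchar_t* text) { saved_ = text; }
    const wchar_t* Saved() const { return saved_; }
private:
    const wchar_t* saved_ = nullptr;
};

// Safer pattern: copies the characters into native memory during the call,
// so later use does not depend on the caller's buffer staying put.
class RetainsCopy {
public:
    void Remember(const wchar_t* text) {
        assert(text != nullptr);
        saved_ = text;  // std::wstring copies the characters
    }
    const std::wstring& Saved() const { return saved_; }
private:
    std::wstring saved_;
};
```

If the native code you're calling looks like the first class, passing it a pinning pointer is risky unless the pin outlives every later use of the saved pointer. If it looks like the second, the pin only needs to last for the duration of the call.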
Note that these are general guidelines and not hard rules to be blindly followed at all times. It's good to have some basic strategies and to understand the exact consequences of what happens when you inappropriately use pinning pointers. Eventually, you have to evaluate your coding scenario and use your judgment to decide on the best course.