
Introduction

Every good book about the .NET framework or the .NET programming languages (C#, VB.NET) dedicates a special section to strings (the System.String class). The reason is that this class behaves in a special way, and the better you understand how the .NET framework handles it, the better your code will be.

Background

This article targets beginner and intermediate developers, but advanced developers may find a few interesting things here as well.

1. Value Type vs. Reference Type

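String is a reference type: System.String is declared as a class, so a String variable holds a reference to an object rather than the characters themselves. (Inheriting from System.Object is not what makes it a reference type, since value types ultimately derive from System.Object as well.) To assign a string literal to a System.String variable, the following simplified syntax can be used: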

String s = "Hello!";

As you know, the same kind of syntax is used to assign values to the value types of the .NET framework:

int i = 10;
decimal d = 5.25m;

On the .NET Framework versions discussed here, the syntax String s = new String ("Hello!"); does not work. It does not even compile, because String has no constructor overload that accepts a String. Instead, another way to create a String object is:

char[] chars = {'a','b','c'};
String s = new String(chars);
// it will create a String object which holds the literal "abc"

So, from the way the value is often assigned (first assignment), System.String can be mistakenly seen as a value type.

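Another scenario is when System.String is the parameter type of a method. In the .NET framework, every argument is passed by value by default and can be passed by reference with the ref keyword. For a value type, passing by value copies the data. For a reference type, passing by value copies the reference: the method and the caller then refer to the same object, but assigning a new object to the parameter does not affect the caller's variable. In the code below, the parameter of the "UpdateValue" method is an integer (a value type) passed with ref.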

static void Main(string[] args)
{
    int myValue = 10;
    UpdateValue(ref myValue);
    Console.WriteLine("My value: {0}", myValue);          
}

static void UpdateValue(ref int sValue)
{
    sValue = 20;
}

If we change the code by replacing the int value type with the System.String reference type, we get the same end result: because ref passes the caller's variable itself, myValue is changed and "UpdatedValue" is displayed in the console (the next articles on this subject look more closely at how strings behave here):

static void Main(string[] args)
{
    String myValue = "CurrentValue";
    UpdateString(ref myValue);
    Console.WriteLine("My value: {0}", myValue);          
}
static void UpdateString(ref String sValue)
{
    sValue = "UpdatedValue";
}

If the "ref" keyword is removed from the method call and from the method parameter definition, the output displays the value before the call to the method: "CurrentValue". So the question is: why is System.String a reference type when it seems to act as a value type? In this particular case, the behaviour is not specific to strings. Without ref, the method receives a copy of the reference, and the assignment sValue = "UpdatedValue" only points that local copy at a different object; the same would happen with any reference type.

What makes strings feel like values is that they are immutable (see section 3), that they compare by content with the == operator, and that many modern programming languages (including Java and C#) give String built-in language support, such as literals and a keyword alias, much like the primitive types. String is nevertheless a reference type because it is declared as a class: string objects are allocated on the managed heap, and variables hold references to them. (Value types are stored inline wherever their variable lives, which is often, but not always, the stack.) Keep in mind, though, that System.String is a very special case of reference type.
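The parameter-passing rules above are not specific to strings. The following is a minimal sketch, using StringBuilder as a stand-in for any mutable reference type; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class ParameterPassingDemo
{
    // The reference is copied: reassigning the parameter changes only the local copy.
    static void Reassign(StringBuilder sb)
    {
        sb = new StringBuilder("replaced");
    }

    // The copied reference still points at the caller's object, so mutation is visible.
    static void Mutate(StringBuilder sb)
    {
        sb.Append(" (mutated)");
    }

    // With ref, the caller's variable itself is passed, so reassignment is visible.
    static void ReassignByRef(ref StringBuilder sb)
    {
        sb = new StringBuilder("replaced");
    }

    public static string Run()
    {
        var value = new StringBuilder("original");
        Reassign(value);                       // caller still sees "original"
        Mutate(value);                         // caller sees "original (mutated)"
        string afterMutate = value.ToString();
        ReassignByRef(ref value);              // caller now sees "replaced"
        return afterMutate + " | " + value.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // original (mutated) | replaced
    }
}
```

A String parameter behaves like the Reassign case because a string offers no mutating members: the only thing a method can do with the parameter is point it at another object.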

2. String Interning Process

So, what happens behind the scenes? The CLR (Common Language Runtime) internally maintains a hash table (also called the "intern pool") that holds the string literals declared in an application and the strings that are added programmatically. Within a single assembly, the C# language specifies that equivalent string literals refer to the same string instance, and by default the CLR interns literals when it loads them. That default can be relaxed (for example with the CompilationRelaxationsAttribute attribute), and the details have differed from one version of the .NET framework to another. As a rule of thumb, do not write logic that depends on declared strings being automatically interned. However, you can trust that every string returned by the String.Intern method (see below for a description of the behaviour) is in this hash table.

The key of the hash table is the string value, and the value is the reference to the String object. Strings are added to the hash table programmatically by calling the method String.Intern(String s). When this method is called, the CLR checks whether an identical string is already in the table. If an entry is found, the method returns the reference to the existing string. If no entry is found, the CLR adds a reference to the string to the internal hash table and returns that reference.

Another method related to the string interning process is String.IsInterned(String s). Despite its name, the return type of this method is not a boolean (Microsoft should have come up with a better name for this method or a different return type). The method returns a String, and the logic is the following: if the string passed in the method call exists in the internal hash table, the method returns a reference to the String object that is already interned. Otherwise, the method returns null, and it is worth mentioning that in this case the string is not added to the internal table.
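The difference between the two methods can be sketched as follows. The string is built from a GUID only so that it cannot already be in the pool; the class name and that prefix are illustrative assumptions.

```csharp
using System;

static class InternLookupDemo
{
    public static bool[] Run()
    {
        // Built at run time with a unique value, so it is not in the intern pool yet.
        string unique = "intern-demo-" + Guid.NewGuid().ToString("N");

        string before = String.IsInterned(unique);  // lookup only: returns null
        string interned = String.Intern(unique);    // adds to the pool, returns the pooled reference
        string after = String.IsInterned(unique);   // now returns the pooled reference

        // A second, distinct object holding the same characters.
        string twin = new string(unique.ToCharArray());

        return new[]
        {
            before == null,                                         // true
            Object.ReferenceEquals(interned, after),                // true
            Object.ReferenceEquals(String.Intern(twin), interned),  // true
            Object.ReferenceEquals(twin, interned)                  // false
        };
    }

    static void Main()
    {
        Console.WriteLine(String.Join(", ", Run())); // True, True, True, False
    }
}
```

The last line of the result is the important one: interning a string does not change the object that was passed in; it only determines which reference Intern hands back.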

Note also that not every String object on the heap is in the internal hash table: strings built at run time are interned only when you ask for it. Enough talk for now; let's see the results of executing some code:

// declarative vs. programmatic interning

String s1 = "Hello world!";
String s2 = "Hello world!";
Console.WriteLine("Object references are equal: {0}", 
                  Object.ReferenceEquals(s1, s2));

String s3 = String.Intern("Hello world!");
String s4 = String.Intern("Hello world!");
Console.WriteLine("Object references are equal: {0}", 
                  Object.ReferenceEquals(s3, s4));

In the first case (strings s1 and s2), both variables are initialised from the same literal. For literals compiled into one assembly the result is normally 'true', but whether literals end up in the intern pool can depend on the framework version and on the compile parameters/settings, so you should avoid writing code that relies on it. In the second case, however, the result is always the same: the ReferenceEquals method returns 'true' because the strings are programmatically placed in the intern pool. So far, we've seen that System.String is a reference type and that, in the second scenario, s3 and s4 refer to the same object (the same memory address on the heap). The question you may ask is: if I change the value of s4, will s3 change as well? The answer is 'No', or to be more precise, 'Not really', and in the next article, you will find out why.
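To see a case where two equal strings are definitely separate objects, one of them can be built at run time. This is a small sketch with a made-up class name; it only relies on String.Intern returning the same reference for equal values.

```csharp
using System;

static class RuntimeStringsDemo
{
    public static bool[] Run()
    {
        string literal = "Hello world!";
        // Built at run time from characters: same content, separate object.
        string built = new string(new[] { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!' });

        bool sameValue = literal == built;                              // true: == compares content
        bool sameObject = Object.ReferenceEquals(literal, built);       // false: two objects
        bool sameAfterIntern = Object.ReferenceEquals(
            String.Intern(literal), String.Intern(built));              // true: one pooled object

        return new[] { sameValue, sameObject, sameAfterIntern };
    }

    static void Main()
    {
        Console.WriteLine(String.Join(", ", Run())); // True, False, True
    }
}
```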

3. Strings are Immutable

In the object-oriented programming world, an immutable object is an object which cannot be modified once it is created. This behaviour of strings is what makes the interning process possible. Because strings are immutable, a reference can be copied instead of the entire object, and multiple variables can safely refer to the same string object. But immutability does not mean that the memory where the object data (the characters of the string) is stored is read-only. What it really means is that, behind the scenes, the .NET framework makes sure that you cannot change the value of the string (or at least not when working with managed/safe code). Let's see what happens in the following code:

Line 1: String s1 = String.Intern("ABC");
Line 2: String s2 = String.Intern("ABC");
Line 3: s2 = s2.ToLower();

Line 1 ensures that the literal "ABC" is in the intern pool and returns a reference to it, which is stored in s1. Line 2 tries to add "ABC" to the intern pool again but, as the .NET documentation describes, it is not added since it already exists; the same reference is returned and stored in s2. At this point, both variables refer to the same string object. The very interesting part comes in Line 3. Here, the method ToLower() creates a new string object and populates it with the value "abc". The reference to the new string is then returned, and s2 now points to a new memory location. Note that the memory location which holds "ABC" was by no means overwritten with the value "abc". Therefore, s1 still points to "ABC" and s2 now points to "abc".

All of this is good and valid in the context of managed/safe code. When we deal with unmanaged code, we need to be very careful with string operations. As mentioned above, the memory location where the string is stored is not read-only, and unmanaged code can overwrite it. Let's see what happens in the example below:
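The three lines discussed above can be checked with reference comparisons. This is a minimal sketch wrapped in a made-up class so that it runs on its own.

```csharp
using System;

static class ImmutabilityDemo
{
    public static string Run()
    {
        string s1 = String.Intern("ABC");
        string s2 = String.Intern("ABC");
        bool sharedBefore = Object.ReferenceEquals(s1, s2); // true: one pooled object

        s2 = s2.ToLower();                                  // new object; s2 is re-pointed
        bool sharedAfter = Object.ReferenceEquals(s1, s2);  // false: s1 was left untouched

        return s1 + "," + s2 + "," + sharedBefore + "," + sharedAfter;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // ABC,abc,True,False
    }
}
```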

// Requires: using System; using System.Runtime.InteropServices;
static void Main(string[] args)
{
    String s1 = String.Intern("String cannot be changed");
    String s2 = String.Intern("String cannot be changed");

    // Deliberately wrong: the native function writes into the memory behind s1.
    int bufferLength = s1.Length;
    GetUserName(s1, ref bufferLength);
    Console.WriteLine("The second string: {0}", s2);
}

[DllImport("Advapi32", CharSet = CharSet.Unicode)]
static extern bool GetUserName(
    [MarshalAs(UnmanagedType.LPWStr)] string userName, ref int bufferLength);

Running the above code on my computer displayed the following message in the console (Marius is my NT username): "The second string: Marius cannot be changed". So, we declare s1 and s2, and we make sure that they refer to the same string object by using the String.Intern(String s) method. Next, a piece of unmanaged code is called: GetUserName from "Advapi32.dll" (see the MSDN documentation for a description of the function). What happens during the call is the interesting part: the function is passed one of the strings, and since unmanaged code does not follow the managed rules regarding the immutability of strings, it writes its result at the memory location that s1 points to. But in the managed world, s2 points to the same memory location, and therefore the content seen through s2 changes as well.
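A common way to avoid this problem is to hand the native function a writable buffer instead of a managed string. The sketch below assumes Windows and declares the same function with a StringBuilder parameter; the class name and the buffer size of 257 characters (the Win32 UNLEN limit of 256 plus the terminating null) are choices made for this example.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class UserNameDemo
{
    // Same Win32 function, declared with a StringBuilder so that native code
    // writes into a dedicated buffer and never into an existing string.
    [DllImport("advapi32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool GetUserName(StringBuilder buffer, ref int size);

    // Returns the user name, or null if the call fails. Windows only.
    public static string ReadUserName()
    {
        int size = 257; // UNLEN (256) + terminating null
        var buffer = new StringBuilder(size);
        return GetUserName(buffer, ref size) ? buffer.ToString() : null;
    }

    static void Main()
    {
        if (Environment.OSVersion.Platform == PlatformID.Win32NT)
            Console.WriteLine("User name: {0}", ReadUserName());
        else
            Console.WriteLine("This sketch calls a Win32 API and only runs on Windows.");
    }
}
```

For this particular value, the managed property Environment.UserName avoids the native call altogether.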

The source of this article can also be found here.

Nice
Hans Dietrich
12:50 8 Mar '10  
This is a nice article, but CodeProject rules require that you post the source with the article, and not on some external site. Please add the zip to your article.
Best wishes,
Hans


[Hans Dietrich Software]

interning does not decrease memory usage?!
HalfHuman
0:19 22 Jan '09  
GC.Collect();
IList li = new List();
for (int i = 0; i < 1000000; i++)
li.Add(string.Intern("love is in the air")) vs li.Add("love is in the air")
GC.Collect();
li = null;
GC.Collect();

here is the code i used for testing. first version uses interning the second does not... or so i think.
i used breaks on every forced collection. after the first collection i had around 2mb of ram in all heaps. at the second i had about 10mb. at the third the size returned to 2mb. as i see it the ram usage remained the same, as if there is no difference. did i make any mistakes in my testing?

Re: interning does not decrease memory usage?!
Marius Serban
20:06 22 Jan '09  
I'm not really sure that I fully understand what you are trying to do here, but anyway: when an assembly is loaded, all the strings that are defined within your application are interned by default, which means that when your application is running, "love is in the air" is already interned. However, please keep in mind that this behaviour can be turned off by marking your assembly with the System.Runtime.CompilerServices.CompilationRelaxationsAttribute attribute.

Also, different versions of the .NET framework have different rules for interning strings. Most likely, in both versions the string was interned while the assembly was loaded, and you should not see any difference.

To see the difference you can try to build the string at runtime: something like String.Intern("MyString" + k.ToString()) where k may take a limited number of values (e.g. 0 to 9).

http://www.mariusserban.com

Re: interning does not decrease memory usage?!
HalfHuman
22:44 22 Jan '09  
GC.Collect();
string time = DateTime.Now.ToString();
IList li = new List();
for (int i = 0; i < 1000000; i++)
li.Add(string.IsInterned(time)); vs li.Add(time);
GC.Collect();
li = null;
GC.Collect();

this is the new version i tested. the compiler cannot know the string beforehand. the results are the same as before. interning makes no difference. the results were in both cases 3mb/7mb/3mb after each GC.Collect(). the IsInterned seems to make no difference as 1mil strings use about 4mb of ram.

Re: interning does not decrease memory usage?!
Marius Serban
18:15 24 Jan '09  
In the test you created the interning did not make any difference because on your .NET platform it is done regardless of the fact that the IsInterned() method is called or not.
You can run the following example as well:
    String time = DateTime.Now.ToString();
String s1 = time;
String s2 = time;
Console.WriteLine("Object references are equal: {0}",
Object.ReferenceEquals(s1, s2));

time = DateTime.Now.ToString();
String s3 = String.Intern(time);
String s4 = String.Intern(time);
Console.WriteLine("Object references are equal: {0}",
Object.ReferenceEquals(s3, s4));
In the majority of the cases the interning is done by default by .NET framework, but in order to be 100% positive the String.Intern() method should be used. In this way you will not get any unexpected results in terms of memory use when the same application is executed on different platforms or where a new .NET version comes up and there is a change in the default interning process.

http://www.mariusserban.com

Re: interning does not decrease memory usage?!
HalfHuman
23:05 22 Jan '09  
GC.Collect();
int max = 10; // repeated with max = 100 and max = 1000000
Random rd = new Random(1000000);
string time = DateTime.Now.ToString();
IList li = new List();
for (int i = 0; i < 1000000; i++)
li.Add(string.IsInterned(rd.Next(max).ToString("000000"))); vs li.Add(rd.Next(max).ToString("000000"));
GC.Collect();
li = null;
GC.Collect();

so in the last post 1mil strings used roughly 4megs of ram. i analyzed a bit and it came to about 4bytes/string so i guess they were interned. so i repeated the tests, modifying the approach. this time i used a random number generator (pseudo random). i used seed 1000000 and repeated the test setting the max value to 10, 100 and 1mil to see how it behaved. what i actually did was to make 1 in 10 strings unique, 1 in 100 unique and 1 in 1mil unique, so in the 1mil case the strings generated should be practically unique.

without interning the results were:
max = 10 2mb/21mb/2mb => 19mb => 19bytes/string
max = 100 2mb/31mb/2mb => 29mb => 29bytes/string
max = 1mil 2mb/38mb/2mb => 37mb => 37bytes/string

with interning the results were:
max = 10 2mb/8mb/2mb => 6mb => 6bytes/string
max = 100 2mb/8mb/2mb => 6mb => 6bytes/string
max = 1mil 2mb/8mb/2mb => 6mb => 6bytes/string

as you can see, in the first scenario the more unique the strings were, the more ram they consumed. this is a bit strange since i expected them to use the same ram and not to grow.
in the second case, with interning, the ram usage remained constant. it grew from the earlier test from 4megs to 6megs but remained much less than with the method that did not use interning.

conclusion:
it seems that interning does make a difference but the result is not very logical as far as i understand. in my first tests it seemed that one string cost only the space for the reference (4bytes == 32bits). in the latest tests the memory usage grew but not in a way that i understand. maybe the author can shed some light on this matter.

Re: interning does not decrease memory usage?!
Marius Serban
18:37 24 Jan '09  
I think your test is a little wrong, as the String.IsInterned() method does not do interning. IsInterned() takes a string and looks it up in the internal hash table. If a matching entry is found then the method returns the reference to the string; otherwise it returns null (if you debug your example you will see the list li filled with nulls and not with strings).

Using String.Intern() does not make any difference if you have 1.000.000 unique strings in your application. It makes a big difference when you have 1.000.000 references to strings and 10/100/1000 unique strings among the 1.000.000 references to strings. Therefore you will see some difference in the memory usage when, in your example, the max variable is set to 10 or 100. When it is set to 1.000.000 you should not see much difference as this is likely to have generated 1.000.000 unique strings.
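The effect described above can be observed without a memory profiler by counting how many distinct string objects a list ends up referencing. The following is only a sketch: the class, the comparer and the "row-" prefix are made up for this illustration, and the object count is a proxy for memory use, not a measurement of it.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

static class InternCountDemo
{
    // Compares strings by object identity rather than by content.
    sealed class ByReference : IEqualityComparer<string>
    {
        public bool Equals(string a, string b) { return Object.ReferenceEquals(a, b); }
        public int GetHashCode(string s) { return RuntimeHelpers.GetHashCode(s); }
    }

    // Returns how many distinct string objects the list references.
    public static int CountObjects(int references, int uniqueValues, bool intern)
    {
        var list = new List<string>(references);
        for (int i = 0; i < references; i++)
        {
            // Concatenation at run time allocates a new string on every iteration.
            string row = "row-" + (i % uniqueValues);
            list.Add(intern ? String.Intern(row) : row);
        }
        return new HashSet<string>(list, new ByReference()).Count;
    }

    static void Main()
    {
        Console.WriteLine(CountObjects(10000, 100, false)); // 10000 objects kept alive
        Console.WriteLine(CountObjects(10000, 100, true));  // 100 objects kept alive
    }
}
```

With interning, the temporary strings become garbage right after the call to Intern, and the list keeps only one object per unique value alive. When every value is unique, both variants keep the same number of objects, which matches the explanation above.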

http://www.mariusserban.com

Re: interning does not decrease memory usage?!
HalfHuman
23:48 26 Jan '09  
GC.Collect();
int uniqueness = 100;
Random rd = new Random(100);
string time = DateTime.Now.ToString();
IList li = new List();
for (int i = 0; i < 1000000; i++)
{
string row = rd.Next(uniqueness).ToString("000000");
li.Add(string.Intern(row)); or li.Add(row);
}
GC.Collect();

as one can see i redid the tests. this time as a precaution i killed the aspnet_wp.exe on each run. i know one will say that it's not very ok to test it using asp.net but this was the quickest method.
i used breaks after each collection and noted the result. the results were measured using the "Performance" app from windows. here it goes.

no interning
10 => 37mb
100 => 37mb
1mil => 38mb

interning
10 => 4-7mb
100 => 6-8mb
1mil => 21-22mb

on the run without interning the memory usage was fairly stable. on the interning side i got varying results even though i killed the iis worker process each time. the measurements might not be very precise since i'm using it in asp.net but one can see that the advantage is there. i don't understand why there is a difference between the 1mil run with interning and without. even if the strings are pretty much unique the interning still does some good. i should repeat the testing using some console app but i consider the point made. interning is useful.

excellent article
Donsw
12:59 18 Jan '09  
Good work, I learned the intern method and more of string handling.

String.Copy and interning
supercat9
7:43 15 Jan '09  
Is it guaranteed that if a string is generated programmatically without an explicit request to intern it, that the allocated object will not get used by any other string even if the contents happen to match?

For example, if I had a function that was supposed to return a string if it worked, but return an indication if it failed (assuming the "failure" was sufficiently expected that an exception would be inappropriate), and if an empty string would be a legitimate result, would it be reasonable to do something like this:
Public Shared ErrorCondition1 As New String("?"c, 0)
Public Shared ErrorCondition2 As New String("?"c, 0)

Sub Whatever
    st = MyFunction() ' Returns a legitimate string or an ErrorCondition one.
                      ' Should return String.Empty if the result is legitimately a zero-length string.

    If String.IsNullOrEmpty(st) And st IsNot String.Empty Then
        If st Is ErrorCondition1 Then
            .. Handle error condition 1
        ElseIf st Is ErrorCondition2 Then
            .. Handle error condition 2
        Else
            .. It's some other error condition
        End If
    End If
End Sub
If the function always returns String.Empty for any 'legitimate' zero-length result, would code like the above be guaranteed to work (without the framework itself getting creative and deciding to map the zero-length strings to the same object)?
Re: String.Copy and interning [modified]
Marius Serban
18:48 24 Jan '09  
supercat9 wrote:
Is it guaranteed that if a string is generated programmatically without an explicit request to intern it, that the allocated object will not get used by any other string even if the contents happen to match?

No, it is not guaranteed and you should never rely on it when creating any logic around it. If you can run the below example under .NET 3.5 you will see that in both cases the object references are equal.

String time = DateTime.Now.ToString();
String s1 = time;
String s2 = time;
Console.WriteLine("Object references are equal: {0}",
Object.ReferenceEquals(s1, s2));

time = DateTime.Now.ToString();
String s3 = String.Intern(time);
String s4 = String.Intern(time);
Console.WriteLine("Object references are equal: {0}",
Object.ReferenceEquals(s3, s4));

http://www.mariusserban.com
modified on Saturday, January 24, 2009 11:58 PM

Re: String.Copy and interning
supercat9
20:31 24 Jan '09  
I would expect s1 and s2 to be equal, and likewise s3 and s4. My question would be about situations like:
Function Weird(Num As Integer) As Boolean
    Dim st1 As String = "3" & Num.ToString
    Dim st2 As String = (Num * 11).ToString
    Return st1 Is st2
End Function

Would it be possible for st1 and st2 to report identical if e.g. Num equals 3?
Re: String.Copy and interning
Marius Serban
20:52 24 Jan '09  
First of all, this is a "weird" function. The question here is whether you want to return True when the string values are the same (meaning "abc" = "abc") or when Object.ReferenceEquals() returns true.

The function mentioned may or may not return True depending on the way CLR gets "creative". To be on the safe side you should use String.Intern() in this case.
Example:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Object references are equal: {0}", Weird(3));
    }

    static bool Weird(int number)
    {
        String s1 = String.Intern("3" + number.ToString());
        String s2 = String.Intern((11 * number).ToString());
        return Object.ReferenceEquals(s1, s2);
    }
}


http://www.mariusserban.com

Re: String.Copy and interning
supercat9
9:21 25 Jan '09  
That particular function was a little weird, though I gave it as an instance where the compiler would clearly not be able to predict that strings might be equal. Even returning to the original code I gave, though, is there any circumstance in which something like:
Function Test1 As Boolean
    Dim S1 As String = New String("?"c, 0)
    Dim S2 As String = New String("?"c, 0)
    Return S1 Is S2 ' Test for reference equality
End Function
could ever return true, or that S1 should ever be "reference-equals" of any other string other than those to which it copied? If some other part of the code generates a null string via some means other than by explicitly duplicating the reference to S1, is it possible that it will be given the object referred to by S1?
Thoughts
PIEBALDconsult
6:23 15 Jan '09  
"The String is a reference type since it inherits from System.Object"

All types derive from System.Object; are they all reference types?


What have you included that isn't in the documentation and "every good book"?
Re: Thoughts
Marius Serban
7:05 15 Jan '09  
Please see my answers below:

PIEBALDconsult wrote:
All types derive from System.Object; are they all reference types?

This is exactly what I said as well, but unlike all the other reference types the System.String has a special handling in .NET mainly for optimization purposes (a similar handling is done also in Java with java.lang.String).

PIEBALDconsult wrote:
What have you included that isn't in the documentation and "every good book"?

Some parts of this article may be found in 'every good book', but not everything. I tried to put together an article that explains the behaviour which I read in multiple books, articles and from my own experience.

http://www.mariusserban.com

Re: Thoughts
John Brett
5:27 24 Feb '09  
System.ValueType does not derive from System.Object.
The CLR has a mechanism known as boxing that allows it to store a value type in an object, and retrieve it later. This is not the same as ValueTypes being objects, however.
Re: Thoughts
PIEBALDconsult
6:37 24 Feb '09  
John Brett wrote:
System.ValueType does not derive from System.Object.

This from the documentation[^]:
"Both reference and value types are derived from the ultimate base class Object. "
Good job
Adrian Dorache
5:59 15 Jan '09  
Once I have read this article .net strings will never be the same.
Good article + String modifications
Pavel Pawlowski
4:07 15 Jan '09  
Good article for beginners and intermediate

For beginners it could be good to mention that, because strings are immutable, any modification to them by any operator creates a new instance of the string, and the original string is then collected by the garbage collector.

It is therefore important to use StringBuilder when doing multiple modifications of a string, as StringBuilder allocates a memory buffer and then does all the modifications in this buffer. (Only when the string after modification does not fit into the current buffer does it allocate a new buffer with some reserve and copy the original buffer to the new one; the old buffer is then again collected by the garbage collector.)

Also when we know we will fill the StringBuilder with particular string it is a good idea to create it with enough initial capacity to avoid buffers reallocation during string construction.
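As a small sketch of that advice, the method below passes the expected final length to the StringBuilder constructor, so the buffer does not need to grow while the string is built. The class and method names are made up for this example.

```csharp
using System;
using System.Text;

static class BuilderDemo
{
    // Builds a string of 'count' digits: 0123456789012...
    public static string JoinDigits(int count)
    {
        // Each digit is one char, so the final length is known up front.
        var sb = new StringBuilder(count);
        for (int i = 0; i < count; i++)
        {
            sb.Append((char)('0' + i % 10));
        }
        return sb.ToString(); // the only String object created here
    }

    static void Main()
    {
        Console.WriteLine(JoinDigits(12)); // 012345678901
    }
}
```

The same loop written with the + operator on a String would allocate a new string in every iteration and leave all the intermediate ones to the garbage collector.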
Re: Good article + String modifications
Marius Serban
7:12 15 Jan '09  
Thank you Pavel for your feedback. You are bringing a good point and initially my intention was to include a fourth section called "String operations" in which I would mention the StringBuilder class and the benefits of using it over the String class. I will edit the article soon.

http://www.mariusserban.com

Re: Good article + String modifications
supercat9
7:29 15 Jan '09  
Pavel Pawlowski wrote:
For beginners it could be good to mention that, because strings are immutable, any modification to them by any operator creates a new instance of the string, and the original string is then collected by the garbage collector.

This is also why in vb.net one is allowed to say mid(st,5,1)="?", but one cannot say st.chars(4)="?"c. The former operation actually results in vb creating a new string which is similar to the original, but with the fifth character modified; the latter would attempt to modify the original string except that it's forbidden from doing so.


Last Updated 15 Jan 2009 | Copyright © CodeProject, 1999-2010