It seems like there is always some confusion concerning strings in .NET, both among developers who are new to the Framework and among those who have been working with it for quite some time.
Strings in the .NET Framework are represented by the System.String class, which encapsulates the data manipulation, sorting, and searching methods you most commonly perform on string data.
In the .NET Framework, you can use System.String (which is the actual type name) or the language alias (for C#, string). They are equivalent, so use whichever naming convention you prefer, but be consistent. Common usage (and my preference) is to use the language alias (string) when referring to the data type and String (the actual type name) when accessing the static members of the class.
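As a small illustration of that equivalence, the sketch below compares the two names with typeof and then follows the convention described above: the alias for the variable's type, the type name for a static member. The class name AliasDemo and the method AliasIsSameType are made up for this example.

```csharp
using System;

public static class AliasDemo
{
    // Hypothetical helper: true when the C# alias and the
    // framework type name resolve to the same type.
    public static bool AliasIsSameType()
    {
        return typeof(string) == typeof(String);
    }

    public static void Main()
    {
        // Alias for the data type, type name for the static member.
        string name = "fox";
        bool empty = String.IsNullOrEmpty(name);

        Console.WriteLine(AliasIsSameType()); // True
        Console.WriteLine(empty);             // False
    }
}
```

Either spelling compiles to the same thing, so the choice is purely one of style.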
Many mainstream programming languages (like C and C++) treat strings as a null-terminated array of characters. The .NET Framework, however, treats a string as an immutable sequence of Unicode characters (stored as UTF-16 code units) that cannot be modified after it has been created. Because strings are immutable, every operation that appears to modify a string's contents actually creates a new string instance and returns it. The original string data is never modified.
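A quick way to see this for yourself is to call a couple of "modifying" methods and then look at the original. The following is only a sketch; the class name ImmutabilityDemo and the helper Shout are invented for illustration.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Hypothetical helper: ToUpperInvariant does not change its
    // input; it builds and returns a new string.
    public static string Shout(string input)
    {
        return input.ToUpperInvariant();
    }

    public static void Main()
    {
        string original = "The quick brown fox";
        string upper = Shout(original);
        string replaced = original.Replace("quick", "slow");

        Console.WriteLine(original); // The quick brown fox
        Console.WriteLine(upper);    // THE QUICK BROWN FOX
        Console.WriteLine(replaced); // The slow brown fox
    }
}
```

Notice that the results must be captured in new variables. Calling original.Replace(...) and ignoring the return value would have no visible effect at all.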
There is one important word in the preceding paragraph which many people tend to miss: sequence. In .NET, strings are treated as a sequence; in fact, they are treated as an enumerable sequence. This can be verified if you look at the class declaration for System.String, shown here in abbreviated form:
public sealed class String : IEnumerable,
    IComparable, IComparable<string>, IEquatable<string>
The first interface that String
implements is IEnumerable, which has the following definition:
public interface IEnumerable
{
    IEnumerator GetEnumerator();
}
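To make the interface a little more concrete, here is a sketch of a method that knows nothing about its argument except that it is an IEnumerable. It calls GetEnumerator and MoveNext by hand, which is roughly what a foreach loop does for you. The names EnumeratorDemo and CountItems are hypothetical.

```csharp
using System;
using System.Collections;

public static class EnumeratorDemo
{
    // Hypothetical helper: counts elements using only the
    // non-generic IEnumerable interface.
    public static int CountItems(IEnumerable sequence)
    {
        int count = 0;
        IEnumerator e = sequence.GetEnumerator();
        while (e.MoveNext())
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountItems("fox"));                // 3
        Console.WriteLine(CountItems(new[] { 1, 2, 3, 4 })); // 4
    }
}
```

The same method accepts both a string and an array without any overloads, because both types implement IEnumerable.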
As a side note, System.Array also implements IEnumerable. Why is that important to know? Simply put, it means that you can enumerate a string with foreach just as you would an array. Because String also exposes a Length property and a read-only indexer, you can walk it by index as well. This allows you to write code such as the following:
string s = "The quick brown fox";
foreach (var c in s)
{
    System.Diagnostics.Debug.WriteLine(c);
}
for (int i = 0; i < s.Length; i++)
{
    System.Diagnostics.Debug.WriteLine(s[i]);
}
If you executed those lines of code in a running application, you would see each character of the string written on its own line in the Visual Studio Output window, once for the foreach loop and once more for the for loop.
In the case of a string, these enumerable or array-style operations return a char (System.Char) rather than a string. That might lead you to believe that you can get around the string immutability restriction by simply treating a string as an array and assigning a new character to a specific index location inside the string, like this:
string s = "The quick brown fox";
s[2] = 'a';
However, if you were to write such code, the compiler would promptly tell you that you can’t do it. It reports error CS0200, because the string indexer is read only and cannot be assigned to.
This preserves the notion that strings are immutable and cannot be changed once they are created. (Incidentally, there is no built-in way to replace a single character like this. It can be done, but it requires converting the string to a character array, changing the appropriate indexed location, and then creating a new string.)
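The three steps just described might look something like the sketch below. ReplaceCharAt is a made-up helper name, not a Framework method, and the argument checks are simply one reasonable choice.

```csharp
using System;

public static class StringEditDemo
{
    // Hypothetical helper: returns a copy of source with the
    // character at the given index replaced. The original
    // string is left untouched.
    public static string ReplaceCharAt(string source, int index, char replacement)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        if (index < 0 || index >= source.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        char[] chars = source.ToCharArray(); // 1. copy into a mutable array
        chars[index] = replacement;          // 2. change the indexed location
        return new string(chars);            // 3. build a new string
    }

    public static void Main()
    {
        string s = "The quick brown fox";
        string t = ReplaceCharAt(s, 2, 'a');

        Console.WriteLine(s); // The quick brown fox
        Console.WriteLine(t); // Tha quick brown fox
    }
}
```

Even here nothing is modified in place: the character array is a copy, and the result is a brand new string instance.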