Introduction
C#, Java, Python and other programming languages have an immutable string class. Why not C++? Immutable value objects have demonstrated many advantages (in languages that foster them). The problem is that in C++ you cannot put an immutable object into a standard container (std::vector, std::list, ...) or call a someobj.set(string) function without operator=, a mutating function. Other languages seemingly don't face this problem because they treat strings as reference objects, whereas in C++ strings only make sense as value objects. On the other hand, in order to be usable, immutable objects need mutable(!) references. So, from a conceptual point of view, the difference between the immutable string classes in C#, Java, ... and the (almost immutable) string class(es) for C++ I present here is not as big as it seems at first sight.
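The point about assignment can be illustrated with a small sketch. It is not part of fix_str: the names frozen_name, name_ref and swap_first_two are hypothetical, and a C++11 compiler is assumed. A class with a const member loses its assignment operator, while a mutable handle to immutable contents stays assignable:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical, truly immutable value type: the const member
// suppresses the implicitly generated operator=.
struct frozen_name {
    const std::string text;
    explicit frozen_name(const std::string& s) : text(s) {}
};

// Hypothetical "mutable reference to an immutable value":
// the handle can be reseated, the characters cannot be changed.
typedef std::shared_ptr<const std::string> name_ref;

static_assert(!std::is_copy_assignable<frozen_name>::value,
              "a const member removes assignment");
static_assert(std::is_copy_assignable<name_ref>::value,
              "the handle itself stays assignable");

// Element exchange as a sort would do it: possible only because
// the handles are assignable.
inline void swap_first_two(std::vector<name_ref>& v) {
    assert(v.size() >= 2);
    name_ref tmp = v[0];
    v[0] = v[1];   // assignment reseats the handle only
    v[1] = tmp;
}
```

The second static_assert is the C++ counterpart of a reassignable String reference in C# or Java.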
Background
Motivation
More often than not, you need not change a string once it has been created. It seems reasonable to design a string class that is 'cheap' to copy and assign but 'expensive' to modify. This is what the immutable string classes in C#, Java, and other languages aim at. Consequently, calling someobj.set(string) or someobj.get() functions, inserting strings into a container, sorting, replacing strings, ... can be done without ever requiring a 'deep' copy of the string contents.
In general, this kind of string class is useful when the string changes rarely but is copied frequently. For heavily changing strings, C# and Java provide a mutable 'StringBuilder' companion class.
Three 'Prototypical' String Classes
Three prototypical (imaginary) C++ string classes can be distinguished:
- 'StringBuilder': mutable string class for strings with frequently changing contents.
- 'FixString': immutable string, no changes after construction (like String in C#).
- 'AutoBuffer': array of characters on the stack without dynamic allocation.
In C++, the current std::string [5] implementations typically combine two of the above prototypical approaches, a compromise that is hardly optimal or even appropriate for all cases.
Comparison of fix_str with other string implementations
| | std::string [5] (VC 6.0) | std::string [5] (VC 7.1) | CString (MFC) | fix_str |
|---|---|---|---|---|
| sizeof (bytes) | 16 | 28 | 4 | 4 |
| copy / assignment method | reference-counted + COW [1] | deep copy + SSO [2] | reference-counted + COW [1] | reference-counted |
| copy / assignment speed | fast | fast or slow [3] | fast | fast |
| default constructor | fast | fast | fast | fast |
| constructor for length > 0 | slow | fast or slow [3] | slow | slow |
| usable in multi-threaded environments | ? | yes | ? | yes/no [4] |
| thread safe for concurrent write | no | no | no | no |
| mutable | yes | yes | yes | assignable but otherwise immutable |
[1] COW: Copy-On-Write.
[2] SSO: Small String Optimization; the string contains a buffer (16 bytes in VC 7.1) for small strings and allocates memory on the heap only for larger strings.
[3] Fast with SSO for strings of <= 15 char or <= 7 wchar_t (UNICODE), respectively.
[4] Different classes for single- and multi-threaded environments (see below).
[5] 'std::string' is a typedef of basic_string<char>, where basic_string is declared as template<class charT, class traits = char_traits<charT>, class Allocator = allocator<charT> > class basic_string.
fix_str
fix_str basics
fix_str is a (set of) very lightweight string class(es).
- implemented deliberately as classes, not as a template, and without namespaces.
- designed as a value type.
- default constructor, copy constructor and operator= are always 'cheap'.
- the contents of a fix_str object cannot be changed except by assignment.
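The properties listed above can be sketched in a few lines. This is not the fix_str source: tiny_fix_str and shares_with() are hypothetical names, and std::shared_ptr (C++11) stands in for whatever reference counting fix_str actually uses:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical sketch, not the fix_str source: an "assignable but
// otherwise immutable" string built on std::shared_ptr.
class tiny_fix_str {
public:
    tiny_fix_str() {}                                // no allocation
    explicit tiny_fix_str(const char* s)             // allocates heap memory
        : rep_(std::make_shared<std::string>(s)) {}

    // The compiler-generated copy constructor and operator= only copy
    // the pointer and adjust the reference count; no characters move.

    const char* c_str() const { return rep_ ? rep_->c_str() : ""; }
    std::size_t length() const { return rep_ ? rep_->size() : 0; }

    // True if both objects share one character buffer.
    bool shares_with(const tiny_fix_str& o) const { return rep_ == o.rep_; }

private:
    std::shared_ptr<const std::string> rep_;  // contents never modified
};
```

Note that std::shared_ptr typically occupies two pointers, so this sketch does not reproduce the sizeof of 4 reported for fix_str in the comparison table.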
Using the Code
Examples:
fix_str fs ("Hello", " ", "world", "!");
fix_str fs2 (fs, " and again ", fs);
fix_str fs4;
fix_str fs5 (fs);
fs4 = fs5;
size_t pos = fs.find ("world");
pos = fs.rfind ("Hell");
long h = fs.hash_code();
if (fs == fs2) { ... }
if (fs2 > fs) { ... }
fs = fix_str::sub_str (fs, 5);
fs = fix_str::trim (fs);
fs = fix_str::pad_front (fs, 9, '.');
fs = fix_str::value_of (123);
Four Types of fix_str Functions
You can distinguish four groups of fix_str functions:
- default constructor, copy constructor, assignment operator: these functions do not allocate heap memory and have the exception specification throw().
- constructors for 1 - 8 arguments (fix_strs or character strings).
- non-static member functions like find(), rfind(), hash_code() and the (friend) operators ==, !=, <, >, <=, >= (exception specification throw()).
- static member functions: sub_str(), duplicate(), trim_front(), trim_back(), trim(), pad_front(), pad_back(), value_of(); these create a new fix_str object and therefore allocate heap memory (of course, trim_front(), for example, only creates a new object if a trim is necessary; otherwise it just returns the input).
One design goal for fix_str is to clearly separate 'expensive' and 'cheap' functions. You always know the cost of each function call when you write it. There are no hidden, but sometimes expensive, 'optimizations' behind your back.
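The 'return the input if nothing has to be done' behaviour described for trim_front() might look roughly as follows. The sketch is an assumption about the idea, not the fix_str implementation: str_ref is a hypothetical handle type, and which characters count as blanks is chosen arbitrarily here:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

typedef std::shared_ptr<const std::string> str_ref;  // hypothetical handle

// Hypothetical analogue of a static trim_front(): allocates a new
// buffer only when leading blanks are actually present.
inline str_ref trim_front(const str_ref& in) {
    assert(in);
    const std::size_t first = in->find_first_not_of(" \t");
    if (first == 0 || in->empty()) {
        return in;                                   // cheap: share the input
    }
    if (first == std::string::npos) {
        return std::make_shared<std::string>();      // input was all blanks
    }
    return std::make_shared<std::string>(*in, first);  // expensive: new buffer
}
```

The caller pays for an allocation only on the paths marked as expensive, which keeps the cost visible at the call site.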
Points of Interest
Unicode and Multi-Threading
Why four fix_str classes?
There are different classes for:
- ASCII (char) and UNICODE (wchar_t) strings (similar to Win32-API functions).
- Single- and multi-threaded environments.
Strictly speaking, having different classes for single- and multi-threaded environments indicates that an implementation detail (reference-counting) shows up in the class interface.
The fix_str variants:
| | char (ASCII) | wchar_t (UNICODE) |
|---|---|---|
| Single-Threaded Environment | fix_str_as | fix_str_ws |
| Multi-Threaded Environment | fix_str_am | fix_str_wm |
About 'Thread-Safety'
The term "thread safety" is sometimes used with unclear or ambiguous meaning, especially in C++. One must always ask: 'Thread safe in what respect?'. I don't call the fix_str classes 'thread safe'. Some are usable in multi-threaded environments.
fix_str Classes for Multi-Threaded Environments
fix_str objects which are used in different threads may share the same internal representation and hence the same reference-counter (because they are copies of each other). In this case, atomic increment and decrement of the reference-counter must be ensured internally by the implementation. This is what the fix_str classes for multi-threaded environments, fix_str_am and fix_str_wm, guarantee. As a rule of thumb, take these classes when you use copies of the same object in different threads.
But it is never safe for two or more threads to concurrently write to (assign to) the same fix_str object (remember, assignment is the only way to change any fix_str object). Concurrent writes to the same object must always be protected by the user.
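A minimal sketch of an atomically maintained reference counter is shown below. It uses C++11 std::atomic, which is an assumption made for illustration: the article does not say which primitive fix_str_am and fix_str_wm use, and shared_rep, add_ref() and release() are hypothetical names:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared representation with an atomic reference counter.
struct shared_rep {
    std::atomic<long> refs;
    shared_rep() : refs(1) {}   // the creating object holds one reference
};

// Called when a copy is made: atomic increment.
inline void add_ref(shared_rep& r) {
    r.refs.fetch_add(1, std::memory_order_relaxed);
}

// Called when a copy is destroyed or reassigned: atomic decrement.
// Returns true for the caller that dropped the last reference and
// therefore has to free the buffer.
inline bool release(shared_rep& r) {
    return r.refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

This protects only the shared counter. Two threads assigning to the same string object would still race on that object's own pointer, so such writes need an external lock (for example a std::mutex owned by the caller).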
fix_str Classes for Single-Threaded Environments
On the other hand, concurrency problems cannot occur when you:
- work exclusively in a single-threaded environment or
- never pass copies of fix_str objects between threads.
In these cases you may prefer the slightly faster fix_str classes for single-threaded environments, fix_str_as and fix_str_ws. Hint: the static member function fix_str::duplicate() can be used to create completely independent copies of fix_str objects (no shared reference-counter, see also function documentation).
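The difference between an ordinary copy and an independent duplicate can be sketched as follows. Again this is only an illustration with a hypothetical str_ref handle based on std::shared_ptr, not the fix_str::duplicate() source:

```cpp
#include <cassert>
#include <memory>
#include <string>

typedef std::shared_ptr<const std::string> str_ref;  // hypothetical handle

// Hypothetical analogue of duplicate(): deep-copies the characters so
// that the result shares neither buffer nor reference counter.
inline str_ref duplicate(const str_ref& in) {
    assert(in);
    return std::make_shared<std::string>(*in);
}
```

An ordinary copy of a handle shares the counter with its source, whereas the duplicate starts with a counter of its own and can be handed to another thread without any shared state.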
Win32
There is a default typedef in fix_str.h for fix_str, dependent on the definition of the macros _MT (Multi-Threaded) and _UNICODE.
#if defined (_UNICODE) && defined (_MT) // UNICODE, Multi-Threaded
typedef fix_str_wm fix_str;
#elif defined (_UNICODE) && !defined (_MT) // UNICODE, Single-Threaded
typedef fix_str_ws fix_str;
#elif !defined (_UNICODE) && defined (_MT) // ASCII, Multi-Threaded
typedef fix_str_am fix_str;
#elif !defined (_UNICODE) && !defined (_MT) // ASCII, Single-Threaded
typedef fix_str_as fix_str;
#endif
You can use fix_str in the familiar Win32 style, including the popular but annoying _T() macro:
fix_str fs (_T("Hello, world!"));
Each fix_str_xx class is available individually. You may even use different fix_str_xx classes in the same application:
fix_str_wm fs1 (L"Hello, world!");
fix_str_as fs2 ("Hello, world!");
The fix_str classes also work in non-Windows environments (at least the single-threaded variants).
Limitations
- no operator+ is provided, for performance reasons; use a constructor instead (actually, this is a feature, not a limitation):
fs = fix_str ("Use ", "a ", "constructor ", "to ",
"efficiently ", "concatenate ", "strings");
- embedded NUL characters ('\0') are not possible since fix_str is based on standard C functions, which treat '\0' as the end of the string.
- usable for fixed-length character encodings. UTF-16, the encoding standard at Microsoft (and on the Macintosh, on the Java platform, ...), qualifies only as long as no surrogate pairs occur, because characters outside the Basic Multilingual Plane take two 16-bit units. fix_str objects are compared 'binary', i.e. they are equal only if they contain the same sequence of bytes.
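The reasoning behind the first limitation (a constructor instead of operator+) can be sketched like this: a function that sees all parts at once can measure them first and allocate a single buffer, whereas a chain of a + b + c builds a temporary for every +. The helper concat() is hypothetical and uses std::string only for illustration; it is not how fix_str is implemented:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper showing the idea behind a multi-argument
// constructor: measure all parts first, allocate once, then copy.
inline std::string concat(const char* const* parts, std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += std::strlen(parts[i]);
    }
    std::string out;
    out.reserve(total);                 // at most one allocation up front
    for (std::size_t i = 0; i < count; ++i) {
        out.append(parts[i]);           // fits into the reserved capacity
    }
    return out;
}
```

Because the total length is known before any character is copied, no intermediate strings are created.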
Other 'Immutable' String Implementations in C++
const_string<>: in sum, 'immutable' but with some mutating functions, 'Boost-style' and 'boost' namespace but not a Boost library, 'thread safe' but not safe for concurrent writes.
Conclusion
fix_str is a set of lightweight string classes akin to the immutable string classes in other languages. You may consider using fix_str when a string changes rarely but is copied frequently, e.g. when a container is sorted or when set()/get() functions for strings are called a lot.
History
- October 18, 2005 - Submission to CodeProject.
- October 28, 2005 - Submission of updated article to CodeProject.
  - Article refactored, especially paragraphs 'Unicode and Multi-Threading' and 'About Thread-Safety' rewritten for more clarity (hopefully).