
fix_str - An (Almost) Immutable String Class in C++

7 Nov 2005
New style string class(es) for ASCII and UNICODE, single- and multi-threaded environments.

Introduction

C#, Java, Python and other programming languages have an immutable string class. Why not C++? Immutable value objects have demonstrated many advantages (in languages that foster them). The problem is that in C++ you cannot put an immutable object into a std::container or call a someobj.set(string) function without operator=, a mutating function. Other languages seemingly don't face this problem because they conceive strings as reference objects whereas in C++ they only make sense as value objects. On the other hand, in order to be usable, immutable objects need mutable(!) references. So, from a conceptual point of view, the difference between immutable string classes in C#, Java, ... and the (almost immutable) string class(es) for C++ I present here is not as big as it seems at first sight.
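The container problem can be made concrete with a small sketch. The types FrozenName and Name below are hypothetical and are not part of fix_str; they only contrast a fully immutable value type (which loses operator= and therefore cannot be sorted inside a std::vector) with an "almost immutable" one that has no mutating member functions but remains assignable.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Fully immutable: the const member deletes the implicit assignment operators.
struct FrozenName {
    const std::string value;
};

// "Almost immutable": no mutating member functions, but assignment is allowed.
class Name {
public:
    explicit Name(std::string v) : value_(std::move(v)) {}
    const std::string& value() const { return value_; }
private:
    std::string value_;
};

static_assert(!std::is_copy_assignable<FrozenName>::value,
              "a fully immutable object cannot be assigned to");
static_assert(std::is_copy_assignable<Name>::value,
              "an almost immutable object can still be replaced as a whole");

// std::sort rearranges elements by move/assignment, so it needs operator=.
std::vector<Name> sorted(std::vector<Name> names) {
    std::sort(names.begin(), names.end(),
              [](const Name& a, const Name& b) { return a.value() < b.value(); });
    return names;
}

// Usage check (illustrative only).
void container_demo() {
    std::vector<Name> names = sorted({Name("pear"), Name("apple")});
    assert(names.front().value() == "apple");
}
```

Assignment replaces the whole value; it never edits the characters in place, which is the sense in which such a class stays "almost" immutable.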

Background

Motivation

More often than not, you need not change a string once it has been created. It seems reasonable to design a string class that is 'cheap' to copy and assign but 'expensive' to modify. This is what the immutable string classes in C#, Java, and other languages aim at. Consequently, calling someobj.set(string) or someobj.get() functions, inserting strings into a container, sorting, replacing strings, ... can be done without ever requiring a 'deep' copy of the string contents.

In general, this kind of a string class is useful when the string changes rarely but is copied frequently. For heavily changing strings, C# and Java provide a mutable 'StringBuilder' companion class.
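As a rough illustration of "cheap to copy, expensive to modify", the following sketch shares one constant buffer between all copies. It is not the fix_str implementation: the class name ImmutableText and the use of std::shared_ptr (a newer library facility than the compilers discussed in this article) are assumptions made only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical illustration only; not the fix_str implementation.
class ImmutableText {
public:
    ImmutableText() = default;                      // empty: no allocation
    explicit ImmutableText(std::string s)           // the one "expensive" step
        : data_(std::make_shared<const std::string>(std::move(s))) {}

    // Copy construction and assignment are compiler-generated: they copy a
    // pointer and adjust a reference count; the characters are never copied.

    const char* c_str() const { return data_ ? data_->c_str() : ""; }
    std::size_t length() const { return data_ ? data_->size() : 0; }

    // True if both objects refer to the very same character buffer.
    bool shares_buffer_with(const ImmutableText& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<const std::string> data_;
};

// Usage check (illustrative only).
void copy_demo() {
    ImmutableText a("Hello, world!");
    ImmutableText b(a);          // cheap: shares a's buffer
    ImmutableText c;
    c = b;                       // cheap as well
    assert(c.shares_buffer_with(a));
}
```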

Three 'Prototypical' String Classes

Three prototypical (imaginary) C++ string classes can be distinguished:

  • 'StringBuilder': mutable string class for strings with frequently changing contents.
  • 'FixString': immutable string, no changes after construction (like in C# String).
  • 'AutoBuffer': array of characters on the stack without dynamic allocation.

In C++, the current std::string[5] implementations typically combine two of the above prototypical approaches, a compromise that is hardly optimal, or even appropriate, for all cases.

Comparison of fix_str with other string implementations

| | std::string[5] (VC 6.0) | std::string[5] (VC 7.1) | CString (MFC) | fix_str |
|---|---|---|---|---|
| sizeof | 16 | 28 | 4 | 4 |
| copy / assignment method | reference-counted + COW[1] | deep copy + SSO[2] | reference-counted + COW[1] | reference-counted |
| copy / assignment speed | fast | fast or slow[3] | fast | fast |
| default constructor | fast | fast | fast | fast |
| constructor for length > 0 | slow | fast or slow[3] | slow | slow |
| usable in multi-threaded environments | ? | yes | ? | yes/no[4] |
| thread safe for concurrent write | no | no | no | no |
| mutable | yes | yes | yes | assignable but otherwise immutable |

  1. COW: Copy-On-Write
  2. SSO: Small String Optimization; the string contains a buffer (16 bytes in VC 7.1) for small strings and allocates memory on the heap only for larger strings.
  3. Fast with SSO for strings <= 15 char or <= 7 wchar_t (UNICODE), respectively.
  4. Different classes for single- and multi-threaded environments (see below).
  5. 'std::string' is a typedef of template<class charT, class traits = char_traits<charT>, class Allocator = allocator<charT> > class basic_string.

fix_str

fix_str basics

  • fix_str is a (set of) very lightweight string class(es).
  • implemented deliberately as classes, not as a template, and without namespaces.
  • designed as a value type.
  • default constructor, copy constructor and operator= are always 'cheap'.
  • the contents of a fix_str object cannot be changed except by assignment.

Using the Code

Examples:

// constructors for 0 - 8 arguments
fix_str fs ("Hello", " ", "world", "!");
fix_str fs2 (fs, " and again ", fs);
// no dynamic allocation for assignment, copying, and empty fix_str
fix_str fs4;
fix_str fs5 (fs);
fs4 = fs5;
// non-static member functions (and friends)
size_t pos = fs.find ("world"); // pos: 6
pos = fs.rfind ("Hell"); // pos: 0
long h = fs.hash_code();
if (fs == fs2) { ... }
if (fs2 > fs) { ... }
// static member functions create a new fix_str object
fs = fix_str::sub_str (fs, 5); // fs: " world!"
fs = fix_str::trim (fs); // fs: "world!"
fs = fix_str::pad_front (fs, 9, '.'); // fs: "...world!"
fs = fix_str::value_of (123); // fs: "123"

Four Types of fix_str Functions

You can distinguish four groups of fix_str functions:

  1. default constructor, copy constructor, assignment operator: these functions do not allocate heap memory and have the exception specification throw().
  2. constructors for 1 - 8 arguments (fix_strs or character strings).
  3. non-static member functions like find(), rfind(), hash_code() and (friend) operators ==, !=, <, >, <=, >=; (exception specification throw()).
  4. static member functions: sub_str(), duplicate(), trim_front(), trim_back(), trim(), pad_front(), pad_back(), value_of(); these create a new fix_str object and therefore allocate heap memory (of course, e.g. trim_front() only creates a new object if a trim is necessary, otherwise it just returns the input).

One design goal for fix_str is to clearly separate 'expensive' and 'cheap' functions. You always know the cost of each function call when you write it. There are no hidden, but sometimes expensive, 'optimizations' behind your back.
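The behaviour described for the static functions (a new object is created only when there is something to do, otherwise the input is returned) might look roughly like the sketch below. The Text alias and this trim_front() are hypothetical stand-ins built on std::shared_ptr; the real fix_str functions are members of the fix_str classes and may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for an immutable, shared string (not fix_str itself).
using Text = std::shared_ptr<const std::string>;

// "Expensive" group: may allocate, because it may have to build a new string.
Text trim_front(const Text& in) {
    const std::size_t first = in->find_first_not_of(" \t\r\n");
    if (first == 0 || in->empty()) return in;      // nothing to trim: reuse the input
    if (first == std::string::npos)                // whitespace only
        return std::make_shared<const std::string>();
    return std::make_shared<const std::string>(in->substr(first));
}

// Usage check (illustrative only).
void trim_demo() {
    Text padded = std::make_shared<const std::string>("  world!");
    Text clean = std::make_shared<const std::string>("world!");
    assert(*trim_front(padded) == "world!");   // new object was created
    assert(trim_front(clean) == clean);        // same object, no allocation
}
```

Because the input and the result are both immutable, handing back the input unchanged is safe: no caller can observe the sharing.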

Points of Interest

Unicode and Multi-Threading

Why four fix_str classes?

There are different classes for:

  • ASCII (char) and UNICODE (wchar_t) strings (similar to Win32-API functions).
  • Single- and multi-threaded environments.

Strictly speaking, having different classes for single- and multi-threaded environments indicates that an implementation detail (reference-counting) shows up in the class interface.

The fix_str variants:

| | char (ASCII) | wchar_t (UNICODE) |
|---|---|---|
| Single-Threaded Environment | fix_str_as | fix_str_ws |
| Multi-Threaded Environment | fix_str_am | fix_str_wm |

About 'Thread-Safety'

The term "thread safety" is sometimes used with unclear or ambiguous meaning, especially in C++. One must always ask: 'Thread safe in what respect?'. I don't call fix_str classes 'thread safe'. Some are usable in multi-threaded environments.

fix_str Classes for Multi-Threaded Environments

fix_str objects which are used in different threads may share the same internal representation and hence the same reference-counter (because they are copies of each other). In this case, atomic increment and decrement of the reference-counter must be assured internally by the implementation. This is what the fix_str classes for multi-threaded environments, fix_str_am and fix_str_wm, guarantee. As a rule of thumb, take these classes when you use copies of the same object in different threads.

But it is never safe for two or more threads to concurrently write to (assign to) the same fix_str object (remember, assignment is the only way to change any fix_str object). Concurrent writes to the same object must always be protected by the user.
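The following sketch shows what "atomic increment and decrement of the reference-counter" can mean in practice. It is only an illustration under stated assumptions: SharedChars and CountedText are hypothetical names, and std::atomic is used in place of whatever platform primitive the multi-threaded fix_str classes actually rely on. As stated above, the sketch makes copies in different threads safe; it does not make concurrent assignment to the same object safe.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical shared representation: one counter plus the characters.
struct SharedChars {
    std::atomic<long> refs;
    char* text;

    explicit SharedChars(const char* s) : refs(1), text(nullptr) {
        const std::size_t n = std::strlen(s) + 1;
        text = new char[n];
        std::memcpy(text, s, n);
    }
    ~SharedChars() { delete[] text; }
    SharedChars(const SharedChars&) = delete;
    SharedChars& operator=(const SharedChars&) = delete;
};

class CountedText {
public:
    explicit CountedText(const char* s) : rep_(new SharedChars(s)) {}
    CountedText(const CountedText& other) noexcept : rep_(other.rep_) { acquire(); }
    CountedText& operator=(const CountedText& other) noexcept {
        if (rep_ != other.rep_) {
            release();
            rep_ = other.rep_;
            acquire();
        }
        return *this;
    }
    ~CountedText() { release(); }

    const char* c_str() const noexcept { return rep_->text; }
    long use_count() const noexcept { return rep_->refs.load(std::memory_order_acquire); }

private:
    // Atomic read-modify-write: two threads copying at once cannot lose a count.
    void acquire() noexcept { rep_->refs.fetch_add(1, std::memory_order_relaxed); }
    // The handle that drops the count from 1 to 0 frees the representation.
    void release() noexcept {
        if (rep_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete rep_;
    }
    SharedChars* rep_;
};

// Usage check (illustrative only, single-threaded).
void counter_demo() {
    CountedText a("shared");
    {
        CountedText b(a);
        assert(a.use_count() == 2);
    }
    assert(a.use_count() == 1);
}
```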

fix_str Classes for Single-Threaded Environments

On the other hand, concurrency problems cannot occur when you:

  • work exclusively in a single-threaded environment or
  • never pass copies of fix_str objects between threads

In these cases you may prefer the slightly faster fix_str classes for single-threaded environments, fix_str_as and fix_str_ws. Hint: the static member function fix_str::duplicate() can be used to create completely independent copies of fix_str objects (no shared reference-counter, see also function documentation).
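The difference between an ordinary copy and an independent duplicate can be sketched as follows. The Text alias and this free function duplicate() are hypothetical and only mimic the idea; std::shared_ptr is used here purely because its use_count() makes the sharing visible, not because fix_str is built on it.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a reference-counted immutable string.
using Text = std::shared_ptr<const std::string>;

// Deep copy: new characters and a new counter, nothing shared with the input.
Text duplicate(const Text& in) {
    return std::make_shared<const std::string>(*in);
}

// Usage check (illustrative only).
void duplicate_demo() {
    Text a = std::make_shared<const std::string>("Hello");
    Text shallow = a;              // shares buffer and counter with a
    Text deep = duplicate(a);      // safe to hand to another thread
    assert(a.use_count() == 2);    // a and shallow
    assert(deep.use_count() == 1); // deep stands alone
    assert(*deep == *a);           // same contents
}
```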

Win32

fix_str.h contains a default typedef for fix_str that depends on whether the macros _MT (Multi-Threaded) and _UNICODE are defined.

#if    defined (_UNICODE) &&  defined (_MT) // UNICODE, Multi-Threaded
  typedef fix_str_wm fix_str;
#elif  defined (_UNICODE) && !defined (_MT) // UNICODE, Single-Threaded
  typedef fix_str_ws fix_str;
#elif !defined (_UNICODE) &&  defined (_MT) // ASCII,   Multi-Threaded
  typedef fix_str_am fix_str;
#elif !defined (_UNICODE) && !defined (_MT) // ASCII,   Single-Threaded
  typedef fix_str_as fix_str;
#endif

You can use fix_str in the familiar Win32 style, including the popular but annoying _T() macro:

fix_str fs (_T("Hello, world!"));

Each fix_str_xx class is available individually. You may even use different fix_str_xx classes in the same application:

fix_str_wm fs1 (L"Hello, world!");
fix_str_as fs2 ("Hello, world!");

The fix_str classes also work in non-Windows environments (at least the single-threaded variants).

Limitations

  • no operator+ is provided for performance reasons; instead use a constructor (actually, this is a feature, not a limitation):
    fs = fix_str ("Use ", "a ", "constructor ", "to ",
                  "efficiently ", "concatenate ", "strings");
  • embedded null characters ('\0') are not possible since fix_str is based on the standard C string functions.
  • usable only for fixed-length character encodings. UTF-16, the encoding standard at Microsoft (and on the Macintosh, on the Java platform, ...), qualifies only as long as no surrogate pairs occur, because it is otherwise a variable-length encoding. fix_str objects are compared 'binary', i.e. they are equal only if they contain the same sequence of bytes.
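Why a concatenating constructor can be cheaper than a chain of operator+ calls is sketched below: all part lengths are known up front, so the result buffer can be sized once, whereas a + b + c + d creates a temporary string for each intermediate result. The function concat() is a hypothetical helper written for this illustration; it is not part of fix_str, whose constructors take up to 8 separate arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper: joins all parts after a single buffer reservation.
std::string concat(std::initializer_list<const char*> parts) {
    std::size_t total = 0;
    for (const char* p : parts) total += std::char_traits<char>::length(p);

    std::string result;
    result.reserve(total);                 // size the buffer once, up front
    for (const char* p : parts) result.append(p);
    return result;
}

// Usage check (illustrative only).
void concat_demo() {
    const std::string s = concat({"Use ", "a ", "constructor ", "to ",
                                  "efficiently ", "concatenate ", "strings"});
    assert(s == "Use a constructor to efficiently concatenate strings");
}
```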

Other 'Immutable' String Implementations in C++

  • const_string<>: in sum, 'immutable' but with some mutating functions, 'Boost-style' and 'boost' namespace but not a Boost library, 'thread safe' but not safe for concurrent writes.

Conclusion

fix_str is a set of lightweight string classes akin to the immutable string classes in other languages. You may consider using fix_str when a string changes rarely but is copied frequently, e.g. when a container is sorted or when set()/get() functions for strings are called a lot.

History

  • October 18, 2005 - Submission to CodeProject.
  • October 28, 2005 - Submission of updated article to CodeProject.
    • Article refactored, especially paragraphs 'Unicode and Multi-Threading' and 'About Thread-Safety' rewritten for more clarity (hopefully).

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
