
CString-clone Using Standard C++

By Joe O'Leary, 7 Dec 2011
 

Introduction

As much as I use and appreciate the Standard C++ Library, I've never liked its string template - basic_string<>. At times, it seems the designers went out of their way to make it difficult to use.

On the other hand, I've always loved the ease of use of MFC's CString class. It checks for NULL pointers, implicitly converts to const TCHAR*, and has some very handy member functions (Format, Load, etc.) that make string programming a breeze. But of course, I don't want to use MFC anymore. In fact, I don't want to rely on any proprietary library because I want portability.

Therefore I decided to combine the best of both worlds and create:

CStdString

This is a class (a template instantiation, actually) that derives from basic_string<TCHAR>. To basic_string it adds the entire CString API. You get CString ease of use with 100% basic_string compatibility. In short, a CStdString object is a basic_string that, with very few exceptions (noted below), is also a drop-in replacement for CString. The best part is that both APIs (basic_string and CString) are well known and well documented.
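The general idea can be sketched in miniature. The class below is a hypothetical MiniString, not the real CStdString: it derives from std::string, adds two CString-style members (Left and MakeUpper), and still passes anywhere a std::string is expected. All names in it are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical miniature of the approach; NOT the real CStdString.
class MiniString : public std::string
{
public:
    MiniString() {}
    MiniString(const char* psz) : std::string(psz ? psz : "") {}
    MiniString(const std::string& s) : std::string(s) {}

    // CString-style: return the leftmost nCount characters.
    MiniString Left(int nCount) const
    {
        if (nCount < 0) nCount = 0;
        return MiniString(substr(0, static_cast<size_type>(nCount)));
    }

    // CString-style: convert to upper case in place.
    void MakeUpper()
    {
        for (size_type i = 0; i < size(); ++i)
            (*this)[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>((*this)[i])));
    }
};

// A function written against std::string accepts the derived type unchanged.
inline std::string::size_type CountChars(const std::string& s) { return s.size(); }

// Usage check; call from main().
inline void CheckMiniString()
{
    MiniString s("hello world");
    s.MakeUpper();                      // CString-style call
    assert(s.Left(5) == "HELLO");       // CString-style call
    assert(s.find("WORLD") == 6);       // basic_string call on the same object
    assert(CountChars(s) == 11);        // passed as a plain std::string
}
```

Because the derived class adds no data members, converting it to a std::string reference loses nothing.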

I originally submitted this article to another code site (which shall remain nameless :)) a few years ago. I like CodeProject so much I thought I'd submit it here too. I have used this class in almost every professional project I've done over the past 4 years. It has proven to be the single most useful piece of code I've ever written. It is also extensively debugged. I hope you like it. If you ever have any problems with it, please e-mail me. I'm happy to help.

I provided a simple source application here to prove some of the CString functions work but it's really just a token. The list of sample projects out there that use CString and/or basic_string is massive.

Features

  • Drop in Replacement for CString (see below for exceptions)
  • Two instantiations available at all times -- wchar_t-based version CStdStringW and char-based version CStdStringA. The name CStdString is just a typedef of one of these two.
  • Safely checks for NULL string pointer inputs (like CString) in all functions
  • Extra constructors and assignment operators to automatically convert between wide (wchar_t-based) and thin (char-based) strings for you.
  • Implicit conversion to c_str(). The C++ committee doesn't like this but I sure do.
  • Builds on several platforms, including Windows, Unix and Linux. Works with several implementations of the Standard C++ Library, including Dinkumware, GNU, CodeWarrior, and STLPort.
  • Win32 builds give you some extra goodies like UNICODE/MBCS conversion macros (just like MFC's) as well as member functions for persisting CStdString objects to and from DCOM IStreams.
  • Makes no use of any implementation details of the base class template (basic_string)
  • The derived template adds no member data to basic_string and adds no virtual functions
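Two of the features listed above, NULL-safe input handling and implicit conversion to a C string, can be illustrated with a short sketch. SafeString and LegacyLength are hypothetical names chosen for this example. The real class does considerably more; this only shows the shape of the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical illustration; NOT the real CStdString.
class SafeString : public std::string
{
public:
    SafeString() {}

    // A NULL pointer is treated as an empty string instead of crashing.
    SafeString(const char* psz) : std::string(psz ? psz : "") {}

    SafeString& operator=(const char* psz)
    {
        assign(psz ? psz : "");
        return *this;
    }

    // Implicit conversion, so the object can be handed to C-style APIs.
    operator const char*() const { return c_str(); }
};

// Stand-in for a legacy function that takes a raw C string.
inline std::size_t LegacyLength(const char* psz) { return std::strlen(psz); }

// Usage check; call from main().
inline void CheckSafeString()
{
    const char* pNull = 0;
    SafeString s(pNull);            // no crash, just an empty string
    assert(s.empty());
    s = "abc";
    assert(LegacyLength(s) == 3);   // no explicit c_str() needed
}
```

One trade-off of the implicit conversion is that some comparisons, such as comparing the object directly against a string literal, may become ambiguous on some compilers. That is part of why the standard library avoids it.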

There are a couple of issues with this code that I should point out.

CString Compatibility

I was unable to reproduce the CString API exactly. There are two functions that both CString and basic_string share but implement differently. In these cases, I felt it best to make CStdString behave like basic_string (the base class) rather than CString. To be specific:

  • CStdString::operator[] returns characters by value (unlike CString which returns them by reference)
  • The constructor that takes a character and a count takes them in the order (count, value), which is the opposite of the order in which CString declares them. That's the order that basic_string<> needs, and it was impossible to implement both versions.

There were also two CString functions I could not implement at all -- LockBuffer and UnlockBuffer.
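The argument-order difference is easy to trip over when porting CString code, because both orders compile. The sketch below uses plain std::string to show what happens when the arguments are swapped. Repeat is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// basic_string order is (count, character).
inline std::string Repeat(char ch, std::string::size_type n)
{
    return std::string(n, ch);
}

// Usage check; call from main().
inline void CheckRepeat()
{
    assert(Repeat('x', 3) == "xxx");

    // CString-style order (character, count) still compiles with basic_string,
    // but 'x' is converted to a count and 3 is converted to a character.
    std::string swapped('x', 3);
    assert(swapped.size() == static_cast<std::string::size_type>('x'));
    assert(swapped[0] == '\x03');
}
```

Code that constructs strings this way should be reviewed by hand when moving from CString, since the compiler will not flag it.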

Deriving From basic_string<>

The template I wrote derives from basic_string, a class template without a virtual destructor. Any introductory text to C++ will tell you that it is dangerous to derive from a class without a virtual destructor. It can lead to behavior that is undefined. So if you were to code the following (deleting a CStdStringA through a pointer to the base class), you would technically get undefined behavior:

// assign DERIVED object to  BASE pointer
std::string* pstr = new CStdStringA("Hi"); 

// delete  DERIVED through BASE class pointer -- UNDEFINED
delete pstr;   

Personally, I don't think this is much of an issue. I mean, really, how often do you actually do this with string objects? I have rarely (if ever) needed to dynamically allocate a string object on the heap. And if I ever do, I won't be using a base-class pointer. So if you don't do this, you'll never have to worry. In fact, even if you do code this way, I doubt you'll have any problems with CStdString. I can tell you that at least with Microsoft Visual C++, even the above code runs just fine with no errors or memory leaks. I doubt many other compilers would give you problems either. However, my doubt does not impose reality on the C++ world. Caveat emptor.
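For contrast, here is a sketch of usage patterns that stay within defined behavior. It assumes a C++11 compiler (for std::unique_ptr and static_assert). DerivedString is a hypothetical stand-in for CStdStringA, not the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in for CStdStringA.
struct DerivedString : std::string
{
    DerivedString(const char* psz) : std::string(psz ? psz : "") {}
};

// The root of the issue: the base class has no virtual destructor.
static_assert(!std::has_virtual_destructor<std::string>::value,
              "std::string has no virtual destructor");

inline std::size_t SafeUses()
{
    // 1. Automatic storage: destroyed as its real type.
    DerivedString onStack("Hi");

    // 2. Heap storage owned through a pointer to the DERIVED type.
    std::unique_ptr<DerivedString> onHeap(new DerivedString("there"));

    // 3. Referring to the object through the base is fine;
    //    only deleting through the base pointer is undefined.
    const std::string& asBase = onStack;

    return asBase.size() + onHeap->size();
}

// Usage check; call from main().
inline void CheckSafeUses() { assert(SafeUses() == 7); }
```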

History

  • 7 Dec 2011: Updated source code.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Joe O'Leary
Web Developer
United States
I've been a software developer since 1990.
 
While my main page is out of date (and I have therefore blanked it), you can read about CStdString here:
 
http://home.earthlink.net/~jmoleary/stdstring.htm


Comments and Discussions

 
Question | About Stephen Hewitt's GetBuffer Question [modified] | member FeiluHang | 15 Apr '13 - 1:25
If this is not safe, why not just check the memory address, and then allocate a temp buffer for the user? Or just always allocate a temp buffer?
 
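// _tempbuffer is assumed to be a new TCHAR* member, initialized to NULL
CT* GetBuf(int nMinLen = -1)
{
    if ( static_cast<int>(this->size()) < nMinLen )
        this->resize(static_cast<MYSIZE>(nMinLen));

    if ( this->empty() )
        return const_cast<CT*>(this->data());

    // if the memory is contiguous (pointer difference counts elements, not bytes)
    if ( &(this->at(this->length() - 1)) - &(this->at(0)) == static_cast<ptrdiff_t>(this->length() - 1) )
        return &(this->at(0));

    // otherwise hand out a temporary copy
    if ( nMinLen < static_cast<int>(this->length()) + 1 )
        nMinLen = static_cast<int>(this->length()) + 1;
    _tempbuffer = new TCHAR[nMinLen];
    _tcscpy(_tempbuffer, this->data());
    return _tempbuffer;
}

void RelBuf(int nNewLen = -1)
{
    if ( NULL != _tempbuffer )
    {
        // copy the temporary buffer back into the string and free it
        *this = _tempbuffer;
        delete [] _tempbuffer;
        _tempbuffer = NULL;
    }
    this->resize(static_cast<MYSIZE>(nNewLen > -1 ? nNewLen : sslen(this->c_str())));
}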


modified 15 Apr '13 - 7:32.

Answer | Re: About Stephen Hewitt's GetBuffer Question | member Joe O'Leary | 17 Apr '13 - 16:10
To quote my answer to your other post on this question below:
 
Quote:
I'm afraid this would not be a solution. You can't just return a buffer that you have allocated via operator new. The caller is not expecting to have to free any buffer returned to him/her. So you would end up with a memory leak in this case.

And, while checking addresses might reduce the chance of my "hack" coming back to bite you, it would not eliminate it, I'm afraid. Of course it could only bite you in the mythical implementation that uses non-contiguous buffers but if you actually had such an implementation, there is still the possibility that intermediate characters are not at memory locations between the first and last.

-Joe

General | My vote of 5 | mvp Michael Haephrati מיכאל האפרתי | 8 Jan '13 - 11:42
Great work! BTW CString is also part of ATL
General | My vote of 5 | member Dave Calkins | 15 Nov '12 - 3:20
Nice work! This could be very useful for getting CString in a non-MFC environment. Thanks for the article!
Suggestion | Thank you! | member user_user_user | 12 Nov '12 - 22:23
Great class. Thank you!
My vote of 5.
 
Suggestion:
How about using the _vscprintf/_vscwprintf functions
when calculating the resulting buffer size in CStdStr<T>::FormatV?
I don't know whether compilers other than Visual C++ support these functions,
but Visual C++ supports _vscprintf/_vscwprintf.
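For readers on other compilers: a portable way to get the same size calculation, assuming a C99/C++11-conforming std::vsnprintf, is to call it with a null buffer and a size of zero, which returns the number of characters the output would need. The sketch below covers only the char case. FormatV and Format here are free functions written for illustration, not the CStdStr members.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Illustrative free function; assumes a conforming std::vsnprintf.
inline std::string FormatV(const char* fmt, va_list args)
{
    // First pass: measure. The va_list is copied because it is consumed.
    va_list copy;
    va_copy(copy, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (needed < 0)
        return std::string();           // encoding or format error

    // Second pass: format into a buffer of exactly the right size.
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(&buf[0], buf.size(), fmt, args);
    return std::string(&buf[0], static_cast<std::size_t>(needed));
}

inline std::string Format(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    std::string s = FormatV(fmt, args);
    va_end(args);
    return s;
}

// Usage check; call from main().
inline void CheckFormat()
{
    assert(Format("%d-%s", 42, "abc") == "42-abc");
}
```

Older Microsoft runtimes shipped a _vsnprintf with different return-value semantics, so on those toolchains _vscprintf remains the safer choice.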
Question | [My vote of 2] Weak | member xComaWhitex | 11 Oct '12 - 18:16
Personally, I don't see what is so great about CString, let alone std::string. Why were you even making std::string* str = ...? std::strings don't need to be pointers, nor does any string class, as they are shared classes. Personally, if you are going to bother making a decent string class, I wouldn't waste time using crappy Hungarian notation. If you're ever going to get any developer on an OS besides Windows, they will not use Hungarian notation, as it serves no purpose in this modern world. Look at C#. They do not even use it, and neither should you. That being said, I would recommend looking at Qt's QString class, which offers more than enough of a string class and is 1 million times better than CString and std::string.
 
The source code is ugly too. I suggest splitting up the classes and using type_traits based on how you're going to use it, which will make the code less ugly and more readable. Use proper parameter names too.
Answer | Re: [My vote of 2] Weak | member Joe O'Leary | 11 Oct '12 - 18:40
I don't even understand the first part of your post. You wrote this:
 
Quote:
Why were you even making std::string* str =

 
I agree that no one should use that. I never said anyone should. So what are you objecting to?
 
As for not liking Hungarian notation, well that ship has sailed for most folks. Too many are already working in code bases that have Hungarian. I merely followed suit.
 
And I have used QT's string class. I spent two years dealing with that. It's fine if all you do is live in the world of QT (which uses the Hungarian Notation you object to, by the way). Otherwise, No thanks.
 
If you object to the formatting, perhaps you could rewrite the code for us to suit your aesthetic sensibilities. That's the great thing about free code. You can rewrite it and repost it to suit your needs.
-Joe

General | Re: [My vote of 2] Weak | member xComaWhitex | 11 Oct '12 - 18:45
Btw it's Qt not QT. QT is Quicktime. Qt isn't an acronym. Where is Hungarian notation anywhere in Qt? now, I don't know about internal, as I don't mess with that. Just saying, it's useable outside of Qt too, if you actually wrote a class that performed the same functionality. Why would I bother rewriting code you wrote? That's your job not mine. I personally have no use for the code. I'm not saying your code is bad as I haven't used it. I've only glanced over it and gave my opinions.
General | Re: [My vote of 2] Weak | member Joe O'Leary | 11 Oct '12 - 18:49
Yes and I gave mine. Thanks for sharing
-Joe

Answer | Re: [My vote of 2] Weak | member Stephen Hewitt | 11 Oct '12 - 22:58
Quote:
Why were you even making std::string* str = ...? std::strings don't need to be pointers or any string class as they are shared classes.

 
You're missing the point. The intent was a heads-up warning that CStdString doesn't support being polymorphically deleted through its base class.
 
Steve

General | Re: [My vote of 2] Weak | member xComaWhitex | 11 Oct '12 - 23:07
If you bothered reading the article, I was referring to this:
 
// assign DERIVED object to BASE pointer
std::string* pstr = new CStdStringA("Hi");
 
// delete DERIVED through BASE class pointer -- UNDEFINED
delete pstr;
 
Which you should never do with a std::string or any string class.
General | Re: [My vote of 2] Weak | member Stephen Hewitt | 11 Oct '12 - 23:21
I did read the article, and reviewed the code, and I was referring to that too! My comment stands: since basic_string doesn't have a virtual destructor it's not safe to delete a CStdString polymorphically via a pointer to a basic_string. The author wasn't presenting a design pattern but warning users about a limitation.
 
Steve

General | Re: [My vote of 2] Weak | member xComaWhitex | 11 Oct '12 - 23:24
Okay, then ignore that. :P
Answer | Re: [My vote of 2] Weak | member Dave Calkins | 15 Nov '12 - 3:19
If you don't like Hungarian notation or CString then just don't use this class. Seems like it does provide a use for those who are familiar with and/or want to use CString in a non-MFC environment. Complaining because you don't like Hungarian notation or think it's "modern" is a pretty cheap shot and not all that constructive imo. I use CString a lot and I think this could be a very useful tool in a non-MFC environment.
General | This isn't standard C++. [modified] | member Stephen Hewitt | 5 Oct '12 - 0:51
The following is an excerpt from your code:
CT* GetBuf(int nMinLen=-1)
{
	if ( static_cast<int>(size()) < nMinLen )
		resize(static_cast<MYSIZE>(nMinLen));
 
	return empty() ? const_cast<CT*>(data()) : &(at(0));
}
 
It is not legal or portable to cast away the const and modify the buffer, nor can you assume that &at(0) == &at(1)-1. What's more, the assumption that these illegal modifications to the buffer will actually modify the underlying string is also illegal and implementation defined. For example, it is legal for a basic_string to use non-contiguous memory regions (and doing so has advantages) for its underlying storage. In this case the buffer returned by c_str and data is created on demand, changes made to it need not be copied back to the string's real buffer, and &at(0) may or may not be equal to &at(1)-1.
 
The reason for c_str and data is to support interfacing with legacy code and the reason they return const pointers is to allow basic_string the flexibility to use such optimisations.
 
Steve


modified 5 Oct '12 - 7:10.

General | Re: This isn't standard C++. | member Joe O'Leary | 5 Oct '12 - 1:12
You are correct. The buffer managed by basic_string need not be contiguous and if it were not, GetBuffer() and its variations would simply not work. There is, of course, no other way to implement CString's GetBuffer functionality as it depends specifically upon this behavior. I understood that when I first wrote the code, roughly 16 years ago.
 
I finally decided to "risk" this and use my implementation because I was unable to find any implementation of std::basic_string that did *not* use a contiguous buffer. And now, 16 years later, I still cannot name such an implementation. Can you?
-Joe
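One fact relevant to this exchange: since C++11, the standard requires basic_string to store its characters contiguously, so &s[0] + n == &s[n] holds for every n < s.size(). The sketch below, assuming a C++11 or later library, shows the GetBuffer/ReleaseBuffer pattern written directly against std::string. LegacyFill and ReadViaBuffer are hypothetical names, not part of CStdString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a legacy C API that fills a caller-supplied buffer
// and returns the number of characters written.
inline std::size_t LegacyFill(char* buf, std::size_t cap)
{
    const char msg[] = "hello";
    const std::size_t n = sizeof(msg) - 1;
    if (cap < n)
        return 0;
    std::memcpy(buf, msg, n);
    return n;
}

inline std::string ReadViaBuffer()
{
    std::string s(16, '\0');                            // like GetBuffer(16)
    const std::size_t n = LegacyFill(&s[0], s.size());  // write into the string's own storage
    s.resize(n);                                        // like ReleaseBuffer(n)
    return s;
}

// Usage check; call from main().
inline void CheckReadViaBuffer()
{
    assert(ReadViaBuffer() == "hello");
    std::string t("abc");
    assert(&t[2] == &t[0] + 2);   // contiguity, guaranteed from C++11 on
}
```

Note that writes must stay inside [0, size()). Pre-C++11 libraries gave no such guarantee on paper, which is the point Steve raises.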

General | Re: This isn't standard C++. | member Stephen Hewitt | 5 Oct '12 - 1:19
I don't have a vast array of C++ compilers or STL versions, so I can't really comment on that. You are presenting the code as standard C++ however, both in the article and in the code's comments, so I would suggest that this section of the code warrants both a comment in the source and a heads up in the documentation, and there's an argument that if it's present at all you should be able to control its inclusion with a macro.
 
Steve

General | Re: This isn't standard C++. | member Joe O'Leary | 5 Oct '12 - 4:18
Neither do I. I've tested it with a few (STLPort, gcc, Borland back in the day, etc.) and have heard from more people using more compilers/tools/implementations than I can count. I do not believe such an implementation exists. Not because it could not be done, but because it sounds better in theory than it works in practice. To wit: the buggy "optimized" implementation that Dinkumware put out for MS originally that used, as I vaguely recall, broken "copy on write" semantics that fell over in multithreaded scenarios. They didn't even try to fix that. They just got rid of it in a hurry.
 

When I first posted this class on another board I actually did not have a GetBuffer implementation at all (just as there is still no LockBuffer implementation). And it was for that very reason: there is no way to do it without unsafe code. The few other CStdString functions that use it used to new/delete buffers and then assign instead. But so many people kept asking me for it that I wrote one. Over the years, I have never felt the need to replace it. My philosophy since has been caveat emptor. There are a couple of aspects of this code for which that attitude is necessary (e.g. deriving from a class without a virtual destructor). If you think GetBuffer is scary, take a look at the code gymnastics I go through in Format to protect people from a Microsoft implementation hack. It's safer, but in the end you can only go so far.
 
In the end I view this class as a way for people to wean themselves off of CString and functions like GetBuffer, Format and other dicey operations it provides in favor of the basic_string equivalents, stringstreams, etc. Beyond that, its main utility is in the conversions, NULL checking, and few Win32 wrappers it provides that simplify string programming.
 
This class does not "present itself as standard". I have implemented the CString API as best I can using basic_string functions but that implementation is necessarily far from perfect. Rest assured, I will not be submitting this to ISO anytime soon. :)
 
I am not even a little bit worried that anyone is ever going to run into an implementation of the library that breaks my version of GetBuffer, but if/when I ever get around to posting my local version (that fills in a few missing CString functions) I will add both a preprocessor macro and a comment for GetBuffer. I will leave it enabled by default, however.
-Joe
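A sketch of what such a macro guard might look like follows. The macro name SS_NO_GETBUFFER and the class GuardedString are hypothetical, invented for this illustration. The GetBuf body follows the excerpt quoted earlier in the thread, specialized to char.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical illustration; NOT the real CStdString.
class GuardedString : public std::string
{
public:
    GuardedString(const char* psz = "") : std::string(psz ? psz : "") {}

#ifndef SS_NO_GETBUFFER   // hypothetical macro; GetBuf stays enabled by default
    // WARNING: relies on basic_string using one contiguous, writable buffer.
    char* GetBuf(int nMinLen = -1)
    {
        if (static_cast<int>(size()) < nMinLen)
            resize(static_cast<size_type>(nMinLen));
        return empty() ? const_cast<char*>(data()) : &at(0);
    }

    // Shrink to nNewLen, or to the first terminator if nNewLen is -1.
    void RelBuf(int nNewLen = -1)
    {
        resize(nNewLen > -1 ? static_cast<size_type>(nNewLen)
                            : std::strlen(c_str()));
    }
#endif
};

// Usage check; call from main().
inline void CheckGuardedString()
{
    GuardedString s("abc");
    char* p = s.GetBuf(8);      // grow to 8 characters, get a writable pointer
    std::strcpy(p, "hello");    // 5 characters + terminator fit in 8
    s.RelBuf();                 // shrink to the terminator
    assert(s == "hello");
}
```

A project that wants to forbid the unsafe members would define the macro before including the header, turning any remaining call into a compile error.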

General | Re: This isn't standard C++. | member FeiluHang | 15 Apr '13 - 1:17
If this is not safe, why not just check the memory address, and then allocate a temp buffer for the user? Or just always allocate a temp buffer?
 
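// _tempbuffer is assumed to be a new TCHAR* member, initialized to NULL
CT* GetBuf(int nMinLen = -1)
{
    if ( this->empty() )
        return const_cast<CT*>(this->data());

    // if the memory is contiguous (pointer difference counts elements, not bytes)
    if ( &(this->at(this->length() - 1)) - &(this->at(0)) == static_cast<ptrdiff_t>(this->length() - 1) )
        return &(this->at(0));

    // otherwise hand out a temporary copy
    if ( nMinLen < static_cast<int>(this->length()) + 1 )
        nMinLen = static_cast<int>(this->length()) + 1;
    _tempbuffer = new TCHAR[nMinLen];
    _tcscpy(_tempbuffer, this->data());
    return _tempbuffer;
}

void RelBuf(int nNewLen = -1)
{
    if ( NULL != _tempbuffer )
    {
        // copy the temporary buffer back into the string and free it
        *this = _tempbuffer;
        delete [] _tempbuffer;
        _tempbuffer = NULL;
        return;
    }
    this->resize(static_cast<MYSIZE>(nNewLen > -1 ? nNewLen : sslen(this->c_str())));
}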

General | Re: This isn't standard C++. | member Joe O'Leary | 17 Apr '13 - 16:08
I'm afraid this would not be a solution. You can't just return a buffer that you have allocated via operator new. The caller is not expecting to have to free any buffer returned to him/her. So you would end up with a memory leak in this case.
 
And, while checking addresses might reduce the chance of my "hack" coming back to bite you, it would not eliminate it, I'm afraid. Of course it could only bite you in the mythical implementation that uses non-contiguous buffers but if you actually had such an implementation, there is still the possibility that intermediate characters are not at memory locations between the first and last.
-Joe

Question | missing CStringT::Tokenize | member Mizan Rahman | 21 May '12 - 23:21
Hi,
 
Great article. But the class is missing CStringT::Tokenize.
 
/Mizan
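Until the class gains such a member, a free function can fill the gap. The sketch below is modeled loosely on the documented behavior of CStringT::Tokenize: it returns the next token, advances iStart, and sets iStart to -1 when no tokens remain. It is written against std::string, the helper SplitAll is a hypothetical addition, and edge cases (such as an empty delimiter set) are simplified and may differ from ATL's behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative free function, not a CStdString member.
inline std::string Tokenize(const std::string& src, const char* pszTokens, int& iStart)
{
    if (iStart >= 0 && pszTokens != 0 && *pszTokens != '\0')
    {
        // Skip leading delimiters, then find the end of the token.
        const std::string::size_type from =
            src.find_first_not_of(pszTokens, static_cast<std::string::size_type>(iStart));
        if (from != std::string::npos)
        {
            std::string::size_type to = src.find_first_of(pszTokens, from);
            if (to == std::string::npos)
                to = src.size();
            iStart = static_cast<int>(to) + 1;   // resume after the delimiter
            return src.substr(from, to - from);
        }
    }
    iStart = -1;                                 // done tokenizing
    return std::string();
}

// Hypothetical convenience wrapper: collect every token.
inline std::vector<std::string> SplitAll(const std::string& src, const char* pszTokens)
{
    std::vector<std::string> out;
    int pos = 0;
    std::string tok = Tokenize(src, pszTokens, pos);
    while (pos != -1)
    {
        out.push_back(tok);
        tok = Tokenize(src, pszTokens, pos);
    }
    return out;
}

// Usage check; call from main().
inline void CheckTokenize()
{
    const std::vector<std::string> v = SplitAll("a,b;;c", ",;");
    assert(v.size() == 3 && v[0] == "a" && v[1] == "b" && v[2] == "c");
}
```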
General | My vote of 5 | member Andy Bantly | 25 Apr '12 - 5:37
Great work!
Question | std::isspace in VC6 returns true for international characters like Ö [modified] | member Karl Edwall | 4 Mar '12 - 17:12
There seems to be an issue with the Trim functions because you use std::isspace, which for Dinkumware in VC6 doesn't handle international characters properly. I've changed it to use isspace instead, but it is perhaps something to take into consideration for somebody else with the same issue.
 
Update:
Seems only
bool bRet = std::isspace((WCHAR)L'Ö', std::locale());
has the problem so really only an issue with UNICODE builds.
 
Karl

modified 4 Mar '12 - 23:26.

Answer | Re: std::isspace in VC6 returns true for international characters like Ö | member Joe O'Leary | 4 Mar '12 - 18:20
Hey thanks for that. I wasn't aware there was an issue. In the next update I make (which I keep promising but not delivering...) I will put an #ifdef check in there for _MSC_VER that uses regular old isspace for VC6
-Joe
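A sketch of the kind of #ifdef check described above follows. The version test (_MSC_VER below 1300, which covers VC6) and the names IsSpaceW and TrimW are choices made for this illustration, not code from the class. The wide-character fallback uses iswspace, the wide counterpart of isspace.

```cpp
#include <cassert>
#include <cwctype>
#include <locale>
#include <string>

inline bool IsSpaceW(wchar_t ch)
{
#if defined(_MSC_VER) && (_MSC_VER < 1300)   // VC6 and earlier
    // Work around the library's locale-aware overload on old compilers.
    return iswspace(ch) != 0;
#else
    return std::isspace(ch, std::locale());
#endif
}

// Trim whitespace from both ends using the check above.
inline std::wstring TrimW(const std::wstring& s)
{
    std::wstring::size_type b = 0, e = s.size();
    while (b < e && IsSpaceW(s[b]))
        ++b;
    while (e > b && IsSpaceW(s[e - 1]))
        --e;
    return s.substr(b, e - b);
}

// Usage check; call from main(). \u00D6 is the character Ö.
inline void CheckTrimW()
{
    assert(!IsSpaceW(L'\u00D6'));
    assert(TrimW(L" \t\u00D6 x \n") == L"\u00D6 x");
}
```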

General | Re: std::isspace in VC6 returns true for international characters like Ö | member Karl Edwall | 4 Mar '12 - 18:22
Yeah, it seems to be a rounding issue of some kind. If you change it to (int) or (CHAR), it works fine, but seeing as the UNICODE one is basically basic_string, that is probably why it happens.
 
Karl

Last Updated 7 Dec 2011
Article Copyright 2001 by Joe O'Leary
Everything else Copyright © CodeProject, 1999-2013