Introduction
Much of what you need to do is pretty straightforward. This is not a complete tutorial on
String Concatenation
One of the very convenient features of CString is string concatenation with the + operator. Writing
    CString gray("Gray");
    CString cat("Cat");
    CString graycat = gray + cat;
is a lot nicer than having to do something like:
    char gray[] = "Gray";
    char cat[] = "Cat";
    char * graycat = malloc(strlen(gray) + strlen(cat) + 1);
    strcpy(graycat, gray);
    strcat(graycat, cat);

Formatting (including integer-to-CString)
Rather than using sprintf with a fixed-size buffer, you can format directly into a CString with its Format method:
    CString s;
    s.Format(_T("The total is %d"), total);
The advantage here is that you don't have to worry about whether or not the buffer is large enough to hold the formatted data; this is handled for you by the formatting routines.
Use of formatting is the most common way of converting from non-string data types to a CString, for example converting an integer:
    CString s;
    s.Format(_T("%d"), total);
I always use the _T( ) macro for literal strings. In a non-Unicode application it is defined as
    #define _T(x) x // non-Unicode version
whereas for a Unicode application it is defined as
    #define _T(x) L##x // Unicode version
so in Unicode the effect is as if I had written
    s.Format(L"%d", total);
If you ever think you might ever possibly use Unicode, start coding in a Unicode-aware fashion. For example, never, ever use sizeof( ) to get the number of characters in a character buffer, because in Unicode it is off by a factor of 2. Use a macro like this instead:
    #define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )
This is not only useful for dealing with Unicode buffers whose size is fixed at compile time, but any compile-time defined table.
    class Whatever { ... };
    Whatever data[] = {
        { ... },
        ...
        { ... },
    };
    for(int i = 0; i < DIM(data); i++) // scan the table looking for a match
Beware of those API calls that want genuine byte counts; using a character count will not work.
    TCHAR data[20];
    lstrcpyn(data, longstring, sizeof(data) - 1); // WRONG!
    lstrcpyn(data, longstring, DIM(data) - 1); // RIGHT
    WriteFile(f, data, DIM(data), &bytesWritten, NULL); // WRONG!
    WriteFile(f, data, sizeof(data), &bytesWritten, NULL); // RIGHT
This is because lstrcpyn wants a character count, but WriteFile wants a byte count.

Converting a CString to an integer
The simplest way to convert a CString to an integer is to use one of the standard string-to-integer conversion routines in its Unicode-aware form, such as _ttoi or _tcstoul:
    CString hex = _T("FAB");
    CString decimal = _T("4011");
    ASSERT(_tcstoul(hex, 0, 16) == _ttoi(decimal));

Converting between char * and CString
This is the most common set of questions beginners have on the CString data type.
For example, having noticed the above example you might wonder why you can't write
    CString graycat = "Gray" + "Cat";
or
    CString graycat("Gray" + "Cat");
In fact the compiler will complain bitterly about these attempts. Why? Because the + operator is defined as an overloaded operator on various combinations of the CString and LPCTSTR data types, but not on two LPCTSTR values, which are plain C pointers. What does work is
    CString graycat = CString("Gray") + CString("Cat");
or even
    CString graycat = CString("Gray") + "Cat";
If you study these, you will see that the + always applies to at least one CString.

char * to CString
So you have a char *, for example
    char * p = "This is a test";
or, in Unicode-aware applications
    TCHAR * p = _T("This is a test");
or
    LPTSTR p = _T("This is a test");
you can write any of the following:
    CString s = "This is a test"; // 8-bit only
    CString s = _T("This is a test"); // Unicode-aware
    CString s("This is a test"); // 8-bit only
    CString s(_T("This is a test")); // Unicode-aware
    CString s = p;
    CString s(p);
Any of these readily convert the constant string or the pointer to a CString value. The characters are copied into the CString, so you can write
    TCHAR * p = _T("Gray");
    CString s(p);
    p = _T("Cat");
    s += p;
and be sure that the resulting string is "GrayCat".
There are several other methods for

CString to char * I: Casting to LPCTSTR
This is a slightly harder transition to find out about, and there is lots of confusion about the "right" way to do it. There are quite a few right ways, and probably an equal number of wrong ways.
The first thing you have to understand about a CString is that it is a C++ object that manages its own character buffer. Unless you do some special things, you know nothing about the size of the buffer that is associated with the CString.
The operator LPCTSTR is overloaded for CString, so you can write
    CString s("GrayCat");
    LPCTSTR p = s;
and it works correctly.
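To make the conversion operator idea concrete, here is a minimal sketch in standard C++ rather than MFC. TinyString and CountChars are hypothetical names invented for this illustration; the sketch only assumes that CString's operator LPCTSTR behaves like a user-defined conversion to a const character pointer, which is how the text above describes it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for CString; this is not MFC code.
class TinyString {
public:
    explicit TinyString(const char* text) : data_(text) {}

    // Analogue of CString's operator LPCTSTR: hands out a read-only
    // pointer to the internal, NUL-terminated buffer.
    operator const char*() const { return data_.c_str(); }

private:
    std::string data_;
};

// A function that, like many API calls, takes a const character pointer.
inline std::size_t CountChars(const char* s)
{
    return std::strlen(s);
}
```

With these definitions, `TinyString s("GrayCat"); const char* p = s;` compiles without an explicit cast, and `CountChars(s)` applies the same conversion to the argument. Because the pointer is const, an attempt to write through it is rejected by the compiler.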
This is because of the rules about how casting is done in C++; when a cast is required, C++ rules allow a user-defined cast operator to be selected. For example, you could define (float) as a cast on a complex number (a pair of floats) and define it to return only the first float (called the "real part") of the complex number so you could say
    Complex c(1.2f, 4.8f);
    float realpart = c;
and expect to see, if the (float) operator is defined properly, that the value of realpart is 1.2.
This works for you in all kinds of places. For example, any function that takes an LPCTSTR parameter will force this conversion, so you can declare a function
    BOOL DoSomethingCool(LPCTSTR s);
and call it as follows
    CString file("c:\\myfiles\\coolstuff");
    BOOL result = DoSomethingCool(file);
This works correctly because the function is declared to take an LPCTSTR, so the LPCTSTR operator is applied to the argument.
But what if you want to format it?
    CString graycat("GrayCat");
    CString s;
    s.Format("Mew! I love %s", graycat);
Note that because the value appears in the variable-argument list (the list designated by "..." in the declaration of the function), no implicit conversion operator is applied. What do you get?
Well, surprise, you actually get the string
    "Mew! I love GrayCat"
because the MFC implementers carefully designed the CString data type so that passing it without a cast still yields the pointer to the string.
What you can't do is modify the string through this pointer. For example:
    CString v("1.00"); // currency amount, 2 decimal places
    LPCTSTR p = v;
    p[lstrlen(p) - 3] = ',';
If you try to do this, the compiler will complain that you are assigning to a constant string. This is the correct message. It would also complain if you tried
    strcat(p, "each");
because strcat wants a non-const pointer as its first argument and you gave it an LPCTSTR.
Don't try to defeat these error messages. You will get yourself into trouble!
The reason is that the buffer has a count, which is inaccessible to you (it's in that hidden area that sits below the address of the string data), and changes made behind its back are not reflected in that count.

CString to char * II: Using GetBuffer
A special method is available for a CString when you need to modify its contents: GetBuffer.
    CString s(_T("File.ext"));
    LPTSTR p = s.GetBuffer();
    LPTSTR dot = _tcschr(p, _T('.')); // OK, should have used s.Find...
    if(dot != NULL)
        *dot = _T('\0');
    s.ReleaseBuffer();
This is the first and simplest use of GetBuffer. Between the GetBuffer call and the ReleaseBuffer call, you must not use any CString method on the string whose buffer you hold:
    CString s(...);
    LPTSTR p = s.GetBuffer();
    //... lots of things happen via the pointer p
    int n = s.GetLength(); // BAD!!!!! PROBABLY WILL GIVE WRONG ANSWER!!!
    s.TrimRight(); // BAD!!!!! NO GUARANTEE IT WILL WORK!!!!
    s.ReleaseBuffer(); // Things are now OK
    int m = s.GetLength(); // This is guaranteed to be correct
    s.TrimRight(); // Will work correctly
Suppose you want to actually extend the string. In this case you must know how large the string will get. This is just like declaring
    char buffer[1024];
knowing that 1024 is more than enough space for anything you are going to do. The equivalent in the CString world is
    LPTSTR p = s.GetBuffer(1024);
This call gives you not only a pointer to the buffer, but guarantees that the buffer will be (at least) 1024 characters in length.
Also, note that if you have a pointer to a
A common "bad idiom" left over from C programmers is to allocate a buffer of fixed size, do a sprintf into it, and assign it to a CString:
    char buffer[256];
    sprintf(buffer, "%......", args, ...); // ... means "lots of stuff here"
    CString s = buffer;
while the better form is to do
    CString s;
    s.Format(_T("%...."), args, ...);
Note that this always works; if your string happens to end up longer than 256 bytes you don't clobber the stack!
Another common error is to be clever and realize that a fixed size won't work, so the programmer allocates bytes dynamically. This is even sillier:
    int len = lstrlen(parm1) + 13 + lstrlen(parm2) + 10 + 100;
    char * buffer = new char[len];
    sprintf(buffer, "%s is equal to %s, valid data", parm1, parm2);
    CString s = buffer;
    ....
    delete [] buffer;
where it can be easily written as
    CString s;
    s.Format(_T("%s is equal to %s, valid data"), parm1, parm2);
Note that the

CString to char * III: Interfacing to a control
A very common operation is to pass a CString value to a control, for example a tree control:
    TVINSERTITEMSTRUCT tvi;
    CString s;
    // ... assign something to s
    tvi.item.pszText = s; // Compiler yells at you here
    // ... other stuff
    HTREEITEM ti = c_MyTree.InsertItem(&tvi);
Now why did the compiler complain? It looks like a perfectly good assignment! But in fact if you look at the structure, you will see that the member is declared in the TVITEM structure as
    LPTSTR pszText;
    int cchTextMax;
Therefore, the assignment is not assigning to an LPCTSTR, and the compiler has no conversion from a CString to an LPTSTR.
OK, you say, I can deal with that, and you write
    tvi.item.pszText = (LPCTSTR)s; // compiler still complains!
What the compiler is now complaining about is that you are attempting to assign an LPCTSTR to an LPTSTR, which discards the const qualifier. To see why that is forbidden, suppose you write
    const int i = ...;
    //... do lots of stuff
    ... = a[i]; // usage 1
    // ... lots more stuff
    ... = a[i]; // usage 2
Then the compiler can trust that, because you said const, the value of i is the same at usage 1 and usage 2. If you were allowed to write
    const int i = ...;
    int * p = &i;
    //... do lots of stuff
    ... = a[i]; // usage 1
    // ... lots more stuff
    (*p)++; // mess over compiler's assumption
    // ... and other stuff
    ... = a[i]; // usage 2
then the compiler would believe in the constancy of i, and the modification through p would silently violate that assumption.
Why not just declare the member as an LPCTSTR? Because the same structure is also used to retrieve text from the control, in which case the control writes through the pointer.
Therefore, you will often find in my code something that looks like
    tvi.item.pszText = (LPTSTR)(LPCTSTR)s;
This casts the CString to an LPCTSTR, which yields the address of the string, and then casts that to an LPTSTR so it can be assigned.
You need a slightly different method when you are trying to retrieve data, such as the value stored in a control. For example, for a tree control:
    TVITEM tvi;
    // ... assorted initialization of other fields of tvi
    tvi.pszText = s.GetBuffer(MY_LIMIT);
    tvi.cchTextMax = MY_LIMIT;
    c_MyTree.GetItem(&tvi);
    s.ReleaseBuffer();
Note that the code above works for any type of

CString to BSTR
When programming with ActiveX, you will sometimes need a value represented as a type BSTR.
You can convert a CString to a BSTR by calling its AllocSysString method:
    CString s;
    s = ... ; // whatever
    BSTR b = s.AllocSysString();
The pointer b refers to a newly allocated BSTR; when you are done with it, you must call
    ::SysFreeString(b);
to free the string.
The story is that the decision of how to represent strings sent to ActiveX controls resulted in some serious turf wars within Microsoft. The Visual Basic people won, and the string type BSTR was the result.

BSTR to CString
For example, in an ANSI application,
    BSTR b;
    b = ...; // whatever
    CString s(b == NULL ? L"" : b);
works just fine for a single-string BSTR.
Remember, according to the rules of C/C++, if you have an
In UNICODE mode, this is just the constructor
    CString::CString(LPCTSTR);
As indicated above, in ANSI mode there is a special constructor for
    CString::CString(LPCWSTR);
this calls an internal function to convert the Unicode string to an ANSI string. (In Unicode mode there is a special constructor that takes an LPCSTR, a pointer to an 8-bit ANSI string, and widens it to a Unicode string.)
There is an additional problem as pointed out above:
Note that the conversion from Unicode to ANSI uses the ::WideCharToMultiByte function.
If you are compiling as UNICODE, the conversion is simple:
    CString convert(BSTR b)
    {
        if(b == NULL)
            return CString(_T(""));
        CString s(b); // in UNICODE mode
        return s;
    }
If you are in ANSI mode, you need to convert the string in a more complex fashion. This will accomplish it. Note that this code passes to ::WideCharToMultiByte the same argument values that the CString conversion constructor uses.
    CString convert(BSTR b)
    {
        CString s;
        if(b == NULL)
            return s; // empty for NULL BSTR
    #ifdef UNICODE
        s = b;
    #else
        LPSTR p = s.GetBuffer(SysStringLen(b) + 1);
        ::WideCharToMultiByte(CP_ACP,            // ANSI Code Page
                              0,                 // no flags
                              b,                 // source widechar string
                              -1,                // assume NUL-terminated
                              p,                 // target buffer
                              SysStringLen(b)+1, // target buffer length
                              NULL,              // use system default char
                              NULL);             // don't care if default used
        s.ReleaseBuffer();
    #endif
        return s;
    }
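The ANSI branch above sizes its target buffer as one byte per source character. As a rough portable illustration of the alternative, where you first ask the converter how many bytes it needs, here is a sketch in standard C++ using std::wcsrtombs instead of ::WideCharToMultiByte. NarrowFromWide is a hypothetical helper name, and the sketch assumes the conversion runs in the current C locale, so it is an analogy to the Win32 call rather than a replacement for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper, not part of MFC or the Windows API.
// Converts a NUL-terminated wide string to a narrow string in the
// current C locale. Returns an empty string for a null pointer or
// when a character cannot be represented.
inline std::string NarrowFromWide(const wchar_t* wide)
{
    if (wide == nullptr)
        return std::string();

    // Pass 1: with a null destination, wcsrtombs only reports how many
    // bytes the converted text needs (excluding the terminating NUL).
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide;
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        return std::string();   // unrepresentable character

    // Pass 2: convert into a buffer of exactly the reported size.
    std::string narrow(needed, '\0');
    state = std::mbstate_t();
    src = wide;
    std::wcsrtombs(&narrow[0], &src, needed, &state);
    return narrow;
}
```

::WideCharToMultiByte offers a comparable size query: when the target buffer length is passed as 0, it returns the required size without writing anything.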
Note that I do not worry about what happens if the BSTR contains Unicode characters that do not map to the 8-bit character set; the NULL arguments select the system default character.

VARIANT to CString
Actually, I've never done this; I don't work in COM/OLE/ActiveX where this is an issue. But I saw a posting by Robert Quirk showing how to do it:
    VARIANT vaData;
    vaData = m_com.YourMethodHere();
    ASSERT(vaData.vt == VT_BSTR);
    CString strData(vaData.bstrVal);
Note that you could also make a more generic conversion routine that looked at the vt field:
    CString VariantToString(VARIANT * va)
    {
        CString s;
        switch(va->vt)
        { /* vt */
        case VT_BSTR:
            return CString(va->bstrVal);
        case VT_BSTR | VT_BYREF:
            return CString(*va->pbstrVal);
        case VT_I4:
            s.Format(_T("%d"), va->lVal);
            return s;
        case VT_I4 | VT_BYREF:
            s.Format(_T("%d"), *va->plVal);
            return s;
        case VT_R8:
            s.Format(_T("%f"), va->dblVal);
            return s;
        // ... remaining cases left as an Exercise For The Reader
        default:
            ASSERT(FALSE); // unknown VARIANT type (this ASSERT is optional)
            return CString(_T(""));
        } /* vt */
    }
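For readers who want to experiment with the tag-dispatch idea away from COM, here is a small sketch in standard C++. MiniVariant, MiniType, and MiniVariantToString are hypothetical names, and the struct is a much-reduced stand-in for VARIANT (a type tag plus a union), not the real thing. It only illustrates that every case must return, which is easy to overlook in a long switch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical, much-reduced stand-in for VARIANT: a type tag plus a union.
enum MiniType { MT_TEXT, MT_I4, MT_R8 };

struct MiniVariant {
    MiniType vt;
    union {
        const char* text;   // valid when vt == MT_TEXT
        long lVal;          // valid when vt == MT_I4
        double dblVal;      // valid when vt == MT_R8
    };
};

// Converts the active member to text, selected by the type tag.
inline std::string MiniVariantToString(const MiniVariant& va)
{
    char buffer[64];
    switch (va.vt) {
    case MT_TEXT:
        return va.text ? std::string(va.text) : std::string();
    case MT_I4:
        std::snprintf(buffer, sizeof(buffer), "%ld", va.lVal);
        return buffer;
    case MT_R8:
        std::snprintf(buffer, sizeof(buffer), "%f", va.dblVal);
        return buffer;
    }
    return std::string();   // unknown tag
}
```

A caller sets the tag and the matching member, for example `v.vt = MT_I4; v.lVal = 42;`, and then passes the struct to MiniVariantToString.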
Loading STRINGTABLE values
If you want to create a program that is easily ported to other languages, you must not include native-language strings in your source code. (For these examples, I'll use English, since that is my native language, aber ich kann ein bisschen Deutsch sprechen: "but I can speak a little German".) So it is very bad practice to write
    CString s = "There is an error";
Instead, you should put all your language-specific strings in a resource (except, perhaps, debug strings, which are never in a product deliverable). This means that it is fine to write
    s.Format(_T("%d - %s"), code, text);
in your program; that literal string is not language-sensitive. However, you must be very careful to not use strings like
    // fmt is "Error in %s file %s"
    // readorwrite is "reading" or "writing"
    s.Format(fmt, readorwrite, filename);
I speak of this from experience. In my first internationalized application I made this error, and in spite of the fact that I know German, and that German word order places the verb at the end of a sentence, I had done this. Our German distributor complained bitterly that he had to come up with truly weird error messages in German to get the format codes to do the right thing. It is much better (and what I do now) to have two strings, one for reading and one for writing, and load the appropriate one, making them string parameter-insensitive; that is, instead of loading the strings "reading" or "writing", load the whole format:
    // fmt is "Error in reading file %s"
    //        "Error in writing file %s"
    s.Format(fmt, filename);
Note that if you have more than one substitution, you should make sure that the word order of the substitutions does not matter, as with subject-object, subject-verb, or verb-object in English.
For now, I won't talk about
So how do we accomplish all this? By storing the string values in the resource known as the STRINGTABLE:
    STRINGTABLE
        IDS_READING_FILE "Reading file %s"
        IDS_WRITING_FILE "Writing file %s"
    END
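The "load the whole format by ID" approach can be sketched without resources at all. The following standard C++ example is only an illustration: the std::map stands in for the STRINGTABLE, the numeric ID values are made up, and StringTable and FormatFileMessage are hypothetical names. To avoid passing a run-time string to a printf-style function, the sketch substitutes the single %s placeholder by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical IDs; the numeric values are made up for this sketch.
enum { IDS_READING_FILE = 1, IDS_WRITING_FILE = 2 };

// A map standing in for the STRINGTABLE resource. A translated build
// would supply a different table with the same IDs.
inline const std::map<int, std::string>& StringTable()
{
    static const std::map<int, std::string> table = {
        { IDS_READING_FILE, "Reading file %s" },
        { IDS_WRITING_FILE, "Writing file %s" },
    };
    return table;
}

// Looks up a whole-sentence format by ID and substitutes the file name
// for its single %s placeholder. Returns an empty string for unknown IDs.
inline std::string FormatFileMessage(int id, const std::string& filename)
{
    const std::map<int, std::string>::const_iterator it = StringTable().find(id);
    if (it == StringTable().end())
        return std::string();

    std::string message = it->second;
    const std::size_t pos = message.find("%s");
    if (pos != std::string::npos)
        message.replace(pos, 2, filename);
    return message;
}
```

Because each table entry is a complete sentence, a translator can move the placeholder anywhere in the string without the calling code changing.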
Note: these resources are always stored as Unicode strings, no matter what your program is compiled as. They are even Unicode strings on Win9x platforms, which otherwise have no real grasp of Unicode (but they do for resources!).
Then you go to where you had stored the strings
    // previous code
    CString fmt;
    if(...)
        fmt = "Reading file %s";
    else
        fmt = "Writing file %s";
    ...
    // much later
    CString s;
    s.Format(fmt, filename);
and instead do
    // revised code
    CString fmt;
    if(...)
        fmt.LoadString(IDS_READING_FILE);
    else
        fmt.LoadString(IDS_WRITING_FILE);
    ...
    // much later
    CString s;
    s.Format(fmt, filename);
Now your code can be moved to any language. The LoadString method takes a string ID and loads the corresponding STRINGTABLE value.
There is a clever feature of the CString constructor that simplifies the use of STRINGTABLE entries: if you pass it a string ID cast to LPCTSTR, it loads the string.
    CString s;
    s.LoadString(IDS_WHATEVER);
    CString t( (LPCTSTR)IDS_WHATEVER);
    ASSERT(s == t);
Now, you may say, how can this possibly work? How can it tell a valid pointer from a string ID?
I tend to use the MAKEINTRESOURCE macro to do the cast:
    CString s;
    s.LoadString(IDS_WHATEVER);
    CString t( MAKEINTRESOURCE(IDS_WHATEVER));
    ASSERT(s == t);
Just to give you an idea: I practice what I preach here. You will rarely if ever find a literal string in my program, other than the occasional debug output messages, and, of course, any language-independent string.

CStrings and temporary objects
Here's a little problem that came up on the
I am trying to set a registry value using RegSetValueEx:
    char* szName = GetName().GetBuffer(20);
    RegSetValueEx(hKey, "Name", 0, REG_SZ,
                  (CONST BYTE *) szName,
                  strlen (szName + 1));
Dear Frustrated,
You have been done in by a fairly subtle error, caused by trying to be a bit too clever. What happened was that you fell victim to knowing too much. The correct code is shown below:
    CString Name = GetName();
    RegSetValueEx(hKey, _T("Name"), 0, REG_SZ,
                  (CONST BYTE *) (LPCTSTR)Name,
                  (Name.GetLength() + 1) * sizeof(TCHAR));
Here's why my code works and yours didn't. When your function GetName returned a CString, it returned a "temporary object". See the C++ Reference manual §12.2.
"In some circumstances it may be necessary or convenient for the compiler to generate a temporary object. Such introduction of temporaries is implementation dependent. When a compiler introduces a temporary object of a class that has a constructor it must ensure that a constructor is called for the temporary object. Similarly, the destructor must be called for a temporary object of a class where a destructor is declared. The compiler must ensure that a temporary object is destroyed. The exact point of destruction is implementation dependent....This destruction must take place before exit from the scope in which the temporary is created."

Most compilers implement the implicit destructor for a temporary at the next program sequencing point following its creation, that is, for all practical purposes, the next semicolon. Hence the CString existed when the GetBuffer call was made, but was destroyed following the semicolon. (As an aside, there was no reason to provide an argument to GetBuffer, and the code as written is incorrect since there is no ReleaseBuffer performed.) So what GetBuffer returned was a pointer to storage for the text of the CString. When the destructor was called at the semicolon, the basic CString object was freed, along with the storage that had been allocated to it. The MFC debug storage allocator then rewrites this freed storage with 0xDD, which is the symbol "Ý". By the time you do the write to the Registry, the string contents have been destroyed.

There is no particular reason to need to cast the result to a char * immediately. Storing it as a CString means that a copy of the result is made, so after the temporary CString is destroyed, the string still exists in the variable's CString. The casting at the time of the Registry call is sufficient to get the value of a string which already exists.

In addition, my code is Unicode-ready. The Registry call wants a byte count.
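The same lifetime rule applies to any class that owns its buffer, so the safe pattern can be sketched in standard C++ with std::wstring. This is an illustration only: GetName here is a hypothetical function returning a string by value, and ByteCountWithNul and StoreName are made-up helper names. The sketch also shows the byte-count arithmetic for a wide string, including the terminating NUL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical function returning a string by value, like GetName() above.
inline std::wstring GetName() { return L"Gray"; }

// Size in bytes of the text plus its terminating NUL, which is what an
// API that asks for a byte count (rather than a character count) expects.
inline std::size_t ByteCountWithNul(const std::wstring& text)
{
    return (text.size() + 1) * sizeof(wchar_t);
}

inline std::size_t StoreName()
{
    // Risky (deliberately not done here):
    //     const wchar_t* p = GetName().c_str();
    // The temporary dies at the end of that statement, leaving p dangling.

    // Safe: copy the temporary into a named variable first.
    const std::wstring name = GetName();
    const wchar_t* p = name.c_str();   // valid for as long as name lives
    (void)p;                           // a real caller would pass p to the API
    return ByteCountWithNul(name);
}
```

The named variable keeps the buffer alive until the end of its scope, so the pointer can be handed to an API call made later in the same function.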
Note also that the call lstrlen(Name+1) returns a value that is too small by 2 for an ANSI string, since it doesn't start until the second character of the string. What you meant to write was lstrlen(Name) + 1 (OK, I admit it, I've made the same error!). However, in Unicode, where all characters are two bytes long, we need to cope with this. The Microsoft documentation is surprisingly silent on this point: is the value given for REG_SZ values a byte count or a character count? I'm assuming that their specification of "byte count" means exactly that, and you have to compensate.

CString Efficiency
One problem of CString is that it hides certain inefficiencies from you. On the other hand, it also means that it can implement certain efficiencies. You may be tempted to say of the following code
    CString s = SomeCString1;
    s += SomeCString2;
    s += SomeCString3;
    s += ",";
    s += SomeCString4;
that it is horribly inefficient compared to, say
    char s[1024];
    lstrcpy(s, SomeString1);
    lstrcat(s, SomeString2);
    lstrcat(s, SomeString3);
    lstrcat(s, ",");
    lstrcat(s, SomeString4);
After all, you might think, first it allocates a buffer to hold SomeCString1, then copies SomeCString1 to it, then detects it is doing a concatenate, allocates a new buffer large enough to hold the current string plus SomeCString2, copies the contents to the buffer and concatenates the SomeCString2 to it, then discards the first buffer and replaces the pointer with a pointer to the new buffer, then repeats this for each of the strings, being horribly inefficient with all those copies.

The truth is, it probably never copies the source strings (the left side of the +=) for most cases.

In VC++ 6.0, in Release mode, all CString buffers are allocated in predefined quanta. These are defined as 64, 128, 256, and 512 bytes. This means that unless the strings are very long, the creation of the concatenated string is an optimized version of a strcat operation (since it knows the location of the end of the string it doesn't have to search for it, as strcat would; it just does a memcpy to the correct place) plus a recomputation of the length of the string. So it is about as efficient as the clumsier pure-C code, and one whole lot easier to write. And maintain. And understand.

Those of you who aren't sure this is what is really happening, look in the source code for CString, strcore.cpp, in the mfc\src subdirectory of your vc98 installation. Look for the method ConcatInPlace which is called from all the += operators.

Aha! So CString isn't really "efficient!" For example, if I create
    CString cat("Mew!");
then I don't get a nice, tidy little buffer 5 bytes long (4 data bytes plus the terminal NUL). Instead the system wastes all that space by giving me 64 bytes and wasting 59 of them.

If this is how you think, be prepared to reeducate yourself. Somewhere in your career somebody taught you that you always had to use as little space as possible, and this was a Good Thing.

This is incorrect. It ignores some seriously important aspects of reality.

If you are used to programming embedded applications with 16K EPROMs, you have a particular mindset for doing such allocation. For that application domain, this is healthy. But for writing Windows applications on 500MHz, 256MB machines, it actually works against you, and creates programs that perform far worse than what you would think of as "less efficient" code.

For example, size of strings is thought to be a first-order effect. It is Good to make this small, and Bad to make it large. Nonsense.
The effect of precise allocation is that after a few hours of the program running, the heap is cluttered up with little tiny pieces of storage which are useless for anything, but they increase the storage footprint of your application, increase paging traffic, can actually slow down the storage allocator to unacceptable performance levels, and eventually allow your application to grow to consume all of available memory. Storage fragmentation, a second-order or third-order effect, actually dominates system performance. Eventually, it compromises reliability, which is completely unacceptable.

Note that in Debug mode compilations, the allocation is always exact. This helps shake out bugs.

Assume your application is going to run for months at a time. For example, I bring up VC++, Word, PowerPoint, FrontPage, Outlook Express, Forté Agent, Internet Explorer, and a few other applications, and essentially never close them. I've edited using PowerPoint for days on end (on the other hand, if you've had the misfortune to have to use something like Adobe FrameMaker, you begin to appreciate reliability; I've rarely been able to use this application without it crashing four to six times a day! And always because it has run out of space, usually by filling up my entire massive swap space!) Precise allocation is one of the misfeatures that will compromise reliability and lead to application crashes.

By making CStrings be multiples of some quantum, the memory allocator will end up cluttered with chunks of memory which are almost always immediately reusable for another CString, so the fragmentation is minimized, allocator performance is enhanced, application footprint remains almost as small as possible, and you can run for weeks or months without problem.

Aside: Many years ago, at CMU, we were writing an interactive system. Some studies of the storage allocator showed that it had a tendency to fragment memory badly. Jim Mitchell, now at Sun Microsystems, created a storage allocator that maintained running statistics about allocation size, such as the mean and standard deviation of all allocations. If a chunk of storage would be split into a size that was smaller than the mean minus one σ of the prevailing allocation size, he didn't split it at all, thus avoiding cluttering up the allocator with pieces too small to be usable. He actually used floating point inside an allocator! His observation was that the long-term saving in instructions by not having to ignore unusable small storage chunks far and away exceeded the additional cost of doing a few floating point operations on an allocation operation. He was right.

Never, ever think about "optimization" in terms of small-and-fast analyzed on a per-line-of-code basis. Optimization should mean small-and-fast analyzed at the complete application level (if you like New Age buzzwords, think of this as the holistic approach to program optimization, a whole lot better than the per-line basis we teach new programmers). At the complete application level, minimum-chunk string allocation is about the worst method you could possibly use.

If you think optimization is something you do at the code-line level, think again. Optimization at this level rarely matters. Read my essay on Optimization: Your Worst Enemy for some thought-provoking ideas on this topic.

Note that the += operator is special-cased; if you were to write:
    CString s = SomeCString1 + SomeCString2 + SomeCString3 + "," + SomeCString4;
then each application of the + operator causes a new string to be created and a copy to be done (although it is an optimized version, since the length of the string is known and the inefficiencies of strcat do not come into play).

Summary
These are just some of the techniques for using CString. I use these every day in my programming. CString is not a terribly difficult class to deal with, but generally the MFC materials do not make all of this apparent, leaving you to figure it out on your own.

Acknowledgements
Special thanks to Lynn Wallace for pointing out a syntax error in one of the examples, Brian Ross for his comments on BSTR conversions, and Robert Quirk for his example of VARIANT-to-BSTR conversion.

The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.
Send mail to newcomer@flounder.com with questions or comments about this web site.
Copyright © 1999 CompanyLongName All Rights Reserved.
www.flounder.com/mvp_tips.htm