Click here to Skip to main content
Licence 
First Posted 26 Dec 2004
Views 86,554
Bookmarked 15 times

Optimizing String-Manipulation Functions

By | 26 Dec 2004 | Article
Optimizing string-manipulation functions.

Abstract

VC++, GCC and other compilers for x86 use inefficient implementation of strncmp(). If this function call is part of an inner loop, your application may slow down dramatically. A set of preprocessor macros replaces this lame function, thus making your code smaller and faster.

What's the Problem?

Text search applications and interpreters typically have strncmp() in their inner loops. This sample code interprets "PRINT" and "GOTO" commands (of course, it's just a primitive example):

char buf[] = "PRINT xyz";
if(0 == strncmp(buf, "PRINT", 5)) {
    DoPrint(buf+5);
}
else if(0 == strncmp(buf, "GOTO", 4)) {
    DoGoto(buf+4);
}

Let's see how VC++ 6.0 implements this function (use your favorite disassembler to view the code):

strncmp proc ; int strncmp(const char *s1, const char *s2, size_t len)
    ; function prolog
    push ebp
    mov ebp, esp
    push edi
    push esi
    push ebx
    ; if (0 == len) goto exit
    mov ecx, dword ptr [ebp+10]
    jcxz exit

    ; search for terminating zero in s1
    mov ebx, ecx
    mov edi, dword ptr [ebp+08]
    mov esi, edi
    xor eax, eax
    repnz scasb
    ;ecx = the smaller of two numbers: len and strlen(s1)
    neg ecx
    add ecx, ebx

    ;compare ecx bytes from s1 and s2
    mov edi, esi
    mov esi, dword ptr [ebp+0C]
    repz cmpsb
    ;compare the first different character in both strings
    mov al, byte ptr [esi-01]
    xor ecx, ecx
    cmp al, byte ptr [edi-01]

    ja smaller_s1 ; s1 is less than s2, should return -1
    je exit ; s1 == s2, should return zero
    ; ecx = -2
    dec ecx
    dec ecx
smaller_s1:
    ; if s1 was less than s2, return ~0 = -1
    ; if s1 was greater than s2, return ~(-2) = 1
    not ecx

exit:
    ; epilog: return the value
    mov eax, ecx
    pop ebx
    pop esi
    pop edi
    leave
    ret
end proc


; Using strncmp
; if(0 == strncmp(buf, "PRINT", 5)) {
    push 5
    lea    ecx, DWORD PTR [esp+4] ; buf
    push OFFSET PRINT
    push ecx
    call _strncmp
    add    esp, 12
    test eax, eax
    jne    SHORT skip
; DoPrint(buf+5)
    ;some printing stuff here
skip:

Umm, tons of machine code from a simple line of your C program... Is there any easier way to compare strings? Sure, there is:

     cmp DWORD PTR [buf], 'NIRP'
     jnz skip
     cmp BYTE PTR [buf+4], 'T'
     jnz skip
; DoPrint
     ;printing
skip:

These four instructions do the same comparison as the long code above, but they work much faster and take up less space. We start by comparing the first four characters of the string with 'PRIN'. 32-bit processors such as Pentium or Athlon can read and compare 4 bytes very quickly. Just put one cmp instruction to do this.

But why 'NIRP' instead of 'PRIN'? Why are the character reversed? Remember, x86 is a little-endian processor: it reverses byte order when reading DWORDs from memory. Thus we need to reverse our string too. The processor will read reversed DWORD, compare it with reversed 'PRIN', and say if they are equal. If so, it will then compare fifth character with 'T'. If all conditions are met, DoPrint will be executed. In other cases, the processor will jump to the skip label.

You don't need to use assembler for this trick. The same can be done with C/C++:

if(('P' | ('R' << 8) | ('I' << 16) | ('N' << 24)) == *(long*)buf &&
    'T' == buf[4]) {
    DoPrint(buf+5);
}

Exactly the same nice and fast machine code will be generated. But such tricky code is hard to read and maintain. Let's make our work easier and write some macros.

Macros and Their Usage

#ifdef UNICODE
#define cmp2ch(s,a,b) (*(long*)(s) == \
    ((unsigned short)(a) | (unsigned short)(b) << 16))
#define cmp4ch(s,a,b,c,d) (*(long*)(s) == \
    ((unsigned short)(a) | (unsigned short)(b) << 16) && \
    *(long*)(s+2) == \
    ((unsigned short)(c) | (unsigned short)(d) << 16))
#define set2ch(s,a,b) (*(long*)(s) = \
    ((unsigned short)(a) | (unsigned short)(b) << 16));
#define set4ch(s,a,b,c,d) (*(long*)(s) = \
    ((unsigned short)(a) | (unsigned short)(b) << 16), \
    *(long*)(s+2) = \
    ((unsigned short)(c) | (unsigned short)(d) << 16));
#else
#define cmp2ch(s,a,b) (*(short*)(s) == \
    ((unsigned char)(a) | (unsigned char)(b) << 8))
#define cmp4ch(s,a,b,c,d) (*(long*)(s) == \
     ((unsigned char)(a) | (unsigned char)(b) << 8 |   \
     (unsigned char)(c) << 16 | (unsigned char)(d) << 24))
#define set2ch(s,a,b) (*(short*)(s) = \
    ((unsigned char)(a) | (unsigned char)(b) << 8));
#define set4ch(s,a,b,c,d) (*(long*)(s) = \
     ((unsigned char)(a) | (unsigned char)(b) << 8 |   \
     (unsigned char)(c) << 16 | (unsigned char)(d) << 24));
#endif

cmp2ch and cmp4ch macros compare 2 and 4 characters of a string. The following example shows how to use them:

if(cmp4ch(buf, 'P','R','I','N') && buf[4] == 'T')
    DoPrint(buf+5);

Another example:

if(cmp4ch(buf, 'Y','E','S','\0'))
    DoYes();

If the string is equal to "YES", DoYes() function is invoked.

Although macros seem to be quite complicated, they are compiled to very simple machine code. Note that all shifts and ORs are evaluated during compilation, so the resulting code consists of just a fast comparison (CMP and JNZ instructions).

Finally, let's talk about set2ch and set4ch. As their names say, these macros do fast assignment of 2 or 4 characters:

set4ch(buf, 'P','R','I','N'); // Equivalent of strcpy(buf, "PRINT"),
set2ch(buf+4, 'T','\0');      // but much shorter and faster

The same technique can be applied to other functions, e.g., to replace strcat():

TCHAR filename[MAX_PATH];
// Get path from control
int len = SendMessage(hwnd, WM_GETTEXT, 0, (LPARAM)filename);
if(*(p + len) != '\\')
   *(p + len) = '\\', len++;
// Append file name
set4ch(p + len, 'f', 'i', 'l', 'e');
set4ch(p + len + 4, '.', 't', 'x', 't');
*(p + len + 8) = '\0';
// Open the file
HANDLE hFile = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, ...);

Portability and Localization

These macros were tested on various platforms and compilers including MS VC++ 6.0, LCC 3.8, MinGW and GCC 3.2 under Linux. The tests compiled and run successfully, producing perfect and fast code. I have been using cmpXch macros in my freeware applications for one year and had no problems with them.

When compiled with #define UNICODE, the macros perform comparison of wide characters. So there should be no difficulties in maintaining Unicode and ANSI builds of your application. Macros also can be easily ported to Win64. (If anybody can test them on 64-bit processor, please post a message below.)

Localization seems to be much harder than porting. MS VC++ can properly compile this line with Russian characters in Unicode:

set4ch(str, TEXT('&#1064;'), TEXT('&#1080;'), TEXT('&#1083;'), TEXT('&#1086;'));

But other compilers generate a wrong code! LCC 3.8 can't properly convert a literal ASCII character to Unicode (not to mention it was unable to optimize shifts). MinGW also failed to select right Russian character when UNICODE was defined.

However, macros work fine with English language (in both Unicode and ASCII modes) and with any language using 8-bit ASCII. If you are building BASIC interpreter or parsing HTML tags, this should be enough for you. When you need Unicode support for non-English characters, you have to use MS VC++ compiler.

Case-insensitive search

How to perform case-insensitive search using this technique? One of the possible solutions is to compare all possible combinations of 2 first characters (e.g., PR, pr, Pr, pR for "PRINT"). If this comparison returns true, let's use strcmpi, lstrcmpi or CharUpper to check out the rest of our string:

if(cmp2ch(buf, 'P', 'R') || cmp2ch(buf, 'P', 'r') ||
   cmp2ch(buf, 'p', 'R') || cmp2ch(buf, 'p', 'r')) {
        char tmp[5];
        *(long*)tmp = *(long*)buf; // Fast copy 4 chars from buf to tmp
        tmp[4] = buf[4];           // or: memcpy(tmp, buf, 5 * sizeof(TCHAR));
        CharUpperBuff(tmp, 5); // Convert 5 chars to uppercase
        if(cmp4ch(tmp, 'P', 'R', 'I', 'N') && tmp[4] == 'T')
            DoPrint(buf+5);    // Compare converted characters
}

Another variation (ugly, but faster):

if((cmp2ch(buf, 'P', 'R') || cmp2ch(buf, 'P', 'r') ||
    cmp2ch(buf, 'p', 'R') || cmp2ch(buf, 'p', 'r')) &&
   (cmp2ch(buf+2, 'I', 'N') || cmp2ch(buf+2, 'I', 'n') ||
    cmp2ch(buf+2, 'i', 'N') || cmp2ch(buf+2, 'i', 'n'))
    && (buf[4] == 'T' || buf[4] == 't'))
        DoPrint(buf+5);

If you need to compare only English letters, you can use a very fast trick. Note that each capital letter has the fifth bit cleared, and corresponding small letters have the same bit set:

0100 0001 — A 0100 0010 — B 0100 0011 — C 0100 0100 — D
0110 0001 — a 0110 0010 — b 0110 0011 — c 0110 0100 — d

Let's clear this bit by ANDing letters with 1101 1111 = 0xDF. Thus we get only capital letters and can compare them with a constant:

if(('P' | ('R' << 8) | ('I' << 16) | ('N' << 24)) == (*(long*)buf & 0xDFDFDFDF)
    && (buf[4] & 0xDF) == 'T') {
        DoPrint(buf+5);
    }

Although this code is extremely fast and small, it works only for English letters. Adding digits or brackets will introduce bugs. For example, "{" will be converted to "[" and this code won't operate correctly:

if(('[' | ('A' << 8)) == (*(short*)buf & 0xDFDF)) {
        ReportError(); // Will call ReportError() for "[a", "{a", "[A", "{A"
    }

This can be rewritten as follows:

if(('[' | ('A' << 8)) == (*(short*)buf & 0xDFFF)) {
        ReportError(buf+5); // Will call ReportError() for both "[a" and "[A"
    }

This code converts the second letter by ANDing it with DF, but leave the first letter unchanged.

Buffer Length Warning

You should always check that string comparison doesn't exceed available buffer length. Avoid such kind of code:

char buf[256], *p = buf;
GetBuf(&buf, 256); // Read 256 characters from anywhere
do {
    if(cmp2ch(p, '\r', '\n'))
        DoSomething(p);
} while(*p++);

Remember, you are comparing the current character and the next character by using cmp2ch. If there are only 256 characters in string, the last iteration will try to compare 256 and 257 characters with "\r\n". So, you should add one extra character when allocating the buffer:

char buf[257], *p = buf;
GetBuf(&buf, 256); // Read 256 characters from anywhere
...

Or, if you know the string length, make only (length - 1) iterations:

char *buf = (char*)malloc(len), *p = buf;
GetBuf(&buf, len); // Read len characters from anywhere
len--; // Don't compare the last character
while(len > 0) {
    if(cmp2ch(p, '\r', '\n'))
        DoSomething(p);
    len--, p++;
}
free(buf);

Future

Of course, these macros can't be considered an elegant way to compare strings. But C standard library and CString-like classes are even worse. strcmp() and other string manipulation routines proved to be very large and non-effective, and they can significantly slow down your inner loops.

The problem resides in C language itself. There is no built-in string type, so the compiler can't optimize string-manipulation code. I hope some compiler writer will read this article and will think of a language with fast string processing. It would be great if any compiler could apply these tricks automatically.

History of changes

12-24-2004:

  • The first version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Peter Kankowski

Software Developer

Russian Federation Russian Federation

Member

Peter lives in Siberia, the land of sleeping sun, beautiful mountains, and infinitely deep snow. He recently started a wiki about algorithms and code optimization, where people could share their ideas, learn, and teach others.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
GeneralIn many times there are much simpler solutions PinmemberMichaelYork4:45 8 Feb '10  
GeneralRe: In many times there are much simpler solutions PinmemberPeter Kankowski15:13 8 Feb '10  
Generalok, the strncmp is slow... Pinmembernecropower1238:02 25 Aug '07  
GeneralRe: ok, the strncmp is slow... PinmemberPeter Kankowski12:55 27 Aug '07  
GeneralAlignment penalty Pinmemberanton_petrov7:27 31 Oct '06  
GeneralNice Optimzation Pinmemberbasementman4:10 5 Jan '05  
GeneralRe: Nice Optimzation PinmemberGernot Frisch4:36 5 Jan '05  
Generalforget it PinmemberGernot Frisch5:23 5 Jan '05  
GeneralRe: forget it PinmemberPeter Kankowski18:21 5 Jan '05  
GeneralRe: forget it PinmemberGernot Frisch10:26 6 Jan '05  
GeneralRe: forget it PinmemberGernot Frisch22:23 6 Jan '05  
GeneralRe: Nice Optimzation PinmemberPeter Kankowski18:23 5 Jan '05  
GeneralRe: Nice Optimzation PinmemberPeter Kankowski18:20 5 Jan '05  
GeneralMaking it useable PinmemberGernot Frisch4:02 5 Jan '05  
GeneralRe: Making it useable Pinmembernecropower12317:03 24 Aug '07  
GeneralRe: Making it useable PinmemberGernot Frisch21:25 24 Aug '07  
GeneralRe: Making it useable Pinmembernecropower1236:34 25 Aug '07  
GeneralCrazy idea PinmemberJörgen Sigvardsson8:28 28 Dec '04  
GeneralRe: Crazy idea PinmemberPeter Kankowski15:00 29 Dec '04  
GeneralInteresting PinmemberJohn R. Shaw7:54 27 Dec '04  
GeneralRe: Interesting PinmemberPeter Kankowski15:29 27 Dec '04  
Question? PinmemberGoran Mitrovic20:37 26 Dec '04  
AnswerWell... Pinmemberpeterchen22:00 26 Dec '04  
GeneralRe: Well... PinmemberGoran Mitrovic22:23 26 Dec '04  
GeneralRe: Well... PinmemberPeter Kankowski0:02 27 Dec '04  

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Mobile
Web04 | 2.5.120528.1 | Last Updated 27 Dec 2004
Article Copyright 2004 by Peter Kankowski
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid