
Automatically add the _T macro to quoted strings and other multibyte-Unicode conversions

18 Oct 2009 | CPOL | 3 min read
Add the _T macro to quoted strings when porting to a Unicode configuration in VC++.

Introduction

This article will show how to port a Visual Studio C++ project from a multi-byte configuration into Unicode, with a special emphasis on:

  1. Automatically adding the _T("") macro to quoted strings.
  2. Porting from std::string, std::ostringstream, and std::ofstream to Unicode-compatible versions.
  3. Storing Unicode values in std::string.

Background

Once upon a time, I started a project in Visual Studio 6, C++, and the first line I ever wrote was:

C++
AfxMessageBox("Hello World!");

When I hit "Build", it wouldn't compile until I changed the project's configuration to Multibyte instead of the default Unicode.

I knew from that moment on that I had better wrap every string in the _T macro, but after some time, I stopped doing it.

A year later, when the project had become a 200,000-line code monster, I was asked to translate the program into other languages, such as Russian and Chinese. After I changed the project configuration back to Unicode, the build produced thousands of errors, mostly because quoted text had not been wrapped in the _T macro.

This article will show how to automatically add the _T("") macro to quoted strings, using a Visual Studio macro run from the Macro Explorer.

Automatically adding the _T("") macro to quoted strings

Using the code

  1. Open your project in Visual Studio.
  2. In the top main menu, go to "Tools->Macros->Macro Explorer". The Macro Explorer panel should appear on the right side of the screen.
  3. Right-click on "MyMacros" and choose "New module".
  4. Type in, exactly, the following name: "AutoT".
  5. Right-click on the newly created module and choose "Edit".
  6. Paste in the following text, overwriting the existing few lines in the automatically generated code.
  7. Save and close the macro.

Note that although the following macro is written in Visual Basic (the language of the Visual Studio macro editor), it is intended for C++ source files:

VB.NET
Imports System
Imports EnvDTE
Imports EnvDTE80
Imports System.Diagnostics

Public Module AutoT

    Sub ReplaceXWithY(ByVal X As String, ByVal Y As String, _
                      Optional ByVal MatchCase As Boolean = False, _
                      Optional ByVal PatternSyntax As _
                        EnvDTE.vsFindPatternSyntax = _
                        vsFindPatternSyntax.vsFindPatternSyntaxLiteral)

        DTE.Find.Action = vsFindAction.vsFindActionReplace
        DTE.Find.FindWhat = X
        DTE.Find.ReplaceWith = Y
        DTE.Find.Target = vsFindTarget.vsFindTargetOpenDocuments
        DTE.Find.MatchCase = MatchCase
        DTE.Find.MatchWholeWord = False
        DTE.Find.Backwards = False
        DTE.Find.MatchInHiddenText = False
        DTE.Find.PatternSyntax = PatternSyntax

        If (DTE.Find.Execute() = vsFindResult.vsFindResultNotFound) Then
            Throw New System.Exception("vsFindResultNotFound")
        End If

    End Sub

    Sub QuotedTextTo_T()
        ReplaceXWithY("{:q}", "_T(\1)", True, _
                      vsFindPatternSyntax.vsFindPatternSyntaxRegExpr)
    End Sub

End Module

When you go back to your project, you will see the "AutoT" module in the Macro Explorer panel on the right, with the QuotedTextTo_T macro listed under it.

Every time you double-click that macro, it selects the next quoted text in the currently open C++ file. Running the macro again wraps the selected text in the _T macro. For example:

C++
AfxMessageBox("Hello World!");

which will be changed to:

C++
AfxMessageBox(_T("Hello World!"));

and will compile both in Multibyte and Unicode configurations!
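
The wrapped call builds in both configurations because _T expands differently depending on whether the build is Unicode. The sketch below imitates that mechanism with made-up names (MY_UNICODE, MY_T, MyChar, kGreeting, GreetingLength) so that it compiles without tchar.h. The real definitions of _T and TCHAR live in tchar.h and are more elaborate, so treat this as an illustration only.

```cpp
#include <cstddef>

// Hypothetical stand-ins for _UNICODE, _T and TCHAR (the real ones are in <tchar.h>).
// Define MY_UNICODE before this point to simulate a Unicode build.
#ifdef MY_UNICODE
typedef wchar_t MyChar;
#define MY_T(s) L##s      // pastes the L prefix onto the literal: L"..."
#else
typedef char MyChar;
#define MY_T(s) s         // leaves the literal as a plain char string
#endif

// The same source line yields a char or wchar_t literal depending on the build.
static const MyChar kGreeting[] = MY_T("Hello World!");

// Number of characters in the literal, excluding the terminator.
inline std::size_t GreetingLength()
{
    return sizeof(kGreeting) / sizeof(kGreeting[0]) - 1;
}
```

Whichever way MY_UNICODE is set, the literal has the same number of characters. Only the element type changes.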

It is recommended to add a keyboard shortcut to the macro:

  1. In the top main menu of Visual Studio, go to "Tools->Options"
  2. Click on the + at the left of "Environment"
  3. Click "Keyboard"
  4. At the right pane, type "AutoT" under the "Show commands containing:" edit box, to find our new macro
  5. Click on the macro, and assign a shortcut key to it (I chose Ctrl-Alt-Num0)
  6. Click OK

Now every time you press that keyboard combination, the macro will be executed.

Note: Don't be tempted to let the script do all the work blindly. Human verification is needed. The script will also try to add the _T macro to lines like:

C++
#include "StdAfx.h"

In order to make the script skip such a line, simply press the "Right" arrow on your keyboard.

The script also does not recognize escaped quotes inside a string, as in:

C++
AfxMessageBox(_T("Hello \"World!\" "));

It will also fail to skip quoted text that is already wrapped in the _T macro. The good news is that it never misses a quoted string :)
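
If you would rather handle these three cases in code than by eye, the following is a rough sketch of a line-by-line wrapper in portable C++. It is not the macro above and not a complete tool: the function names (WrapStringLiterals, PrecededByT, EndOfLiteral) are invented for this example, and it assumes that each literal starts and ends on the same line.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True when the text before pos ends with the identifier _T followed by "(".
static bool PrecededByT(const std::string& s, std::size_t pos)
{
    while (pos > 0 && (s[pos - 1] == ' ' || s[pos - 1] == '\t'))
        --pos;
    if (pos < 3 || s.compare(pos - 3, 3, "_T(") != 0)
        return false;
    if (pos == 3)
        return true;
    // Reject longer identifiers such as MY_T( that merely end in _T
    const unsigned char before = static_cast<unsigned char>(s[pos - 4]);
    return !(std::isalnum(before) || before == '_');
}

// Index one past the closing quote of the literal that starts at start,
// honouring backslash escapes; s.size() if the literal is unterminated.
static std::size_t EndOfLiteral(const std::string& s, std::size_t start)
{
    const char quote = s[start];
    std::size_t i = start + 1;
    while (i < s.size()) {
        if (s[i] == '\\') { i += 2; continue; }   // skip the escaped character
        if (s[i] == quote) return i + 1;
        ++i;
    }
    return s.size();
}

std::string WrapStringLiterals(const std::string& line)
{
    // Leave #include lines untouched.
    const std::size_t first = line.find_first_not_of(" \t");
    if (first != std::string::npos && line[first] == '#') {
        const std::size_t word = line.find_first_not_of(" \t", first + 1);
        if (word != std::string::npos && line.compare(word, 7, "include") == 0)
            return line;
    }

    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        const char c = line[i];
        if (c != '"' && c != '\'') { out += c; ++i; continue; }

        const std::size_t end = EndOfLiteral(line, i);
        const std::string literal = line.substr(i, end - i);
        if (c == '"' && !PrecededByT(line, i))
            out += "_T(" + literal + ")";
        else
            out += literal;   // character literal or already wrapped: copy as is
        i = end;
    }
    return out;
}
```

The sketch deliberately ignores comments, L"..." literals, and strings that span several lines, so its output still needs the same human review as the macro's.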

Note that you will also have to replace char-only routines such as strcmp with their generic-text equivalents from TCHAR.H, such as _tcscmp.
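
To illustrate the idea behind the generic-text routines, here is a small sketch that maps one name to either the char or the wchar_t version of a function. The names MY_UNICODE, MyChar, MY_T, my_tcscmp, my_tcslen and IsHelpSwitch are stand-ins invented for this example. In a real project you would include tchar.h and use _tcscmp, _tcslen and _T directly.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins that mimic how <tchar.h> maps generic-text names.
// Define MY_UNICODE to simulate a Unicode build.
#ifdef MY_UNICODE
typedef wchar_t MyChar;
#define MY_T(s) L##s
#define my_tcscmp std::wcscmp
#define my_tcslen std::wcslen
#else
typedef char MyChar;
#define MY_T(s) s
#define my_tcscmp std::strcmp
#define my_tcslen std::strlen
#endif

// Written once against the generic names; builds in either configuration.
inline bool IsHelpSwitch(const MyChar* arg)
{
    return my_tcscmp(arg, MY_T("-h")) == 0 || my_tcscmp(arg, MY_T("--help")) == 0;
}
```

Code that calls strcmp directly would stop compiling as soon as its arguments became wchar_t strings, which is why the rename is needed.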

Porting from std::string, std::ostringstream, std::ofstream to Unicode compatible versions

If your multibyte project makes wide use of std::string, std::ostringstream, or std::ofstream, these will not work well in a Unicode build, because they are fixed to char.

The easiest fix is to define the following typedefs and rename all occurrences in your program, for example from std::string to tstring.

C++
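#include <tchar.h>    // TCHAR
#include <string>
#include <sstream>    // std::basic_ostringstream
#include <fstream>    // std::basic_ofstream

typedef std::basic_string<TCHAR> tstring;
typedef std::basic_ostringstream<TCHAR> tostringstream;
typedef std::basic_ofstream<TCHAR> tofstream;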

Also, replace char with TCHAR wherever your code uses it to hold text.
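
As a quick illustration of how code reads once it uses such typedefs, here is a minimal sketch. To keep it buildable without tchar.h, it uses the invented stand-ins MyChar and MY_T in place of TCHAR and _T, and DescribeCount is a made-up helper.

```cpp
#include <sstream>
#include <string>

// Stand-ins for TCHAR and _T so the sketch builds without <tchar.h>;
// in the real project, use TCHAR and _T directly.
#ifdef MY_UNICODE
typedef wchar_t MyChar;
#define MY_T(s) L##s
#else
typedef char MyChar;
#define MY_T(s) s
#endif

typedef std::basic_string<MyChar> tstring;
typedef std::basic_ostringstream<MyChar> tostringstream;

// Formats a message without naming char or wchar_t anywhere.
inline tstring DescribeCount(int count)
{
    tostringstream out;
    out << MY_T("Items: ") << count;
    return out.str();
}
```

Because neither char nor wchar_t appears in DescribeCount, the function needs no changes when the project configuration is switched.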

Storing Unicode values in std::string

To store Unicode values, std::wstring can be used, but when you must store a Unicode value in a standard std::string or char array, you can store it in UTF-8 format, which is also used in TinyXML, among others.

The following helper functions may help you with the conversions:

C++
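// Requires <atlstr.h> (or the MFC headers), <windows.h>, <string>, <vector>,
// and the tstring typedef defined above.

std::string CStringToString(const CString& cs)
{
    // Convert a TCHAR string to a narrow (ANSI code page) string
    CT2CA pszConvertedAnsiString(cs);
    return std::string(pszConvertedAnsiString);
}

tstring CStringTo_tstring(const CString& cs)
{
    return tstring(static_cast<LPCTSTR>(cs));
}

std::string tstringTo_stdString(const tstring& ts)
{
    return CStringToString(CString(ts.c_str()));
}

tstring UTF8charTo_tstring(const char* strIn)
{
    // First call: ask how many wchar_t are needed, terminator included
    int nLen = MultiByteToWideChar(CP_UTF8, 0, strIn, -1, NULL, 0);
    if (nLen <= 0)
        return tstring();   // conversion failed

    std::vector<wchar_t> buffer(nLen);
    MultiByteToWideChar(CP_UTF8, 0, strIn, -1, &buffer[0], nLen);
    return CStringTo_tstring(CString(&buffer[0]));
}

std::string tstringToUTF8string(const tstring& tsIn)
{
    // CT2CW yields a wide string in both Unicode and multibyte builds
    CT2CW pszWide(tsIn.c_str());

    // First call: ask how many bytes are needed, terminator included
    int nLen = WideCharToMultiByte(CP_UTF8, 0, pszWide, -1, NULL, 0, NULL, NULL);
    if (nLen <= 0)
        return std::string();   // conversion failed

    std::vector<char> buffer(nLen);
    WideCharToMultiByte(CP_UTF8, 0, pszWide, -1, &buffer[0], nLen, NULL, NULL);
    return std::string(&buffer[0]);
}

bool HasUnicodeChars(const tstring& tsIn)
{
    // Round-trip through the ANSI code page; if the text changes, it contains
    // characters that the code page cannot represent
    std::string sNarrow = tstringTo_stdString(tsIn);
    tstring tsFromNarrow = CStringTo_tstring(CString(sNarrow.c_str()));
    return tsFromNarrow != tsIn;
}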

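To convert a tstring to a const char* (the pointer is only valid until the end of the full expression, because it points into a temporary std::string):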

C++
CStringToString(sName.c_str()).c_str()

To convert std::string to tstring:

C++
tstring ts30(static_cast<LPCTSTR>(CString(stdS1.c_str())));

Note: UTF-8 strings are ordinary char strings, but each non-ASCII character takes two to four chars, so the UTF-8 string can be longer than the wide string:

C++
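tstring ts1;
ts1 = _T("Some foreign language text");
size_t nLen = ts1.length();     // number of TCHAR elements
size_t nSize = ts1.size();      // same value as length()

// Convert the wide string to a UTF-8 std::string
std::string s2 = tstringToUTF8string(ts1);
size_t nLen2 = s2.length();     // number of bytes; more than nLen if the text is not pure ASCII
size_t nSize2 = s2.size();      // same value as length()

// Convert the UTF-8 std::string back to a wide string
tstring ts5 = UTF8charTo_tstring(s2.c_str());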
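
To see where the extra bytes come from, here is a small portable sketch that encodes code points into UTF-8 by hand. It is not one of the helper functions above, and AppendUtf8 and CountCodePoints are names made up for this example. It assumes every input is a valid Unicode scalar value up to U+10FFFF and does no validation.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one code point to out.
// Assumption: cp is a valid Unicode scalar value (no surrogates, <= 0x10FFFF).
inline void AppendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {                      // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Counts code points by skipping continuation bytes (bit pattern 10xxxxxx).
inline std::size_t CountCodePoints(const std::string& utf8)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For example, appending U+0041 (Latin A), U+05D0 (Hebrew alef), U+4E2D (a Chinese character) and U+1F600 (an emoji) gives a std::string whose size() is 1 + 2 + 3 + 4 = 10, while CountCodePoints reports 4. This is why size() on a UTF-8 std::string should be read as a byte count, not a character count.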

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Israel Israel

Comments and Discussions

 
thank you!!
bljacobs, 21-Jan-11 7:36

thanks!!!
Maximilien, 30-Nov-10 9:29

Similar idea ... but written in PERL
eFotografo, 2-Nov-09 4:18
A couple of years ago I had a similar requirement, but instead of trying to work within the VS IDE I decided to use PERL. This gave me greater flexibility (particularly with the use of regular expressions) and the possibility to work on an entire folder of source files. The script isn't perfect, but it handles (most) comments, strings that have already been wrapped in _T(), #include statements and URLs (avoiding confusing http:// with a single-line comment). Rather than overwriting the input files, it writes all output to a subfolder ("wrapped" in this case), to allow before/after comparisons in the few cases that are not handled correctly.

Hope this might be of use to someone :)

John

PS The colour coding of the script appears a little strange due to the existence of non-matching quotes (single and double) on several lines.

#!perl
#
# wrapstrings.pl
# Summary:  Script for automatically wrapping all C++ source string and character literals ("..." and '.')
#           with the Microsoft/C++ Generic-text _T() macro. This allows automatic conversion of inline
#           texts to Unicode (with the L"" macro) if/when necessary.
#
# Author: John Cullen
# History:
#   2007/01/04  -   Initial version.
#   2007/01/05  -   Updated to take into account (and ignore) strings inside comments.

# ensure the temporary folder exists;
mkdir("wrapped");

foreach $infile (@ARGV)
{
    close(IN); close(OUT);
    
    $outfile = "wrapped/$infile";
    open(IN, $infile) || do {warn "Failed to open $infile for reading: $!"; next;};
    open(OUT, ">" . $outfile) || do {warn "Failed to open $outfile for writing: $!"; next;};

    while (<IN>)
    {
        # skip any line that doesn't include a quoted string
        do { print OUT; next; } unless /"|'/;
    
        # skip include statements
        do { print OUT; next; } if /#\s*include\s*"/;
    
        # rudimentary handler for simple comments (multiple inline /* */ comments are not handled)
        # comments may begin with // or /* but we need to skip URLs such as
        # http://, file:/// and ftp:// etc. and strings embedded within
        # multiline comments.
        #
        # comments are temporarily removed, then added back after replacing strings.
        
        if (
            (m%(?<!:)   #     3. negative look-behind for ":" (URL)
            (?<!/)      #   2. negative look-behind for ":/" e.g. file:///
            (//)        # 1. check for comment marker, but not preceded by : or :/
            %x)
            ||
            (m%(/\*)%)  # /* comment (opening)
            ||
            (m%(\*/)%)) # */ comment (closing)
        {
            # remove the newline (or whatever is defined as newline in $/)
            chomp;
            $ctype = $1;
            if ($ctype eq "*/")
            {
                $commentClose = 1;
                ($comment = $_) =~ s/^(.*?)\Q$ctype\E.*$/$1/;
                s/^.*?\Q$ctype\E(.*)$/$1/;
            }
            else
            {
                $commentClose = 0;
                ($comment = $_) =~ s/^.*?\Q$ctype\E(.*)$/$1/;
                s/^(.*?)\Q$ctype\E.*$/$1/;
            }
        }

        # if we get to here, there must be a string in here somewhere!      
        # first remove any previously wrapped strings; this avoids the problem
        # of identifying lines with *partially* wrapped strings
        # e.g. ..., "abc", _T("def"), 'ghi') etc.

        s!_T\(\s*           # match a previously wrapped string _T(...
            (               # open group 1 (capture either string)
                ('          # open group 2 (capture a '...' string)
                    (\\.|[^'\\])*   # open/close group 3 (string content)
                 ')         # close group 2
                |           # OR
                ("          # open group 4 (capture a "..." string)
                    (\\.|[^"\\])*   # open/close group 5 (string content)
                 ")         # close group 4
            )               # close group 1
        \)                  # trailing ...)
        !$1!xg;             # replace the _T(...) with ...
        
        #
        # now perform the actual wrapping of quoted strings in _T()
        #
        s!
            (               # open group 1 (capture whichever string group is matched)
                ('          # open group 2 (match '...')
                    (\\.|[^'\\])*   # group 3 (the string content)
                 ')         # close group 2
                |
                ("          # open group 4 (match "...")
                    (\\.|[^"\\])*   # group 5 (the string content)
                 ")         # close group 4
            )               # close group 1
        !_T($1)!xg;         # wrap the match in _T()


        # no comments, just output the transformed input
        if ($ctype eq "")
        {
            print OUT $_;
        }
        else
        {
            if ($commentClose == 1)
            {
                printf OUT "%s%s%s%s", $comment, $ctype, $_, $/;
            }
            else
            {
                printf OUT "%s%s%s%s", $_, $ctype, $comment, $/;
            }
            $commentClose = 0;
            $comment = "";
            $ctype = "";
        }
    }
}



Re: Similar idea ... but written in PERL
MarkDoubson, 20-Nov-10 13:03

Re: Similar idea ... but written in PERL
eFotografo, 22-Nov-10 3:22

Thanks
chenyu2202863, 23-Oct-09 18:39

Nearly the same
AngelMischa1981, 19-Oct-09 19:08
