Identically Named C++ Variable in Inner Block Hides Like Named Variable in Outer Block






Variable scope in C++ can be devastatingly subtle, as in this function.
Introduction
Several weeks ago, when I was writing and testing a function for use in a library, I inadvertently gave a variable inside a block the same name as a variable defined in the containing block. Initially stunned, I soon realized my error, and understood why the values I saw in the watch window changed suddenly when the execution path entered the inner scope block.
Background
Since I have written more code in C than in C++ in the last few months, I had grown accustomed to declaring every local variable at the top of a function, where the compiler emits a fatal error if a name is reused.
- In C89, which is the dialect that older Microsoft C compilers enforce, declarations must precede the first executable statement of a block; defining a variable later is a fatal syntax error. When every local is declared at the top of the function body, as is customary, they all share one scope, and a duplicate name is rejected. Strictly speaking, C also permits declarations at the top of a nested block, and such a declaration hides an outer variable of the same name, exactly as it does in C++.
- C++ allows a declaration anywhere a statement may appear. The function body is a block, and each nested block (anything enclosed in braces) opens a subsidiary scope that sees the names declared in each containing block, up to the block that delimits the function body.
However, if the outer and inner blocks both define a variable named Foo, the variable Foo defined in the inner block hides the outer Foo until execution passes the closing brace.
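The hiding rule can be seen in isolation in a few lines. The following is a minimal sketch, not code from the library; the function name OuterFooAfterInnerBlock is invented for the illustration.

```cpp
// Returns the value of the outer Foo after a nested block has declared
// and modified its own, identically named, Foo.
int OuterFooAfterInnerBlock()
{
    int Foo = 1;        // outer Foo, visible throughout the function body

    {
        int Foo = 2;    // inner Foo: a second variable that hides the outer one
        Foo += 40;      // changes the inner Foo only
    }                   // the inner Foo ceases to exist here

    return Foo;         // the outer Foo is visible again, and still holds 1
}
```

Calling OuterFooAfterInnerBlock() returns 1, because nothing inside the nested block ever touched the outer variable.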
Demonstration of Problem and Its Solution
The first statement in the body of function FB_ReplaceW defines lngFoundPos as a long integer, and initializes it to UNICODE_STRING_MAX_CHARS (32767). Things get really interesting as execution enters the do/while block, in which the first statement defines another long integer, which it also names lngFoundPos, and assigns the value returned by function StrIndex_P6C to it. If lngFoundPos is nonzero, a new value for lngTCharsToCopy is computed, memcpy is invoked to copy a substring, and lngTCharsToCopy is added to lngOutPos, the third variable defined with function scope.
When execution reached the end of the do/while block, I had a brain teaser on my hands. Which lngFoundPos does the while clause evaluate? No fair peeking at the answer below.
LPTSTR __stdcall FB_ReplaceW
(
LPCTSTR plpStrData ,
LPCTSTR plpToFind ,
LPCTSTR plpToReplace ,
PUINT puintNewLength
)
{
#define UNICODE_STRING_MAX_CHARS 32767
#define BUFFER_BEGINNING_P6C 0
#define STRLEN_EMPTY_P6C 0
#define NONE_P6C 0
#define STRPOS_FOUND_P6C 1
#define TRAILING_NULL_ALLOWANCE_P6C 1
long lngFoundPos = UNICODE_STRING_MAX_CHARS ;
long lngInPos = BUFFER_BEGINNING_P6C ;
long lngOutPos = BUFFER_BEGINNING_P6C ;
long lngLenToRepl = StringIsNullOrEmptyWW ( plpToReplace )
? STRLEN_EMPTY_P6C
: _tcslen ( plpToReplace ) ;
long lngInStrLen = _tcslen ( plpStrData ) ;
long lngLenToFind = _tcslen ( plpToFind ) ;
long lngTCharsToCopy = STRLEN_EMPTY_P6C ;
do // while ( lngFoundPos > STRLEN_EMPTY_P6C )
{ // The loop constraint must test explicitly for nonzero.
long lngFoundPos = StrIndex_P6C ( ( plpStrData + ( LONG_PTR ) lngInPos ) ,
plpToFind ) ;
// --------------------------------------------------------------------
// There are two distinct conditions to evaluate.
//
// 1) Was substring plpToFind found?
// 2) If so, are there intervening characters to copy?
//
// if ( lngFoundPos ) covers the first test, and the second, which is
// skipped unless the first condition is true, is evaluated by the next
// statement, if ( lngFoundPos > STRPOS_FOUND_P6C ).
//
// Since positions are ordinals, StrIndex_P6C returns the position of a
// substring as a human would see it, rather than as an offset. Hence,
// a returned value of +1 means that the next match was immediately
// found. This also means that the position where the match was found
// must be deducted to determine the number of intervening characters.
// --------------------------------------------------------------------
if ( lngFoundPos )
{ // Substring found.
lngTCharsToCopy = lngFoundPos - 1 ;
if ( lngTCharsToCopy )
{
// ------------------------------------------------------------
// For long strings, memcpy is significantly more efficient
// than any string copy function, because it copies the word
// aligned portion of the string a machine word at a time. All
// string copying routines are limited to copying the text one
// character at a time. Hence, memcpy can be 2 to 4 times
// faster.
//
// Computing offsets is a bit counterintuitive. Its signature
// indicates that source and destination are void pointers.
//
// void * __cdecl memcpy(void *, const void *, size_t);
//
// Nevertheless, if your actual arguments include address math,
// memcpy insists on them being cast to a type with a known
// size (e. g., LPTSTR). The reason for this becomes clear when
// you view the assembly code generated by the call to memcpy,
// and it affects the way the offset formulas must be written.
//
// The generated machine code takes into account the size of
// the specified type. For example, the size of a LPTSTR is the
// width of a TCHAR in the character set of the current trans-
// lation unit. While this simplifies coding the offset, it is
// a trap for the unwary, because the offset is multiplied by
// sizeof (cast) (e. g., sizeof (TCHAR)) under the hood, as
// demonstrated in the assembly code emitted to implement the
// following call to memcpy.
//
// mov eax, DWORD PTR _lngTCharsToCopy$[ebp]
// shl eax, 1
// push eax
// mov ecx, DWORD PTR _lngInPos$[ebp]
// mov edx, DWORD PTR _plpStrData$[ebp]
// lea eax, DWORD PTR [edx+ecx*2]
// push eax
// mov ecx, DWORD PTR _lngOutPos$[ebp]
// mov edx, DWORD PTR _rlpScratchBuff$[ebp]
// lea eax, DWORD PTR [edx+ecx*2]
// push eax
// call _memcpy
//
// Two instructions illustrate my point.
//
// lea eax, DWORD PTR [edx+ecx*2]
// lea eax, DWORD PTR [edx+ecx*2]
//
// In both cases, register ECX contains the offset (lngInPos
// and lngOutPos, respectively).
//
// In contrast, the count of bytes to copy (the third argument)
// is always taken at face value, as shown below.
//
// mov eax, DWORD PTR _lngTCharsToCopy$[ebp]
//
// Hence, to copy a given number of characters from a string,
// the character count must be explicitly multiplied by the
// width of a TCHAR, (preprocessor variable TCHAR_SIZE_P6C) as
// shown below.
//
// mov eax, DWORD PTR _lngTCharsToCopy$[ebp]
// shl eax, 1
//
// The second instruction (shl eax, 1) is a very efficient way
// to multiply an integer of up to 1 billion or so by two.
//
// The original version of this routine incorrectly assumed
// that memcpy returned NULL if it failed. However, since
// it doesn't, I eliminated the test. Hence, the associated
// status code, P6STRINGLIB_LSTRCPYN_ERR_P6C, is no longer
// used by this routine.
//
// Since it predates the TcharsToBytesP6C macro, it used an
// error prone hard coded mathematical expression. Although
// technically correct as it was originally written, I replaced
// the expression with the macro.
//
// 2013/05/12 - Especially in the case of strings, although it
// is very fast, memcpy is a tad hazardous. At the
// cost of a few machine instructions and a stack
// frame, I am substituting SafeMemCpyTchars_WW,
// which guarantees that the supplied buffer has
// enough room to copy the string and its terminal
// null character.
//
// 2015/03/21 - Since this routine is private, and is intended
// for use with strings that fall far short of the
// 4097 character capacity of the output buffer,
// and a key design goal is avoidance of the heap,
// this routine reverts to the original design,
// directly calling memcpy. Since it is already
// worked out and tested, I decided to leave the
// new "safe" method as a comment.
// -----------------------------------------------------------
memcpy ( ( LPTSTR ) m_lpFBReplaceBuff + ( LONG_PTR ) lngOutPos ,
( LPCTSTR ) plpStrData + ( LONG_PTR ) lngInPos ,
TcharsToBytesP6C ( lngTCharsToCopy ) ) ;
lngOutPos += lngTCharsToCopy ;
} // if ( lngTCharsToCopy )
if ( lngLenToRepl )
{
memcpy ( ( LPTSTR ) m_lpFBReplaceBuff + ( LONG_PTR ) lngOutPos ,
( LPCTSTR ) plpToReplace ,
TcharsToBytesP6C ( lngLenToRepl ) ) ;
lngOutPos += lngLenToRepl ;
} // if ( lngLenToRepl )
lngInPos = lngInPos
+ lngFoundPos
+ lngLenToFind
- TRAILING_NULL_ALLOWANCE_P6C ;
} // TRUE block, if ( lngFoundPos )
else
{
lngTCharsToCopy = lngInStrLen != lngInPos
? lngInStrLen - lngInPos
: NONE_P6C ;
if ( lngTCharsToCopy )
{
memcpy ( ( LPTSTR ) m_lpFBReplaceBuff + ( LONG_PTR ) lngOutPos ,
( LPCTSTR ) plpStrData + ( LONG_PTR ) lngInPos ,
TcharsToBytesP6C ( lngTCharsToCopy ) ) ;
} // if ( lngTCharsToCopy )
} // FALSE block, if ( lngFoundPos )
} while ( lngFoundPos > STRLEN_EMPTY_P6C ) ;
return m_lpFBReplaceBuff ;
} // FB_ReplaceW
I discovered the hard way that the while clause evaluates the outer variable, because the inner block, and the inner lngFoundPos with it, ends at the closing brace that precedes the while keyword. The result was an infinite loop, because the outer lngFoundPos is initialized to 32767, and never changes thereafter.
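The trap can be reproduced without any of the string handling. The sketch below is an illustration under stated assumptions: FindNothing is a hypothetical stand-in for StrIndex_P6C that always reports "not found," and the maxPasses guard exists only to keep the demonstration from looping forever.

```cpp
// Hypothetical stand-in for StrIndex_P6C; it never finds a match.
static long FindNothing()
{
    return 0;
}

// Buggy shape: the loop body declares a second lngFoundPos.
int PassesWithShadow(int maxPasses)
{
    long lngFoundPos = 32767;               // outer variable
    int  passes      = 0;

    do
    {
        long lngFoundPos = FindNothing();   // new variable; hides the outer one
        (void) lngFoundPos;                 // value is discarded at the brace
        ++passes;
    } while (lngFoundPos > 0 && passes < maxPasses);   // tests the OUTER variable

    return passes;
}

// Fixed shape: the loop body assigns to the one and only lngFoundPos.
int PassesWithoutShadow(int maxPasses)
{
    long lngFoundPos = 32767;
    int  passes      = 0;

    do
    {
        lngFoundPos = FindNothing();        // assignment, not a declaration
        ++passes;
    } while (lngFoundPos > 0 && passes < maxPasses);

    return passes;
}
```

With a guard of 5, PassesWithShadow runs all 5 passes because the outer value never changes, while PassesWithoutShadow stops after the first. Compilers can flag the mistake before it runs: GCC and Clang accept -Wshadow, and recent Microsoft compilers report warning C4456 at warning level 4.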
The solution was obvious and simple. The first statement in the buggy block was as follows.
long lngFoundPos = StrIndex_P6C ( ( plpStrData + ( LONG_PTR ) lngInPos ) ,
plpToFind ) ;
Eliminating the first keyword (long) turns the declaration into an assignment, which keeps the original lngFoundPos in scope, allowing the loop to stop when it should, rather than run off into deep space (high memory, actually). Consolidating the statement into the if statement that followed it in the original code, simplifying the while expression, replacing StrIndex_P6C with _tcsstr, which returns a pointer rather than a position, and initializing the renamed variable, lpFoundPos, to NULL (zero) yields a working loop that looks like this.
LPTSTR lpFoundPos = NULL ;
...
if ( lpFoundPos = _tcsstr ( lpInPos , plpToFind ) )
...
} while ( lpFoundPos ) ;
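For readers who want to see the corrected loop shape run end to end, here is a self-contained sketch. It is not the library routine: ReplaceAllSketch is a hypothetical name, it works on narrow characters with std::strstr instead of _tcsstr, and it builds its result in a std::string, which uses the heap that the original design deliberately avoids.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical illustration of the corrected loop; not the author's FB_ReplaceW.
std::string ReplaceAllSketch(const char *input,
                             const char *toFind,
                             const char *toReplace)
{
    const std::size_t lenToFind = std::strlen(toFind);

    if (lenToFind == 0)
        return input;                   // an empty search string would never advance

    std::string output;
    const char *inPos    = input;
    const char *foundPos = nullptr;     // declared once, outside the loop

    do
    {
        foundPos = std::strstr(inPos, toFind);      // assignment, not a declaration

        if (foundPos)
        {
            // Copy the characters that precede the match, then the replacement.
            output.append(inPos, static_cast<std::size_t>(foundPos - inPos));
            output.append(toReplace);
            inPos = foundPos + lenToFind;           // resume after the match
        }
        else
        {
            output.append(inPos);                   // copy the unmatched tail
        }
    } while (foundPos);                 // same variable that the body assigned

    return output;
}
```

For example, ReplaceAllSketch("a-b-c", "-", "+") yields "a+b+c". The loop ends on the first pass in which std::strstr returns a null pointer.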
Points of Interest
Although the role of braces as scope boundary markers is familiar to me, because other languages that borrowed heavily from C++ exhibit the same behavior, the example made crystal clear that the braces form a Chinese wall around the code that they enclose. Any variable defined inside the block doesn't exist until execution passes the opening brace, and it ceases to exist the instant execution passes the closing brace. Since the while clause lies outside the braces, it can't use any variable that was defined inside them, even if a like named variable exists in its scope. Technically, they are two different variables.
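The point at which a block's variables cease to exist can be made visible with a type that records its own construction and destruction. This is a sketch for illustration only; Tracer and TraceBlockLifetimes are hypothetical names that appear nowhere in the library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper that logs when it is created and when it is destroyed.
struct Tracer
{
    Tracer(const std::string &name, std::vector<std::string> &log)
        : m_name(name), m_log(log)
    {
        m_log.push_back("create " + m_name);
    }

    ~Tracer()
    {
        m_log.push_back("destroy " + m_name);
    }

    std::string               m_name;
    std::vector<std::string> &m_log;
};

// Records the order of events around two nested, like named variables.
std::vector<std::string> TraceBlockLifetimes()
{
    std::vector<std::string> log;

    {
        Tracer value("outer", log);

        {
            Tracer value("inner", log);     // hides the outer value
            log.push_back("inside inner block");
        }                                   // inner value is destroyed here

        log.push_back("after inner block");
    }                                       // outer value is destroyed here

    return log;
}
```

The returned log reads, in order: create outer, create inner, inside inner block, destroy inner, after inner block, destroy outer. The inner variable is gone before the statement that follows its closing brace runs, which is exactly the position that the while clause occupies in a do/while loop.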
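Numerous other languages use braces to delimit scope, although they do not all treat a reused name the same way. I know of the following languages, and I am certain that this list is far from exhaustive.
- C#: locals are block scoped, but the compiler rejects a local in a nested block that reuses the name of a local in an enclosing block.
- Perl: variables declared with my are block scoped, and an inner declaration hides an outer one.
- Java: locals are block scoped, but, as in C#, redeclaring an enclosing block's local in a nested block is a compile error.
- JavaScript: variables declared with let or const are block scoped and can hide outer variables; variables declared with var have function scope.
Python and PHP behave differently: neither gives an if or loop body a scope of its own, so a name assigned there belongs to the enclosing function. Standard Pascal likewise declares locals per procedure or function rather than per begin/end block.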
History
Monday, 01 June 2015, Initial Publication