
(a |= b) Is Not Equal to (a = a | b) ?

26 Dec 2013, CPOL, 3 min read

I’m troubleshooting a driver for an Intel network interface card (NIC) on Phar Lap that can’t send Ethernet packets out successfully. I track it down to the following code:

C++
// Memory-mapped I/O
struct Registers
{
   // Maps to a 32-bit hardware register which controls
   // how the NIC works
   u32 DeviceCtrl;
   ..
};

#define DEVICE_RESET (1 << 26)
Registers* m_pReg;
..
/* Problematic code */
m_pReg->DeviceCtrl |= DEVICE_RESET;

If the following code is used instead, the problem is gone.

C++
/* No problem */
m_pReg->DeviceCtrl = m_pReg->DeviceCtrl | DEVICE_RESET;

Frankly, I can’t see any difference between the two. It should always hold that...

C++
a |= b

...equals:

C++
a = a | b

No?
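For ordinary variables, the two spellings do produce the same value. A minimal sketch that checks this for a few sample values (the helper names `or_compound`, `or_expanded` and `spellings_agree` are made up for illustration and are not part of the driver):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: apply the OR in each spelling to an ordinary variable.
std::uint32_t or_compound(std::uint32_t a, std::uint32_t b) { a |= b; return a; }
std::uint32_t or_expanded(std::uint32_t a, std::uint32_t b) { a = a | b; return a; }

// Returns true if both spellings agree for every pair of sample values.
bool spellings_agree() {
    const std::uint32_t samples[] = {0u, 1u, 0x04000000u, 0x12345678u, 0xFFFFFFFFu};
    for (std::uint32_t a : samples)
        for (std::uint32_t b : samples)
            if (or_compound(a, b) != or_expanded(a, b)) return false;
    return true;
}

// Small self-check that can be called from any test driver.
void check_spellings() { assert(spellings_agree()); }
```

This only compares the resulting values. It says nothing about which instructions the compiler picks to perform the update.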

So I turn on the compiler flag (MSVC 6.0, /FAs) to output the assembly listing.

C++
// #1
m_pReg->DeviceCtrl = m_pReg->DeviceCtrl | DEVICE_RESET;

Assembly
mov eax, DWORD PTR [esi+180] ; Load m_pReg to eax
mov eax, DWORD PTR [eax]     ; Load DeviceCtrl to eax
or eax, 67108864             ; Or eax with (1<<26)
mov ecx, DWORD PTR [esi+180] ; Load m_pReg to ecx
mov DWORD PTR [ecx], eax     ; Store eax to update DeviceCtrl

This is straightforward.

C++
// #2
m_pReg->DeviceCtrl |= DEVICE_RESET;

Assembly
mov eax, DWORD PTR [esi+180]
pop ecx
or BYTE PTR [eax+3], 4

What?

This surprises me! Where do the magic numbers 3 and 4 come from? Can you figure it out?

Wow~

It becomes clear later when I stare at the number 67108864. The result of ORing a 32-bit value with 67108864 (which is 0x04000000) is the same as the result of ORing just the highest byte with 0x04, because ORing the lower three bytes with zeros is a no-op. In other words, the compiler tries to improve the I/O performance by writing a single byte instead of four on the data bus. Since the CPU I’m using (Intel i5-440) is little-endian, the highest byte of this 32-bit register sits at offset 3.
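The arithmetic can be checked on an ordinary variable. This is a minimal sketch, assuming a little-endian host; the names `kDeviceReset`, `or_full_word`, `or_top_byte` and `is_little_endian` are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

const std::uint32_t kDeviceReset = 1u << 26;  // 67108864 == 0x04000000

// OR the whole 32-bit word, as in "or eax, 67108864".
std::uint32_t or_full_word(std::uint32_t reg) { return reg | kDeviceReset; }

// OR only the byte at offset 3 with 4, as in "or BYTE PTR [eax+3], 4".
// Assumes a little-endian host, where offset 3 holds bits 24..31.
std::uint32_t or_top_byte(std::uint32_t reg) {
    unsigned char bytes[4];
    std::memcpy(bytes, &reg, sizeof reg);
    bytes[3] |= 0x04;
    std::memcpy(&reg, bytes, sizeof reg);
    return reg;
}

// Reports whether the lowest-addressed byte holds the least significant bits.
bool is_little_endian() {
    const std::uint32_t probe = 1u;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// On a little-endian host both forms give the same value.
void check_top_byte() {
    if (is_little_endian())
        assert(or_top_byte(0x00000001u) == or_full_word(0x00000001u));
}
```

For plain RAM the two forms are interchangeable. The difference only matters when the target cares about the width of the access.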

Reasonable optimization, isn’t it?

But why does it cause the hardware problem after optimization? After writing the register, I read it back and it turns out to be the old value. Obviously, the write is rejected by the hardware.

My first thought is that the address put on the address bus is incorrect, which causes the hardware to reject the access. Usually, the NIC (I believe many other PCI/PCIe devices behave like this) checks the address bus and will reject or ignore the access if the address is not expected. In this case, the address/offset of DeviceCtrl is 0, and the next valid register would start at 4. Obviously, offset 3 (the highest byte) is not a valid address of a 32-bit register that the NIC recognizes.

Is that the reason why the access is rejected?

Wait a minute.. No way..

Actually, memory accesses on the bus are always aligned to 4 bytes on a 32-bit x86 platform. Even if the instruction produces a non-aligned address (e.g. 3), the CPU always puts a properly aligned address on the bus. In this case, it’s 0. But one thing is true: the CPU won’t drive the low 3 bytes on the data bus. The reason? Whatever it wrote for those bytes would be wrong, since it’s only given the value of the highest byte.

Byte Enable

On the i5-440, there are four Byte Enable signals corresponding to the four bytes of the data bus. A Byte Enable is asserted to enable the transfer of a specific byte, effectively telling the target that this part of the data bus is in use. In our example, only the highest Byte Enable is on and only that byte is transferred.
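A simplified way to picture this is as an aligned address plus a 4-bit mask. The sketch below is a hypothetical model, not how any particular bus encodes its signals (real byte enables are often active-low, for instance); `BusCycle` and `make_cycle` are made-up names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 32-bit bus cycle: a dword-aligned address plus
// a 4-bit byte-enable mask (bit i set means byte lane i carries data).
struct BusCycle {
    std::uint32_t aligned_address;
    std::uint8_t byte_enables;
};

// Models an access of `size` bytes (1, 2 or 4) at `address`.
// Precondition: the access does not cross a 4-byte boundary.
BusCycle make_cycle(std::uint32_t address, unsigned size) {
    const unsigned lane = address & 3u;
    BusCycle cycle;
    cycle.aligned_address = address & ~3u;
    cycle.byte_enables = static_cast<std::uint8_t>(((1u << size) - 1u) << lane);
    return cycle;
}

// The two accesses from the article, expressed in this model.
void check_cycles() {
    assert(make_cycle(3u, 1u).aligned_address == 0u);  // byte write at offset 3
    assert(make_cycle(3u, 1u).byte_enables == 0x08u);  // only the top lane
    assert(make_cycle(0u, 4u).byte_enables == 0x0Fu);  // full 32-bit write
}
```

In this model the byte write at offset 3 and the full 32-bit write both target address 0. Only the mask differs.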

Is it true that the NIC monitors Byte Enable signals and accepts the access only if all the four signals are enabled?

Yes! The NIC’s data sheet confirms it:

The device has limited support of read and write requests when only 
part of the byte enable bits are set as described later in this 
section. Partial writes to the MSI-X table are supported. All other 
partial writes are ignored and silently dropped.
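The behavior described in the data sheet can be imitated with a toy register model. This is only a sketch under my own assumptions: `StrictRegister` is a made-up type, a 4-bit mask stands in for the byte enables, and the MSI-X exception is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a single 32-bit device register that ignores
// any write whose four byte enables are not all set.
struct StrictRegister {
    std::uint32_t value;

    // Returns true if the write was accepted, false if it was dropped.
    bool write(std::uint32_t data, std::uint8_t byte_enables) {
        if ((byte_enables & 0x0Fu) != 0x0Fu) return false;  // partial write: dropped
        value = data;
        return true;
    }
};

// Replays the two variants from the article against the model.
void check_strict_register() {
    StrictRegister ctrl = {0x00000001u};
    // Byte write to the top lane only: dropped, old value stays.
    assert(!ctrl.write(0x04000000u, 0x08u));
    assert(ctrl.value == 0x00000001u);
    // Full-width read-modify-write: accepted.
    assert(ctrl.write(ctrl.value | 0x04000000u, 0x0Fu));
    assert(ctrl.value == 0x04000001u);
}
```

Reading the register back after the dropped write returns the old value, which matches what I observed on the hardware.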

More About the Optimization

In MSVC 6.0, this optimization takes place even if all optimizations are disabled (/Od). But there is no such optimization in VC2012, even with full optimization enabled (/Ox). Maybe Microsoft considered it a premature optimization and dropped it.

Takeaways

When troubleshooting low-level drivers, try disabling all optimizations first. Alternatively, run the debug version (should always be the first choice, right?), since most (if not all) optimizations are automatically turned off in the debug version.
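Independently of the compiler settings, the working form of the register update can be wrapped in small helpers so that the intended access width is spelled out. This is a sketch, not the driver’s actual code: `read32`, `write32` and `set_bits32` are hypothetical names, and the assumption that a compiler performs a `volatile` 32-bit access as a single 32-bit bus access is common practice rather than a language guarantee. I have not verified it for MSVC 6.0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: every access goes through a volatile 32-bit pointer,
// which asks the compiler for one access of the declared width per call.
inline std::uint32_t read32(const volatile std::uint32_t* reg) { return *reg; }
inline void write32(volatile std::uint32_t* reg, std::uint32_t value) { *reg = value; }

// Read-modify-write spelled out as a 32-bit read followed by a 32-bit write.
inline void set_bits32(volatile std::uint32_t* reg, std::uint32_t mask) {
    write32(reg, read32(reg) | mask);
}

// Exercises the helpers on an ordinary variable standing in for a register.
void check_set_bits32() {
    std::uint32_t fake_register = 0x00000001u;
    set_bits32(&fake_register, 1u << 26);
    assert(fake_register == 0x04000001u);
}
```

In the driver, the call would look like `set_bits32(&m_pReg->DeviceCtrl, DEVICE_RESET);`. It is still worth checking the generated assembly for the compiler in use.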

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Lead, National Instruments
China
Senior software engineer at National Instruments, to implement various Ethernet-based industrial protocols, e.g., EtherCAT. Favorite languages are C/C++ and Python. For fun, I like watching films (sci-fi, motion), walking, and various reading.
