An Introduction to Bitwise Operators

8 May 2002, CPOL, 5 min read
This article gives a brief overview of C style bitwise operators

Introduction

I have noticed that some people seem to have problems with bitwise operators, so I decided to write this brief tutorial on how to use them.

An Introduction to Bits

Bits, what are they, you may ask?

Well, simply put, bits are the individual ones and zeros that make up everything we do with computers. All the data you use is stored in your computer using bits. A BYTE is made up of eight bits; a WORD is two BYTEs, or sixteen bits; and a DWORD is two WORDs, or thirty-two bits. (BYTE, WORD and DWORD are the names the Windows headers give to these unsigned types.)

 0 1 0 0 0 1 1 1 1 0 0 0 0 1 1 1 0 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0
||              |               |               |              ||
|+- bit 31      |               |               |       bit 0 -+|
|               |               |               |               |
+-- BYTE 3 -----+--- BYTE 2 ----+--- BYTE 1 ----+-- BYTE 0 -----+
|                               |                               |
+----------- WORD 1 ------------+----------- WORD 0 ------------+
|                                                               |
+--------------------------- DWORD -----------------------------+

The beauty of having bitwise operators is that you can use a BYTE, WORD or DWORD as a small array or structure. Using bitwise operators, you can check or set the values of individual bits or even a group of bits.
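The BYTE, WORD and DWORD names used throughout this article come from the Windows headers. As a minimal sketch, assuming you are building without <windows.h> and your compiler provides the fixed-width types in <cstdint>, equivalent typedefs might look like this (the static_assert lines only confirm the sizes described above):

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::uint8_t, std::uint16_t, std::uint32_t

// Assumed stand-ins for the Windows typedefs of the same names.
typedef std::uint8_t  BYTE;   // 8 bits
typedef std::uint16_t WORD;   // 16 bits, two BYTEs
typedef std::uint32_t DWORD;  // 32 bits, two WORDs

// Compile-time checks of the sizes.
static_assert(sizeof(BYTE)  * CHAR_BIT == 8,  "BYTE is eight bits");
static_assert(sizeof(WORD)  * CHAR_BIT == 16, "WORD is sixteen bits");
static_assert(sizeof(DWORD) * CHAR_BIT == 32, "DWORD is thirty-two bits");
```

If you do include <windows.h>, drop these typedefs and use the ones it supplies.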

Hexadecimal Numbers and How They Relate to Bits

When working with bits, it is kind of hard to express every number using just ones and zeros, which is known as binary notation. To get around this, we use hexadecimal (base 16) numbers.

As you may or may not know, it takes four bits to cover all the numbers from zero to fifteen, which also happens to be the range of a single digit hexadecimal number. This group of four bits, or half a BYTE, is called a nibble. As there are two nibbles in a BYTE, we can use two hexadecimal digits to show the value of one BYTE.

NIBBLE   HEX VALUE
======   =========
 0000        0
 0001        1
 0010        2
 0011        3
 0100        4
 0101        5
 0110        6
 0111        7
 1000        8
 1001        9
 1010        A
 1011        B
 1100        C
 1101        D
 1110        E
 1111        F

So if we had one BYTE containing the letter 'r' (ASCII code 114), it would look like this:

0111 0010    binary
  7    2     hexadecimal

We could write it as '0x72'
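To tie the nibble table to code, here is a small sketch that splits a BYTE into its two nibbles and looks up the hexadecimal digit for each. The helper names (high_nibble, low_nibble, hex_digit) are made up for this illustration, and the sketch borrows the >> and & operators that are described later in the article:

```cpp
#include <cstdint>

typedef std::uint8_t BYTE;  // assumed stand-in for the Windows BYTE type

// Upper four bits of the byte, moved down into the low nibble.
constexpr BYTE high_nibble(BYTE b) { return static_cast<BYTE>(b >> 4); }

// Lower four bits of the byte.
constexpr BYTE low_nibble(BYTE b) { return static_cast<BYTE>(b & 0x0F); }

// Hexadecimal digit for a nibble value from 0 to 15.
constexpr char hex_digit(BYTE nibble) { return "0123456789ABCDEF"[nibble & 0x0F]; }

// The letter 'r' (ASCII 114) is 0111 0010 in binary, or 0x72.
static_assert(hex_digit(high_nibble('r')) == '7', "high nibble 0111 is 7");
static_assert(hex_digit(low_nibble('r'))  == '2', "low nibble 0010 is 2");
```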

Bitwise Operators

There are six bitwise operators. They are:

  1.   &    The AND operator
  2.   |    The OR operator
  3.   ^    The XOR operator
  4.   ~    The Ones Complement or Inversion operator
  5.   >>   The Right Shift operator
  6.   <<   The Left Shift operator

The & Operator

The & (AND) operator compares two values, and returns a value that has its bits set if, and only if, the two values being compared both have their corresponding bits set. The bits are compared using the following table:

1   &   1   ==   1
1   &   0   ==   0
0   &   1   ==   0
0   &   0   ==   0

An ideal use for this is to set up a mask to check the values of certain bits. Say we have a BYTE that contains some bit flags, and we want to check if bit four is set.

BYTE b = 50;
if ( b & 0x10 )
    cout << "Bit four is set" << endl;
else
    cout << "Bit four is clear" << endl;

This would result in the following calculation:

  00110010  - b
& 00010000  - & 0x10
----------
  00010000  - result

So we see that bit four is set.
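The same check can be wrapped in a small helper that builds the mask from a bit number. This is only a sketch; is_bit_set is a hypothetical name, and bits are counted from zero as in the diagram at the top of the article:

```cpp
#include <cstdint>

typedef std::uint8_t BYTE;  // assumed stand-in for the Windows BYTE type

// True when bit number `bit` (0 is the lowest bit) of `value` is set.
constexpr bool is_bit_set(BYTE value, unsigned bit) {
    return (value & (1u << bit)) != 0;
}

// 50 is 00110010: bit four is set, bit three is clear.
static_assert(is_bit_set(50, 4),  "bit four of 50 is set");
static_assert(!is_bit_set(50, 3), "bit three of 50 is clear");
```

Here 1u << 4 produces the same 0x10 mask that was written out by hand above.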

The | Operator

The | (OR) operator compares two values, and returns a value that has its bits set if one or the other value, or both, has its corresponding bits set. The bits are compared using the following table:

1   |   1   ==   1
1   |   0   ==   1
0   |   1   ==   1
0   |   0   ==   0

An ideal use for this is to ensure that certain bits are set. Say we want to ensure that bit two of some value is set:

BYTE b = 50;
BYTE c = b | 0x04;
cout << "c = " << static_cast<int>(c) << endl;   // cast so the BYTE prints as a number, not a character

This would result in the following calculation:

  00110010  - b
| 00000100  - | 0x04
----------
  00110110  - result
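As a sketch of the same idea in reusable form, the hypothetical set_bits helper below ORs a mask into a value. Note that setting a bit that is already set leaves the value unchanged:

```cpp
#include <cstdint>

typedef std::uint8_t BYTE;  // assumed stand-in for the Windows BYTE type

// Returns `value` with every bit of `mask` forced to one.
constexpr BYTE set_bits(BYTE value, BYTE mask) {
    return static_cast<BYTE>(value | mask);
}

// 00110010 | 00000100 == 00110110 (54)
static_assert(set_bits(50, 0x04) == 54, "the 0x04 bit gets set");
// The bit is already set, so nothing changes.
static_assert(set_bits(54, 0x04) == 54, "OR is harmless when repeated");
```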

The ^ Operator

The ^ (XOR) operator compares two values, and returns a value that has its bits set if one or the other value has its corresponding bits set, but not both. The bits are compared using the following table:

1   ^   1   ==   0
1   ^   0   ==   1
0   ^   1   ==   1
0   ^   0   ==   0

An ideal use for this is to toggle certain bits. Say we want to toggle bits three and four:

BYTE b = 50;
cout << "b = " << static_cast<int>(b) << endl;   // cast so the BYTE prints as a number
b = b ^ 0x18;
cout << "b = " << static_cast<int>(b) << endl;
b = b ^ 0x18;
cout << "b = " << static_cast<int>(b) << endl;

This would result in the following calculations:

  00110010  - b
^ 00011000  - ^ 0x18
----------
  00101010  - result

  00101010  - b
^ 00011000  - ^ 0x18
----------
  00110010  - result
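The two calculations above show that applying the same XOR mask twice gets you back where you started. A minimal sketch with a hypothetical toggle_bits helper makes that explicit:

```cpp
#include <cstdint>

typedef std::uint8_t BYTE;  // assumed stand-in for the Windows BYTE type

// Returns `value` with every bit of `mask` flipped.
constexpr BYTE toggle_bits(BYTE value, BYTE mask) {
    return static_cast<BYTE>(value ^ mask);
}

// 00110010 ^ 00011000 == 00101010 (42)
static_assert(toggle_bits(50, 0x18) == 42, "bits three and four flipped");
// Toggling a second time restores the original value.
static_assert(toggle_bits(toggle_bits(50, 0x18), 0x18) == 50, "XOR undoes itself");
```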

The ~ Operator

The ~ (Ones Complement or inversion) operator acts on only one value and inverts it, turning all the ones into zeros, and all the zeros into ones. An ideal use of this would be to set certain bits to zero while ensuring all other bits are set to one, regardless of the size of the data. Say we want to set all the bits to one except bits zero and one:

BYTE b = ~0x03;
cout << "b = " << static_cast<int>(b) << endl;   // cast so the BYTE prints as a number
WORD w = ~0x03;
cout << "w = " << w << endl;

This would result in the following calculations:

00000011  - 0x03
11111100  - ~0x03  b

0000000000000011  - 0x03
1111111111111100  - ~0x03  w

Another ideal use is to combine it with the & operator to ensure that certain bits are set to zero. Say we want to clear bit four:

BYTE b = 50;
cout << "b = " << static_cast<int>(b) << endl;   // cast so the BYTE prints as a number
BYTE c = b & ~0x10;
cout << "c = " << static_cast<int>(c) << endl;

This would result in the following calculations:

  00110010  - b
& 11101111  - ~0x10
----------
  00100010  - result
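Because ~ inverts however many bits the value has, the same "AND with the inverted mask" expression works for a BYTE, a WORD or a DWORD. The template below is a sketch of that; clear_bits is a hypothetical name, and the typedefs are assumed stand-ins for the Windows types:

```cpp
#include <cstdint>

typedef std::uint8_t  BYTE;   // assumed stand-ins for the Windows types
typedef std::uint16_t WORD;
typedef std::uint32_t DWORD;

// Returns `value` with every bit of `mask` forced to zero,
// whatever the width of the unsigned type T.
template <typename T>
constexpr T clear_bits(T value, T mask) {
    return static_cast<T>(value & ~mask);
}

// 00110010 & 11101111 == 00100010 (34)
static_assert(clear_bits<BYTE>(50, 0x10) == 34, "bit four cleared");
// The same mask logic at 16 and 32 bits.
static_assert(clear_bits<WORD>(0xFFFF, 0x0003) == 0xFFFC, "low two bits cleared");
static_assert(clear_bits<DWORD>(0xFFFFFFFFu, 0x3u) == 0xFFFFFFFCu, "low two bits cleared");
```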

The >> and << Operators

The >> (right shift) and << (left shift) operators move the bits by the number of bit positions specified. The >> operator shifts the bits toward the low bit, and the << operator shifts them toward the high bit. Bits shifted off the end are lost, and for unsigned values such as a BYTE the vacated positions are filled with zeros. One use for these operators is to align bits, for example to pack two values into one or to pull them apart again (check out the MAKEWPARAM, HIWORD, and LOWORD macros):

BYTE b = 12;
cout << "b = " << static_cast<int>(b) << endl;   // cast so the BYTE prints as a number
BYTE c = b << 2;
cout << "c = " << static_cast<int>(c) << endl;
c = b >> 2;
cout << "c = " << static_cast<int>(c) << endl;

This would result in the following calculations:

00001100  - b
00110000  - b << 2
00000011  - b >> 2
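To show what aligning bits looks like in practice, here is a sketch in the spirit of the HIWORD, LOWORD and MAKEWPARAM macros. The function names are hypothetical and this is not the Windows implementation; it simply takes apart the DWORD drawn in the diagram at the top of the article, which is 0x47877478 in hexadecimal:

```cpp
#include <cstdint>

typedef std::uint8_t  BYTE;   // assumed stand-ins for the Windows types
typedef std::uint16_t WORD;
typedef std::uint32_t DWORD;

// Low and high 16 bits of a 32-bit value.
constexpr WORD lo_word(DWORD d) { return static_cast<WORD>(d & 0xFFFFu); }
constexpr WORD hi_word(DWORD d) { return static_cast<WORD>(d >> 16); }

// Low and high 8 bits of a 16-bit value.
constexpr BYTE lo_byte(WORD w) { return static_cast<BYTE>(w & 0xFFu); }
constexpr BYTE hi_byte(WORD w) { return static_cast<BYTE>(w >> 8); }

// Shift the high word into place and OR in the low word.
constexpr DWORD make_dword(WORD lo, WORD hi) {
    return (static_cast<DWORD>(hi) << 16) | lo;
}

static_assert(hi_word(0x47877478u) == 0x4787, "WORD 1");
static_assert(lo_word(0x47877478u) == 0x7478, "WORD 0");
static_assert(hi_byte(hi_word(0x47877478u)) == 0x47, "BYTE 3");
static_assert(lo_byte(lo_word(0x47877478u)) == 0x78, "BYTE 0");
static_assert(make_dword(0x7478, 0x4787) == 0x47877478u, "round trip");
```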

Bit Fields

Another interesting thing that can be done using bits is to have bit fields. With bit fields, you can set up miniature structures within a BYTE, WORD or DWORD. Say, for example, we want to keep track of dates, but we want to use as little memory as possible. We could declare our structure this way:

struct date_struct {
    DWORD day   : 5,   // 1 to 31
          month : 4,   // 1 to 12
          year  : 14;  // 0 to 9999
    } date;

In this example, with a compiler such as Visual C++, the day field takes up the lowest 5 bits, month the next four, and year the next 14 bits (the exact layout of bit fields is left to the compiler). So the date needs only twenty three bits, which fit in a single DWORD of four BYTEs; the remaining nine bits are unused. If I had declared it using an integer for each field, the structure would have taken up 12 BYTEs.

The lowest 24 bits of the DWORD look like this:

|0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
  |                           |       |         |
  +------ year ---------------+ month +-- day --+

Now let's pick this declaration apart to see what we are doing.

First, we will look at the data type we are using for the bit field structure. In this case, we used a DWORD, which is 32 bits, so the compiler will allocate one DWORD for storage. The type must be at least as wide as the widest field: year needs 14 bits, so a BYTE, at only 8 bits, is too small for it (compilers either reject such a field or cannot store the full range in it). If we use more bits than fit in one DWORD, the compiler will allocate another DWORD, as many as it takes to hold our structure.

Now let's look at how the various fields are declared. First, we have the variable (day, month, and year), followed by a colon that separates the variable from the number of bits that it contains. Each bit field is separated by a comma, and the list is ended with a semicolon.

Now we get to the struct declaration. We put the bit fields into a struct like this so that we can use conventional structure accessing notation to get at the structure members. Also, since we cannot take the address of a bit field, we can use the address of the structure instead.

date.day = 12;

date_struct *dateptr = &date;
dateptr->year = 1852;
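Since the compiler decides how bit fields are laid out, you may prefer to do the packing yourself when the exact bit positions matter, for instance when writing the value to a file. The sketch below is one possible way to do that with the shift and mask operators from earlier sections, using the layout described above (day in bits 0 to 4, month in bits 5 to 8, year in bits 9 to 22). The function names are hypothetical:

```cpp
#include <cstdint>

typedef std::uint32_t DWORD;  // assumed stand-in for the Windows DWORD type

// Pack a date into one DWORD: day in bits 0-4, month in bits 5-8,
// year in bits 9-22. Values too large for their field are truncated
// by the masks.
constexpr DWORD pack_date(unsigned day, unsigned month, unsigned year) {
    return (day & 0x1Fu) | ((month & 0x0Fu) << 5) | ((year & 0x3FFFu) << 9);
}

// Shift the wanted field down to bit 0, then mask off everything above it.
constexpr unsigned unpack_day(DWORD d)   { return d & 0x1Fu; }
constexpr unsigned unpack_month(DWORD d) { return (d >> 5) & 0x0Fu; }
constexpr unsigned unpack_year(DWORD d)  { return (d >> 9) & 0x3FFFu; }

static_assert(unpack_day(pack_date(12, 5, 1852))   == 12,   "day survives the round trip");
static_assert(unpack_month(pack_date(12, 5, 1852)) == 5,    "month survives the round trip");
static_assert(unpack_year(pack_date(12, 5, 1852))  == 1852, "year survives the round trip");
```

The bit field version is easier to read; the manual version gives you the same layout on every compiler.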

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
President
Canada Canada
Father of two, brother of two, child of two.
Spouse to one, uncle to many, friend to lots.
Farmer, carpenter, mechanic, electrician, but definitely not a plumber.
Likes walks with the wife, board games, card games, travel, and camping in the summer.
High school graduate, college drop-out.
Hobby programmer who knows C++ with MFC and the STL.
Has dabbled with BASIC, Pascal, Fortran, COBOL, C#, SQL, ASM, and HTML.
Realized long ago that programming is fun when there is nobody pressuring you with schedules and timelines.

Comments and Discussions

 
BIG Endian vs little Endian
22-Aug-02 23:09

In other words, Motorola vs Intel!

What is the deal with these Endians? I did not say Indians ;) It is the way data is stored in MEMORY. A number such as a long/DWORD (32 bits) 0x12345678 always reads the BIG ENDIAN way in a microprocessor register, most significant bits on the left, because a register has no byte addresses. In memory, however, the order of its bytes depends on the processor architecture.

Data are sized differently :
Byte : One byte, noted byte 0
Word : Two bytes, from byte 0 to 1
Long : Four bytes, from byte 0 to 3

Every byte is formed with 8 bits, from bit 7 to 0 (left to right).
Every word is formed with 16 bits, from bit 15 to 0 (left to right).
Every long is formed with 32 bits, from bit 31 to 0 (left to right).

Note that byte 0 is the LSB (Least Significant Byte) and byte 3 is the MSB (Most Significant Byte). So :
Byte : 3 2 1 0
Number : $12 $34 $56 $78

The number given is $12*256^3 + $34*256^2 + $56*256^1 + $78*256^0.

It also works for words: word 0 is the 'LSW' (Least Significant Word) and word 1 is the 'MSW' (Most Significant Word) :
Word : 1 0
Number : $1234 $5678

The number given is $1234*65536^1 + $5678*65536^0.

Now the way bytes are stored :

BIG ENDIAN : First END is BIG power, that's to say BIG byte first...
Byte : Byte 3 ($12) (just MSB)
Word : Byte 3 to 2, word 1 ($1234) (MSB to LSB)
Long : Byte 3 to 0, word 1 to 0 ($12345678) (MSB to LSB)

BIG ENDIAN is the way we ALL write numbers on paper, most significant digit on the left (for instance thousands, then hundreds, then tens, then units). It is also how ALL DATA reads in registers, for ALL processor architectures!

Into memory, you'll find at the address 'n'(+ offset) the following data :
n+0 : Byte 3 ($12) (MSB first : BIG Endian)
n+1 : Byte 2 ($34)
n+2 : Byte 1 ($56)
n+3 : Byte 0 ($78) (LSB)

Byte 0 location : 'n+3' (just 'n+3', byte 0 is $78)
Word 0 location : 'n+2' ('n+2' to 'n+3', byte 1 to byte 0, word 0 is $5678)
Long 0 location : 'n+0' ('n+0' to 'n+3', byte 3 to byte 0, long 0 is $12345678)

It's really useful when you work on embedded systems and you debug memory mappings. What you have in registers is what you get in memory. But fetching the low word of the data means fetching it at 'n+2', so the address has to be adjusted to match the size the data is 'cast' to (according to the assembly opcode operand size), which can cost an extra address calculation.

Into register : $12345678
At address 'n+0' : $12 $34 $56 $78 (MSB to LSB per byte, byte 3 to 0)
At address 'n+0' : $1234 $5678 (MSB to LSB per word, word 1 to 0)

BIG ENDIAN numbers are also widely used in the TCP/IP protocols (network byte order)! The most common BIG Endian architectures are the Motorola 6800 family, 68000 family and ColdFire family. The PowerPC (derived from the IBM POWER processor family) can run in either byte order, but is traditionally used in BIG Endian mode.

LITTLE ENDIAN : First END is LITTLE power, that's to say little byte first...
Byte : Byte 0 ($78) (just LSB)
Word : Byte 0 to 1, word 0 ($7856) (LSB to MSB)
Long : Byte 0 to 3, word 0 to 1 ($78563412) (LSB to MSB)

LITTLE ENDIAN processors are hardwired to bring LITTLE ENDIAN data from memory into registers in the natural (BIG ENDIAN looking) order. So it takes no more time to load/store data on LITTLE ENDIAN processors, don't worry...

Into memory, you'll find at the address 'n' (+ offset) the following data :
n+0 : Byte 0 ($78) (LSB first : little Endian)
n+1 : Byte 1 ($56)
n+2 : Byte 2 ($34)
n+3 : Byte 3 ($12) (MSB)

Byte 0 location : 'n+0' (just 'n+0', byte 0 is $78)
Word 0 location : 'n+0' ('n+0' to 'n+1', byte 0 to 1, word 0 is $5678 in LITTLE ENDIAN)
Long 0 location : 'n+0' ('n+0' to 'n+3', byte 0 to 3, long 0 is $12345678 in LITTLE ENDIAN)

It is useful when you want to get the low part of the data, for instance the LSB or the LSW. You just have to fetch the 'cast' data (according to the assembly opcode operand size) at the same address 'n+0'! But debugging memory is then absolutely weird...

Into register : $12345678
At address 'n+0' : $78 $56 $34 $12 (LSB to MSB per byte in LITTLE ENDIAN)
At address 'n+0' : $7856 $3412 (LSB to MSB per word in LITTLE ENDIAN)
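As a rough sketch of the two memory layouts described above, the helpers below compute which byte of a 32-bit value lands at address 'n + offset' under each convention, and probe the byte order of the machine the code runs on. All of the function names are hypothetical; std::memcpy is used for the probe because it is a well-defined way to look at the first byte of an object:

```cpp
#include <cstdint>
#include <cstring>   // std::memcpy

typedef std::uint8_t  BYTE;   // assumed stand-ins for the Windows types
typedef std::uint32_t DWORD;

// Byte found at address n + offset (offset 0 to 3) when `value`
// is stored BIG ENDIAN: most significant byte first.
constexpr BYTE big_endian_byte_at(DWORD value, unsigned offset) {
    return static_cast<BYTE>((value >> (8 * (3 - offset))) & 0xFFu);
}

// Same, for LITTLE ENDIAN storage: least significant byte first.
constexpr BYTE little_endian_byte_at(DWORD value, unsigned offset) {
    return static_cast<BYTE>((value >> (8 * offset)) & 0xFFu);
}

// Looks at the first byte of a known value in this machine's memory.
inline bool host_is_little_endian() {
    const DWORD probe = 0x12345678u;
    BYTE first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x78;
}

static_assert(big_endian_byte_at(0x12345678u, 0)    == 0x12, "MSB first");
static_assert(big_endian_byte_at(0x12345678u, 3)    == 0x78, "LSB last");
static_assert(little_endian_byte_at(0x12345678u, 0) == 0x78, "LSB first");
static_assert(little_endian_byte_at(0x12345678u, 3) == 0x12, "MSB last");
```

The two byte_at helpers work with shifts on the value itself, so they give the same answers on any machine; only host_is_little_endian depends on where the code runs.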

Please note that a few processors use bitwise LITTLE ENDIAN storage, that is to say a 32-bit long from bit 31 to 0 is stored in memory from bit 0 to 31!

Into register : $12345678 (%00010010001101000101011001111000)
At address 'n+0' : $1E6A2C48 (%00011110011010100010110001001000) (GOD, these fools !)
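Reversing the order of all 32 bits can be done with the shift, AND and OR operators from the article. The sketch below uses the common swap-in-stages approach (neighbouring bits, then pairs, then nibbles, then bytes, then the two halves); the function names are hypothetical. For the full 32-bit value $12345678, leading zeros included, it yields $1E6A2C48:

```cpp
#include <cstdint>

typedef std::uint32_t DWORD;  // assumed stand-in for the Windows DWORD type

// Each stage swaps neighbouring groups of 1, 2, 4, 8 and 16 bits.
constexpr DWORD swap_1(DWORD v)  { return ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1); }
constexpr DWORD swap_2(DWORD v)  { return ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2); }
constexpr DWORD swap_4(DWORD v)  { return ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4); }
constexpr DWORD swap_8(DWORD v)  { return ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8); }
constexpr DWORD swap_16(DWORD v) { return (v >> 16) | (v << 16); }

// Bit 31 ends up in bit 0, bit 30 in bit 1, and so on.
constexpr DWORD reverse_bits(DWORD v) {
    return swap_16(swap_8(swap_4(swap_2(swap_1(v)))));
}

static_assert(reverse_bits(0x12345678u) == 0x1E6A2C48u, "all 32 bits reversed");
static_assert(reverse_bits(reverse_bits(0x12345678u)) == 0x12345678u, "reversing twice restores the value");
```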

Kochise

PS : The way number formats are noted...

Firm      Motorola    Toshiba     Intel
Decimal   123         123         123        (default)
Octal     @173        o173        173o
Hexa      $7B         h7B         7Bh
Binary    %1111011    b1111011    1111011b

My favorite way is Motorola's, since you first learn which format the number is written in and then read the number. For instance, when I read $1234, I immediately know that it means 4660 in decimal. It is harder to first read 1234 and then discover that the number was given in hexadecimal (Intel). Toshiba's way is good too, but I sometimes find o01101 or h481 hard to read! The characters @, $ and % cannot be missed!

The useful links :

http://facstaff.bloomu.edu/bobmon/readings/ien137.Cohen-Holy_Wars.html (THE MUST TO READ)

http://www.webopedia.com/TERM/b/big_endian.html
http://www.cs.umass.edu/~verts/cs32/endian.html
http://www.cscc.de/download/java/big-endian-versus-little-endian.txt
http://www.rfc-editor.org/rfc/rfc1071.txt (essential read)
http://ourworld.compuserve.com/homepages/rortiz/LEndians.htm
http://www.ddj.com/ftp/1999/1999_03/antlr.txt (Example 6)
http://www.geocities.com/fontboard/cjk/unicode.html