
# Basic concepts on Endianness

18 Aug 2003
A beginner's introduction to Endianness.

## Introduction

A long time ago, on a very remote island known as Lilliput, society was split into two factions: Big-Endians, who opened their soft-boiled eggs at the larger end ("the primitive way"), and Little-Endians, who broke their eggs at the smaller end. As the Emperor commanded all his subjects to break the smaller end, this resulted in a civil war with dramatic consequences: 11,000 people have, at several times, suffered death rather than submit to breaking their eggs at the smaller end. Eventually, the 'Little-Endian' vs. 'Big-Endian' feud carried over into the world of computing as well, where it refers to the order in which the bytes of a multi-byte number are stored: most-significant byte first (Big-Endian) or least-significant byte first (Little-Endian) [1].

• Big-Endian means that the most significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.
• Little-Endian means that the least significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.

For example, consider the 32-bit number, 0xDEADBEEF. Following the Big-Endian convention, a computer will store it as follows:

Figure 1. Big-Endian: The most significant byte is stored at the lowest byte address.

Whereas architectures that follow the Little-Endian rules will store it as depicted in Figure 2:

Figure 2. Little-endian: Least significant byte is stored at the lowest byte address.
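
To see which of the two layouts a given machine actually uses, the bytes of the value can be printed in address order. The following is a minimal sketch in C; the helper name `dump_bytes32` is made up for illustration and is not part of any library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the bytes of `value` as they sit in memory, lowest address first,
 * into `out` as space-separated hex digits.
 * `out` must have room for 12 characters (11 plus the terminator). */
void dump_bytes32(uint32_t value, char out[12])
{
    unsigned char bytes[sizeof value];

    /* Copy the in-memory representation so it can be read byte by byte. */
    memcpy(bytes, &value, sizeof value);
    snprintf(out, 12, "%02X %02X %02X %02X",
             (unsigned)bytes[0], (unsigned)bytes[1],
             (unsigned)bytes[2], (unsigned)bytes[3]);
}
```

Calling `dump_bytes32(0xDEADBEEF, buf)` fills `buf` with `EF BE AD DE` on a Little-Endian host such as x86, and with `DE AD BE EF` on a Big-Endian host.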

The Intel x86 family and Digital Equipment Corporation architectures (PDP-11, VAX, Alpha) are representatives of Little-Endian, while the Sun SPARC, IBM 360/370, and Motorola 68000 and 88000 architectures are Big-Endian. Still other architectures, such as PowerPC, MIPS, and Intel's IA-64, are Bi-Endian, i.e. they are capable of operating in either Big-Endian or Little-Endian mode [1].

Endianness is also referred to as the NUXI problem. Imagine the word UNIX stored in two 2-byte words. In a Big-Endian system, it would be stored as UNIX. In a Little-Endian system, it would be stored as NUXI.
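
The NUXI effect can be reproduced on any machine by simulating both storage orders explicitly. This is only an illustrative sketch: the function `store_as_words` is a hypothetical helper that packs the text into two 16-bit words (first character in the high byte) and then writes each word out in the requested byte order.

```c
#include <stdint.h>

/* Packs a 4-character text into two 16-bit words, then stores each word
 * byte by byte in Big-Endian (little_endian == 0) or Little-Endian order.
 * `out` receives the resulting 4 characters plus a terminator. */
void store_as_words(const char text[4], int little_endian, char out[5])
{
    for (int w = 0; w < 2; w++) {
        uint16_t word = (uint16_t)(((unsigned char)text[2 * w] << 8) |
                                    (unsigned char)text[2 * w + 1]);
        unsigned char hi = (unsigned char)(word >> 8);    /* most significant  */
        unsigned char lo = (unsigned char)(word & 0xFF);  /* least significant */

        out[2 * w]     = (char)(little_endian ? lo : hi);
        out[2 * w + 1] = (char)(little_endian ? hi : lo);
    }
    out[4] = '\0';
}
```

With the input `"UNIX"`, the Big-Endian setting leaves the text unchanged, while the Little-Endian setting swaps the characters within each word and produces `"NUXI"`.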

## Which format is better?

Like the egg debate described in Gulliver's Travels, the Big- vs. Little-Endian computer dispute has much more to do with political issues than with technological merits. In practice, both systems perform equally well in most applications. There is, however, a significant difference in performance when using Little-Endian processors instead of Big-Endian ones in network devices (more details below).

## How to switch from one format to the other?

It is very easy to reverse a multi-byte number if you need the other format: it is simply a matter of swapping bytes, and the conversion is the same in both directions. The following example shows how an Endian conversion function could be implemented using a simple C `union`:

```c
#include <stdint.h>

uint32_t ByteSwap1(uint32_t nLongNumber)
{
    /* View the same 32 bits either as one integer or as four bytes. */
    union u { uint32_t vi; unsigned char c[sizeof(uint32_t)]; };
    union u un, vn;

    un.vi = nLongNumber;
    /* Copy the bytes across in reverse order. */
    vn.c[0] = un.c[3];
    vn.c[1] = un.c[2];
    vn.c[2] = un.c[1];
    vn.c[3] = un.c[0];
    return vn.vi;
}
```

Note that this function is intended to work with 32-bit integers.

A more efficient function can be implemented using bitwise operations as shown below:

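```c
#include <stdint.h>

uint32_t ByteSwap2(uint32_t nLongNumber)
{
    /* Isolate each byte with a mask and shift it to its mirrored position. */
    return ((nLongNumber & 0x000000FF) << 24) | ((nLongNumber & 0x0000FF00) << 8) |
           ((nLongNumber & 0x00FF0000) >> 8)  | ((nLongNumber & 0xFF000000) >> 24);
}
```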

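And this is a version in x86 assembly language, written as Microsoft Visual C++ inline assembly (32-bit builds only):

```c
unsigned long ByteSwap3(unsigned long nLongNumber)
{
    unsigned long nRetNumber;

    __asm
    {
        mov  eax, nLongNumber
        xchg ah, al            ; swap the two low bytes
        ror  eax, 16           ; bring the two high bytes into ax
        xchg ah, al            ; swap them as well
        mov  nRetNumber, eax
    }

    return nRetNumber;
}
```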

A 16-bit version of a byte swap function is really straightforward:

```c
unsigned short ByteSwap4(unsigned short nValue)
{
    /* Exchange the high and low bytes. */
    return (unsigned short)((nValue >> 8) | (nValue << 8));
}
```

Finally, we can write a more general C++ function that can deal with any fundamental data type (e.g. `int`, `float`, `double`, etc.) with automatic size detection:

```cpp
#include <algorithm> // std::swap before C++11
#include <utility>   // std::swap since C++11

// Reverses the bytes of x in place; x must be an lvalue.
#define ByteSwap5(x) ByteSwap((unsigned char *) &(x), sizeof(x))

void ByteSwap(unsigned char * b, int n)
{
    int i = 0;
    int j = n - 1;
    while (i < j)
    {
        std::swap(b[i], b[j]);
        i++, j--;
    }
}
```

For example, the next code snippet shows how to convert a data array of `double`s from one format (e.g. Big-Endian) to the other (e.g. Little-Endian):

```cpp
double* dArray; // array in big-endian format
int n;          // number of elements

for (int i = 0; i < n; i++)
    ByteSwap5(dArray[i]);
```

Actually, in most cases you won't need to implement any of the above functions, since there is a set of socket functions (see Table I), declared in `winsock2.h` on Windows (and in `<arpa/inet.h>` on POSIX systems), that are defined for TCP/IP, so all machines that support TCP/IP networking have them available. They convert data to and from 'network byte order', which is standard and independent of the host's endianness.

| Function | Purpose |
|---|---|
| `ntohs` | Convert a 16-bit quantity from network byte order to host byte order (Big-Endian to Little-Endian on a Little-Endian host). |
| `ntohl` | Convert a 32-bit quantity from network byte order to host byte order (Big-Endian to Little-Endian on a Little-Endian host). |
| `htons` | Convert a 16-bit quantity from host byte order to network byte order (Little-Endian to Big-Endian on a Little-Endian host). |
| `htonl` | Convert a 32-bit quantity from host byte order to network byte order (Little-Endian to Big-Endian on a Little-Endian host). |

Table I: Windows Sockets Byte-Order Conversion Functions [2]

The socket interface specifies a standard byte ordering called network-byte order, which happens to be Big-Endian. Consequently, all network communication should be Big-Endian, irrespective of the client or server architecture.

Suppose your machine uses Little-Endian order. To transmit the 32-bit value `0x0a0b0c0d` over a TCP/IP connection, you have to call `htonl()` and transmit the result:

`TransmitNum(htonl(0x0a0b0c0d)); `

Likewise, to convert an incoming 32-bit value, use `ntohl()`:

`int n = ntohl(GetNumberFromNetwork()); `
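
When the value is written into a byte buffer anyway, an alternative that does not depend on the host's byte order at all is to assemble the bytes with shifts. The sketch below assumes nothing beyond standard C; `put_be32` and `get_be32` are hypothetical helper names, not socket API functions.

```c
#include <stdint.h>

/* Store `value` into `buf` in network (Big-Endian) byte order. */
void put_be32(unsigned char buf[4], uint32_t value)
{
    buf[0] = (unsigned char)(value >> 24);  /* most significant byte first */
    buf[1] = (unsigned char)(value >> 16);
    buf[2] = (unsigned char)(value >> 8);
    buf[3] = (unsigned char)(value);
}

/* Read a 32-bit value that was stored in network (Big-Endian) byte order. */
uint32_t get_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because the shifts operate on the numeric value rather than on its memory layout, `put_be32(buf, 0x0a0b0c0d)` always produces the bytes `0a 0b 0c 0d`, whatever the host architecture.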

If the processor on which the TCP/IP stack is to be run is itself also Big-Endian, each of the four macros (i.e. `ntohs`, `ntohl`, `htons`, `htonl`) will be defined to do nothing and there will be no run-time performance impact. If, however, the processor is Little-Endian, the macros will reorder the bytes appropriately. These macros are routinely called when building and parsing network packets and when socket connections are created. Serious run-time performance penalties occur when using TCP/IP on a Little-Endian processor. For that reason, it may be unwise to select a Little-Endian processor for use in a device, such as a router or gateway, with an abundance of network functionality. (Excerpt from reference [1]).

One additional problem with the host-to-network APIs is that they are unable to manipulate 64-bit data elements. However, you can write your own corresponding `ntohll()` and `htonll()` functions:

• `ntohll`: converts a 64-bit integer to host byte order.
• `htonll`: converts a 64-bit integer to network byte order.

The implementation is simple enough:

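```c
/* Based on a macro by Runner; requires <stdint.h>. A Big-Endian host returns x unchanged. */
#define ntohll(x) (htonl(1) == 1 ? (uint64_t)(x) : \
    (((uint64_t)ntohl((uint32_t)((uint64_t)(x) & 0xFFFFFFFFu)) << 32) | \
      (uint64_t)ntohl((uint32_t)((uint64_t)(x) >> 32))))
#define htonll(x) ntohll(x)
```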
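
As an alternative to macros, the 64-bit conversion can be sketched as a pair of ordinary functions that build the byte sequence with shifts, so no run-time endianness check is needed. The names `my_htonll` and `my_ntohll` are hypothetical and chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns a value whose in-memory bytes are `host_value` in Big-Endian order. */
uint64_t my_htonll(uint64_t host_value)
{
    unsigned char bytes[8];
    uint64_t net_value;

    for (int i = 0; i < 8; i++)
        bytes[i] = (unsigned char)(host_value >> (56 - 8 * i));
    memcpy(&net_value, bytes, sizeof net_value);
    return net_value;
}

/* Interprets the in-memory bytes of `net_value` as a Big-Endian number. */
uint64_t my_ntohll(uint64_t net_value)
{
    unsigned char bytes[8];
    uint64_t host_value = 0;

    memcpy(bytes, &net_value, sizeof bytes);
    for (int i = 0; i < 8; i++)
        host_value = (host_value << 8) | bytes[i];
    return host_value;
}
```

The two functions are inverses of each other on any host, and on a Big-Endian host both return their argument unchanged.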

## How to dynamically test for the Endian type at run time?

As explained in the Computer Animation FAQ, you can use the following function to see if your code is running on a Little- or Big-Endian system:

```c
/* Note: some system headers already define macros with these names. */
#define BIG_ENDIAN      0
#define LITTLE_ENDIAN   1

int TestByteOrder()
{
    short int word = 0x0001;
    char *byte = (char *) &word;   /* points at the lowest-addressed byte */
    return (byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}
```

This code assigns the value `0x0001` to a 16-bit integer. A char pointer is then assigned to point at the first (lowest-addressed) byte of the integer value. If that byte is `0x01`, then the system is Little-Endian (the least-significant byte, `0x01`, is at the lowest address). If it is `0x00`, then the system is Big-Endian.

Similarly,

```cpp
bool IsBigEndian()
{
    short word = 0x4321;
    // The lowest-addressed byte is 0x21 only on a Little-Endian system.
    return (*(char *) &word) != 0x21;
}
```

which is just the other side of the same coin.

You can also use the standard byte order APIs to determine the byte order of a system at run time. For example:

`bool IsBigEndian() { return( htonl(1)==1 ); }`
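
A run-time test like the ones above is typically combined with a byte swap so that data ends up in one fixed order regardless of the host. The following sketch shows one way this could look; the function names are hypothetical, and the choice of Little-Endian as the target order is just an assumption for the example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a Little-Endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x0001;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* read the lowest-addressed byte */
    return first == 0x01;
}

/* Reverses the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

/* Returns a value whose in-memory bytes are `v` in Little-Endian order,
 * swapping only when the host is not already Little-Endian. */
uint32_t host_to_le32(uint32_t v)
{
    return host_is_little_endian() ? v : swap32(v);
}
```

Since `swap32` is its own inverse, the same function also converts Little-Endian data back to host order.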

## Auto detecting the correct Endian format of a data file

Suppose you are developing a Windows application that imports Nuclear Magnetic Resonance (NMR) spectra. High-resolution NMR files are generally recorded on Silicon Graphics or Sun workstations, but recently Windows- or Linux-based spectrometers have been emerging as practical substitutes. You therefore need to know the Endian format of the file in advance in order to parse all the information correctly. Here are some practical guidelines you can follow to determine the correct Endianness of a data file:

1. Typically, the binary file includes a header with the information about the Endian format.
2. If the header is not present, you can guess the Endian format if you know the native format of the computer from which the file comes. For instance, if the file was created on a Sun workstation, the Endian format will most likely be Big-Endian.
3. If none of the above points applies, the Endian format can be determined by trial and error. For example, if after reading the file assuming one format, the spectrum does not make sense, you will know that you have to use the other format.
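
For the first guideline, one common way a header can carry the Endian information is a known "magic number" at a fixed position. The sketch below assumes a hypothetical file format whose header starts with the 32-bit value `0x1A2B3C4D`; both the constant and the function name are invented for illustration.

```c
#include <stdint.h>

#define FILE_MAGIC         0x1A2B3C4Du  /* hypothetical magic number          */
#define FILE_MAGIC_SWAPPED 0x4D3C2B1Au  /* the same bytes in the reverse order */

/* `magic_as_read` is the first 32-bit word of the file, read in host order.
 * Returns 0 if the file matches the host's byte order, 1 if every
 * multi-byte field must be swapped, and -1 if the header is not recognized. */
int needs_byte_swap(uint32_t magic_as_read)
{
    if (magic_as_read == FILE_MAGIC)
        return 0;
    if (magic_as_read == FILE_MAGIC_SWAPPED)
        return 1;
    return -1;
}
```

A magic number chosen for this purpose should not read the same in both directions, otherwise the two cases cannot be told apart.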

If the data points in the file are in floating point format (`double`), then the `_isnan()` function can be of some help to determine the Endian format. For example:

```cpp
double dValue;
FILE* fp;
(...)
// _isnan() is declared in <float.h> (Microsoft C runtime)
bool bByteSwap = _isnan(dValue) != 0;
```

Note that this method only guarantees that a byte swap is required if `_isnan()` returns a nonzero value (`TRUE`); if it returns 0 (`FALSE`), it is not possible to determine the correct Endian format without further information.
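
The trial-and-error idea can be pushed a little further by reading a value both ways and checking which reading looks plausible. The following is only a heuristic sketch, not a reliable detector: it assumes 8-byte IEEE 754 `double`s, uses the standard C99 `isnan()` instead of `_isnan()`, and relies on caller-supplied magnitude bounds that depend entirely on the kind of data in the file. All function names are hypothetical.

```c
#include <math.h>
#include <string.h>

/* Reinterprets 8 raw bytes as a double, optionally reversing them first. */
double bytes_to_double(const unsigned char raw[8], int reverse)
{
    unsigned char tmp[8];
    double d;

    for (int i = 0; i < 8; i++)
        tmp[i] = reverse ? raw[7 - i] : raw[i];
    memcpy(&d, tmp, sizeof d);
    return d;
}

/* A value is considered plausible if it is not NaN and is either zero or
 * has a magnitude within [min_abs, max_abs] (this also rejects infinities). */
static int plausible(double v, double min_abs, double max_abs)
{
    double a = v < 0 ? -v : v;

    if (isnan(v))
        return 0;
    return v == 0.0 || (a >= min_abs && a <= max_abs);
}

/* Returns 1 if only the byte-reversed reading is plausible, 0 if only the
 * direct reading is plausible, and -1 if the test is inconclusive. */
int guess_needs_swap(const unsigned char raw[8], double min_abs, double max_abs)
{
    int direct_ok  = plausible(bytes_to_double(raw, 0), min_abs, max_abs);
    int swapped_ok = plausible(bytes_to_double(raw, 1), min_abs, max_abs);

    if (direct_ok == swapped_ok)
        return -1;
    return swapped_ok ? 1 : 0;
}
```

A single value is rarely conclusive, so in practice such a check would be applied to many data points and the majority result used.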

## Acknowledgments

Thanks to Santiago Domínguez, Ehsan Akhgari, Santiago Fraga and Ignacio Sordo for their helpful suggestions.

## References

1. Introduction to Endianness, by Michael Barr, Embedded Systems Programming.
2. Visual C++ Concepts: Adding Functionality. Windows Sockets: Byte Ordering
