
Endian Conversion in ARM and x86 Assembly

By kingsimba0511, 30 Sep 2008


In this article, I shall explore the endian conversion problem and give a set of assembly functions to solve it. They target x86 and ARM, and can be built with both Visual Studio (VS) and GCC.


In a cross platform project, we faced an age old problem – Endian conversion. If the file is generated on a little Endian machine, an integer 255 may be stored like:

ff 00 00 00 

But when it's read into the memory, the value will be different for different platforms. And this will cause a porting problem.

int a;
fread(&a, sizeof(int), 1, file);
// on little endian machine, a = 0xff;
// but on big endian machine, a = 0xff000000; 
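
To make the difference concrete, here is a minimal sketch that interprets the same four bytes both ways using only shifts, so the result does not depend on the host CPU. The names load_le32 and load_be32 are hypothetical and are not part of the article's source code.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration, not from the article's zip file. */

/* Interpret 4 bytes as a little-endian 32-bit value. */
uint32_t load_le32(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Interpret the same 4 bytes as a big-endian 32-bit value. */
uint32_t load_be32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

For the bytes ff 00 00 00, load_le32 gives 0xff and load_be32 gives 0xff000000, which matches the two results described in the comments above.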

First Approach

A very simple and effective way to solve it is as follows. We can write a function called readInt().

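void readInt(void* p, FILE* file)
{
    unsigned char buf[4];   // unsigned, so the shifts below never sign-extend
    fread(buf, 4, 1, file);
    // assemble the bytes in big-endian (network) order
    *((uint32*)p) = (uint32)buf[0] << 24 | (uint32)buf[1] << 16
                   | (uint32)buf[2] << 8 | (uint32)buf[3];
}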

The function has the advantage of working on both big- and little-endian platforms. But it robs us of the convenience of reading a whole structure at once:

fread(&header, sizeof(struct MyFileHeader), 1, file);

If MyFileHeader contains many integers, we will have to break the read down into many readInt() calls. That is not only cumbersome to code, but also slow because of the increased number of I/O operations. So I shall propose another method.
We can leave the code unchanged, and use several macros to post-process the data.

fread(&header, sizeof(struct MyFileHeader), 1, file);
CQ_NTOHL_ARRAY(&, 4); // box is a RECT structure 

If the endianness of the machine doesn’t match with that of the data file, these macros are defined to execute certain functions, or else they are defined empty:

#ifdef ENDIAN_CONVERSION
#    define CQ_NTOHL(a) {a = ((a) >> 24) | (((a) & 0xff0000) >> 8) | \
	(((a) & 0xff00) << 8) | ((a) << 24); }
#    define CQ_NTOHL_ARRAY(arr, num) {uint32 i; \
	for(i = 0; i < num; i++) {CQ_NTOHL((arr)[i]); }}
#else
#    define CQ_NTOHL(a)
#    define CQ_NTOHL_ARRAY(arr, num)
#endif

This approach has the advantage of wasting no CPU cycles when ENDIAN_CONVERSION is not defined. And the code keeps its natural form of reading a whole structure at a time.
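
The article does not show how ENDIAN_CONVERSION gets defined; presumably it is set per target in the build. As a sanity check, a program could verify the host byte order once at startup. The following is a minimal sketch of such a check, and host_is_little_endian is a hypothetical name that does not appear in the article's source code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the article's source code.
   Returns 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    /* Copy the byte stored at the lowest address of probe. */
    memcpy(&first_byte, &probe, 1);

    /* On a little-endian host the least significant byte comes first. */
    return first_byte == 1;
}
```

If the result disagrees with the byte order of the data file, the conversion macros need to be active for that build.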

The Fall of C

This is the best we can achieve in the C language. But I recalled that the 80x86 family of CPUs has a dedicated instruction for endian conversion. A little Googling confirmed that it is BSWAP. And for the ARM family of CPUs, there is also an algorithm to accelerate it.

Sadly, there is a limitation with the C language. Because the underlying machine structures are widely different, how can one high level language harness all their power?

In the simplest scenario of division, the 8086 has an instruction, DIV, which produces the quotient and the remainder at the same time: with an 8-bit divisor, the quotient is stored in AL and the remainder in AH. But in C, we will have to write:

Quotient = a / b; 
Remainder = a % b;

Seemingly, the calculation is made twice. But a good compiler will be able to optimize it.
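
For completeness, the standard C library does offer div() in <stdlib.h>, which returns both results from one call. The sketch below wraps it; the wrapper name divide is hypothetical, and whether the compiler turns this into a single divide instruction is not guaranteed.

```c
#include <stdlib.h>

/* Hypothetical wrapper: obtain quotient and remainder from one library call. */
void divide(int a, int b, int *quotient, int *remainder)
{
    div_t r = div(a, b);   /* div() and div_t are declared in <stdlib.h> */
    *quotient = r.quot;
    *remainder = r.rem;
}
```

For example, divide(17, 5, &q, &r) leaves 3 in q and 2 in r.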

For another example, almost every architecture has a right rotate instruction. I wonder why C doesn’t even consider it. So in C, if we want to rotate a right by b bits, we will have to write:

uint32 mask = (1u << b) - 1;               // the low b bits of a
a = (a >> b) | ((a & mask) << (32 - b));   // valid for 0 < b < 32
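
A slightly more defensive form of the same rotation can be sketched as follows. It drops the mask, since the left shift already discards the high bits, and it reduces the count so that no shift by 32 ever happens. The name rotate_right32 is hypothetical. Many compilers are reported to recognise this pattern and emit a single rotate instruction, but that is an assumption to check against the generated code, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: rotate x right by n bits, for any n. */
uint32_t rotate_right32(uint32_t x, unsigned n)
{
    n &= 31;                                   /* reduce the count modulo 32 */
    return (x >> n) | (x << ((32 - n) & 31));  /* never shifts by 32, even when n == 0 */
}
```

For example, rotate_right32(0xaabbccdd, 8) yields 0xddaabbcc.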

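Instructions such as arithmetic shift left/right and rotate left/right have no guaranteed counterpart in C. There is no rotate operator at all, and the result of >> on a negative signed value is implementation-defined.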

For RISC machines like ARM, with its interesting and brilliant design, there is a huge gap between the instruction set and C semantics, so a tremendous amount of effort must be put into the compiler. Can we still trust the compiler to generate the desired code? No. I tried the above CQ_NTOHL() with VS2008, and it failed to use BSWAP. I also debugged the function ntohl() from winsock.h, and it doesn’t use BSWAP either.

I think there is a possible explanation for this: the time spent on endian conversion is dwarfed by the preceding I/O. So it seems I’m getting a little carried away. But that’s who I am ;) I'm having fun pushing it to the limit.

How to Use the Code

I shall write assembly code for the 80486 and ARM CPUs, and explain in the following sections how to build it under both VS and GCC. But first, let’s declare the functions we are going to write:

uint32 cq_ntohl(uint32 value);
void cq_ntohl_array(uint32* arr, uint32 num);
uint16 cq_ntohs(uint16 value);
void cq_ntohs_array(uint16* arr, uint32 num);
#define CQ_NTOHL(a) a = cq_ntohl(a);
#define CQ_NTOHL_ARRAY(p, n) cq_ntohl_array(p, n);
#define CQ_NTOHS(a) a = cq_ntohs(a);
#define CQ_NTOHS_ARRAY(p, n) cq_ntohs_array(p, n); 
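
Before moving to assembly, it may help to see what these four functions are expected to compute. The following is a plain C reference sketch, not the author's implementation from the zip file. The uint32 and uint16 typedefs are assumptions based on <stdint.h>, since the article does not show how they are defined.

```c
#include <stdint.h>

typedef uint32_t uint32;   /* assumed typedefs, not shown in the article */
typedef uint16_t uint16;

/* Reverse the four bytes of a 32-bit value. */
uint32 cq_ntohl(uint32 value)
{
    return (value >> 24) | ((value & 0x00ff0000u) >> 8)
         | ((value & 0x0000ff00u) << 8) | (value << 24);
}

/* Reverse the bytes of each element of a 32-bit array in place. */
void cq_ntohl_array(uint32* arr, uint32 num)
{
    uint32 i;
    for (i = 0; i < num; i++)
        arr[i] = cq_ntohl(arr[i]);
}

/* Swap the two bytes of a 16-bit value (a rotate by 8 bits). */
uint16 cq_ntohs(uint16 value)
{
    return (uint16)((value >> 8) | (value << 8));
}

/* Swap the bytes of each element of a 16-bit array in place. */
void cq_ntohs_array(uint16* arr, uint32 num)
{
    uint32 i;
    for (i = 0; i < num; i++)
        arr[i] = cq_ntohs(arr[i]);
}
```

A reference version like this is also handy for testing the assembly versions against known results.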

As for the implementations, I shall give you 4 versions of them:

  • Visual Studio. X86
  • Visual Studio. Smart device
  • GCC arm-linux-gcc Inline assembly
  • GCC arm-linux-as Assembler

The source code is packed into a zip file. Here I shall only explain how to use it in your project and point out a few things that need your attention.

Visual Studio. X86

Visual Studio supports inline x86 assembly. So simply adding ec_x86.c to your VS project will suffice. A 32-bit integer is converted with the 80486 instruction BSWAP, and a 16-bit integer with a simple right rotate of 8 bits.

uint32 cq_ntohl(uint32 a) {
    __asm {
        mov eax, a;
        bswap eax;   // the result is returned in eax
    }
}

Visual Studio. Smart Device

Visual Studio doesn’t support inline ARM assembly. So you need to assemble the ec_arm.s file using armasm.exe:

armasm.exe ec_arm.s ec_arm.obj 

Then link your EXE against the .obj file just as you would with an ordinary static library. Alternatively, you can add the .s file to your project and set up a Custom Build Step, but I won’t explain the details here.

The 32-bit endian conversion algorithm involves exclusive or, bit clear, right rotate and right shift.

    eor r1, r0, r0, ROR #16
    bic r1, r1, #0xFF, 16
    mov r0, r0, ror #8
    eor r0, r0, r1, lsr #8

It’s a little hard to understand. But we can verify it. Let:

r0 = 0xaabbccdd. 
r1 = r0 ^ (r0 rotate-right 16) 
   = 0xaabbccdd ^ 0xccddaabb 
   = 0x(aa^cc,bb^dd,cc^aa,dd^bb); 
r1 = r1 & not(0xff0000) 
   = r1 & 0xff00ffff 
   = 0x(aa^cc, 0, cc^aa, dd^bb);
r0 = r0 rotate-right 8 = 0xddaabbcc;
r0 = r0 ^ (r1 >> 8) 
   = 0xddaabbcc ^ 0x(0, aa^cc, 0, cc^aa) 
   = 0x(dd, aa^aa^cc, bb, cc^cc^aa)
   = 0x(dd,cc,bb,aa);
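
The same verification can be run mechanically. The sketch below mirrors the four instructions one by one in C, with each statement annotated with the instruction it models. The names ror32 and arm_style_swap are hypothetical, and this is a model of the sequence rather than the contents of ec_arm.s.

```c
#include <stdint.h>

/* Rotate x right by n bits; only used here with n = 8 and n = 16. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Hypothetical C model of the four-instruction ARM sequence. */
uint32_t arm_style_swap(uint32_t r0)
{
    uint32_t r1;

    r1 = r0 ^ ror32(r0, 16);   /* eor r1, r0, r0, ROR #16 */
    r1 = r1 & ~0x00ff0000u;    /* bic r1, r1, #0xFF, 16   */
    r0 = ror32(r0, 8);         /* mov r0, r0, ror #8      */
    r0 = r0 ^ (r1 >> 8);       /* eor r0, r0, r1, lsr #8  */

    return r0;
}
```

Calling arm_style_swap(0xaabbccdd) returns 0xddccbbaa, in agreement with the hand calculation above.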

It’s amazing that these calculations can be packed into just four 32-bit arm instructions.

GCC Arm-Linux-gcc

arm-linux-gcc supports both inline assembly and .s files. So you can choose to use either ec_arm_linux.c or ec_arm_linux.s.
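
As a rough idea of what the inline assembly variant could look like, here is a sketch using GCC extended asm. It is not the contents of ec_arm_linux.c, which is not reproduced in this article; the function name swap32_arm is hypothetical, and the asm path assumes GCC targeting ARM (not Thumb) mode. On any other target the sketch falls back to plain C, so it still compiles and gives the same result.

```c
#include <stdint.h>

/* Hypothetical sketch, not the author's ec_arm_linux.c. */
uint32_t swap32_arm(uint32_t x)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    uint32_t t;   /* scratch register, plays the role of r1 */

    /* %0 is x (read and written), %1 is t (written before x is final). */
    __asm__ ("eor %1, %0, %0, ror #16\n\t"
             "bic %1, %1, #0x00ff0000\n\t"
             "mov %0, %0, ror #8\n\t"
             "eor %0, %0, %1, lsr #8"
             : "+r" (x), "=&r" (t));
    return x;
#else
    /* Plain C fallback for non-ARM builds. */
    return (x >> 24) | ((x & 0x00ff0000u) >> 8)
         | ((x & 0x0000ff00u) << 8) | (x << 24);
#endif
}
```

The "+r" constraint marks x as both input and output, and "=&r" asks the compiler for a separate scratch register that does not overlap with it.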

I’m not experienced with assembly language or assembler. I only taught myself the things that are needed to do the conversion, nothing much else. So please excuse me if there is any error or omission.

Points of Interest

From this experience, I learned that there is a huge gap between machine instructions and the C language. And so I feel more indebted to compiler writers. It’s indeed a very hard job. I also learned that while the C language is platform independent and compiler independent, sometimes writing assembly will do the job more quickly and directly.



History

  • October 1, 2008 - First version


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Software Developer (Senior) mapbar
China

Last Updated 1 Oct 2008
Article Copyright 2008 by kingsimba0511