License: The Code Project Open License (CPOL)
Endian Conversion in ARM and x86 assembly
By kingsimba0511
How to solve endian conversion in a multiplatform application.
C++, C, Windows, Win Mobile, Dev
In this article, I shall explore the endian conversion problem and give a set of assembly functions to solve it. They are for x86 and ARM, and can compile under VS and GCC.
In a cross-platform project, we faced an age-old problem: endian conversion. If a file is generated on a little-endian machine, the integer 255 may be stored as:
ff 00 00 00
But when it is read into memory, the value differs from platform to platform, which causes a porting problem.
int a;
fread(&a, sizeof(int), 1, file);
// on little endian machine, a = 0xff;
// but on big endian machine, a = 0xff000000;
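To see the effect without a file, the sketch below copies the same four bytes into an integer and checks the host byte order at run time. The helper names `host_is_little_endian` and `load_native32` are made up for illustration, and `uint32_t` from `<stdint.h>` stands in for the article's `uint32`.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise.
   Hypothetical helper, not part of the article's source. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte stored at the lowest address */
    return first == 1;
}

/* Reinterprets four raw bytes as a native integer,
   which is what fread() into an int effectively does. */
static uint32_t load_native32(const unsigned char bytes[4])
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```

Feeding it the bytes ff 00 00 00 should give 0xff on a little-endian host and 0xff000000 on a big-endian one.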
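The usual fix is to read the bytes one at a time and assemble the value explicitly:
void readInt(void* p, FILE* file)
{
    unsigned char buf[4];   // unsigned, so bytes >= 0x80 are not sign-extended
    fread(buf, 4, 1, file);
    // the file is little endian: buf[0] is the least significant byte
    *((uint32*)p) = (uint32)buf[0] | (uint32)buf[1] << 8
        | (uint32)buf[2] << 16 | (uint32)buf[3] << 24;
}
This function has the advantage of working on both big- and little-endian platforms. But it robs us of the convenience of reading a whole structure at once:
fread(&header, sizeof(struct MyFileHeader), 1, file);
If MyFileHeader contains many integers, we have to break the read into many separate fread() calls. That is not only cumbersome to code, but also slower because of the extra IO operations. So I shall propose another method.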
fread(&header, sizeof(struct MyFileHeader), 1, file);
CQ_NTOHL(header.version);
CQ_NTOHL_ARRAY((uint32*)&header.box, 4); // box is a RECT structure
If the endianness of the machine doesn't match that of the data file, these macros are defined to do the conversion; otherwise they are defined empty:
#if defined(ENDIAN_CONVERSION)
// a must be an unsigned 32-bit value
# define CQ_NTOHL(a) {(a) = ((a) >> 24) | (((a) & 0xff0000) >> 8) | (((a) & 0xff00) << 8) | ((a) << 24); }
# define CQ_NTOHL_ARRAY(arr, num) {uint32 i; for(i = 0; i < (num); i++) {CQ_NTOHL((arr)[i]); }}
#else
# define CQ_NTOHL(a)
# define CQ_NTOHL_ARRAY(arr, num)
#endif
This approach wastes no CPU cycles if ENDIAN_CONVERSION is not defined, and the code keeps its natural form of reading a whole structure at a time.
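One way to decide whether ENDIAN_CONVERSION should be defined is to let the compiler tell us the host byte order. The sketch below assumes the data file is little endian (as in the example above) and relies on the `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` macros that GCC and Clang predefine; on other compilers it silently assumes a little-endian host. `read_file_u32` is a hypothetical helper used only to exercise the macro.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the data file is little endian, so only big-endian hosts convert.
   __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ are predefined by GCC and Clang. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define ENDIAN_CONVERSION
#endif

#if defined(ENDIAN_CONVERSION)
#  define CQ_NTOHL(a) { (a) = ((a) >> 24) | (((a) & 0xff0000u) >> 8) \
                            | (((a) & 0xff00u) << 8) | ((a) << 24); }
#else
#  define CQ_NTOHL(a)
#endif

/* Hypothetical helper: interpret four little-endian file bytes on any host. */
static uint32_t read_file_u32(const unsigned char bytes[4])
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);   /* same effect as fread() into v */
    CQ_NTOHL(v);                   /* expands to nothing on little-endian hosts */
    return v;
}
```

With this in place, the bytes ff 00 00 00 should decode to 255 regardless of the host.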
This is the best we can achieve in C. But I recalled that the 80x86 series of CPUs has a dedicated instruction for endian conversion. A little googling confirmed it: BSWAP. And for ARM CPUs, there is also an algorithm to accelerate it.
Sadly, there is a limitation in the C language. Because the underlying machine architectures differ so widely, how can one high-level language harness all their power?
In the simple case of division, the 8086 has an instruction, DIV, which produces the quotient and the remainder together (in AL and AH respectively for an 8-bit divisor). But in C, we have to write:
Quotient = a / b;
Remainder = a % b;
Seemingly, the calculation is made twice. But a good compiler will be able to optimize it.
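For what it's worth, the standard library does offer a way to ask for both results in one call: `div()` from `<stdlib.h>`. The wrapper name `divide` below is only for illustration, and whether a given compiler turns this into a single divide instruction is not guaranteed.

```c
#include <stdlib.h>

/* Computes the quotient and remainder of a / b with one library call.
   div() truncates toward zero, like the / and % operators in C99. */
static void divide(int a, int b, int *quotient, int *remainder)
{
    div_t r = div(a, b);
    *quotient = r.quot;
    *remainder = r.rem;
}
```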
As another example, almost every architecture has a rotate-right instruction. I wonder why C doesn't provide an operator for it. So in C, if we want to rotate a right by b bits, we have to write:
uint32 mask = (1u << b) - 1;
a = (a >> b) | ((a & mask) << (32 - b)); // valid only for 0 < b < 32
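A slightly more defensive form of the same rotate masks the shift counts so that b = 0 (or any multiple of 32) does not shift by the full word width, which is undefined in C. This is a common idiom rather than anything from the article's source, and `rotr32` is a name chosen here; many compilers recognise the pattern and emit a single rotate instruction, though that is not guaranteed.

```c
#include <stdint.h>

/* Rotates a right by b bits; any b is accepted because both
   shift counts are reduced to the range 0..31. */
static uint32_t rotr32(uint32_t a, unsigned b)
{
    b &= 31u;
    return (a >> b) | (a << ((32u - b) & 31u));
}
```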
Instructions like arithmetic shift left/right and rotate left/right: none of them has a direct counterpart in C.
For RISC machines like ARM, with its unusual but brilliant design, there is a huge gap between the instruction set and C semantics, so a tremendous amount of effort must go into the compiler. Can we still trust the compiler to generate the desired code? No. I tried the above CQ_NTOHL() with VS2008, and it failed to use BSWAP. I debugged the function ntohl() from winsock.h, and it doesn't use BSWAP either.
I think there is a possible explanation for this: the time spent in endian conversion is dwarfed by the preceding IO. So it seems I'm a little carried away. But that's what I am ;) I'm having fun pushing it to the limit.
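Before dropping down to assembly, it may be worth knowing that both compilers expose a byte-swap intrinsic: `_byteswap_ulong` in MSVC (declared in `<stdlib.h>`) and `__builtin_bswap32` in GCC and Clang. The wrapper below is a sketch, not part of the article's zip file; the name `swap32_intrinsic` is invented, and `<stdint.h>` is assumed to be available (older MSVC versions need their own typedefs instead).

```c
#include <stdint.h>
#include <stdlib.h>   /* declares _byteswap_ulong on MSVC */

/* Swaps the byte order of a 32-bit value using a compiler intrinsic
   where one is known, and plain shifts otherwise. */
static uint32_t swap32_intrinsic(uint32_t v)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(v);
#elif defined(__GNUC__)
    return __builtin_bswap32(v);
#else
    return (v >> 24) | ((v & 0x00ff0000u) >> 8)
         | ((v & 0x0000ff00u) << 8) | (v << 24);
#endif
}
```

The intrinsics leave instruction selection to the compiler, so the same source can serve both x86 and ARM builds.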
I shall write assembly code for the 80486 and ARM CPUs, and show how to compile it under both VS and GCC in the following sections. But first, let's declare the functions we are going to write:
uint32 cq_ntohl(uint32 value);
void cq_ntohl_array(uint32* arr, uint32 num);
uint16 cq_ntohs(uint16 value);
void cq_ntohs_array(uint16* arr, uint32 num);
#define CQ_NTOHL(a) a = cq_ntohl(a);
#define CQ_NTOHL_ARRAY(p, n) cq_ntohl_array(p, n);
#define CQ_NTOHS(a) a = cq_ntohs(a);
#define CQ_NTOHS_ARRAY(p, n) cq_ntohs_array(p, n);
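A plain C version of the same four functions can be handy as a baseline for checking the assembly versions against. The sketch below is not taken from the article's zip file: the `_ref` suffix is added here to keep the names distinct, and `uint32_t`/`uint16_t` from `<stdint.h>` stand in for the article's `uint32`/`uint16`.

```c
#include <stdint.h>

/* Reference byte swap for a 32-bit value. */
uint32_t cq_ntohl_ref(uint32_t value)
{
    return (value >> 24) | ((value & 0x00ff0000u) >> 8)
         | ((value & 0x0000ff00u) << 8) | (value << 24);
}

/* Swaps each of the num elements in place; num may be 0. */
void cq_ntohl_array_ref(uint32_t *arr, uint32_t num)
{
    uint32_t i;
    for (i = 0; i < num; i++)
        arr[i] = cq_ntohl_ref(arr[i]);
}

/* Reference byte swap for a 16-bit value. */
uint16_t cq_ntohs_ref(uint16_t value)
{
    return (uint16_t)((value >> 8) | (value << 8));
}

/* Swaps each of the num elements in place; num may be 0. */
void cq_ntohs_array_ref(uint16_t *arr, uint32_t num)
{
    uint32_t i;
    for (i = 0; i < num; i++)
        arr[i] = cq_ntohs_ref(arr[i]);
}
```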
As for the implementations, I shall give you 4 versions of them.
The source code is packed into a zip file. Here I shall only explain how to use it in your project and some points that need your attention.
Visual Studio supports inline x86 assembly, so simply adding ec_x86.c to your VS project will suffice. A 32-bit integer is converted with the 80486 instruction BSWAP, and a 16-bit integer with a simple right rotate by 8 bits.
uint32 cq_ntohl(uint32 a)
{
    __asm{
        mov eax, a;
        bswap eax;
    }
}
// num must be greater than 0: LOOP with ecx == 0 wraps around instead of stopping
void cq_ntohl_array(uint32* p, uint32 num)
{
    __asm{
        mov eax, dword ptr [p];
        mov ecx, num;
    next:
        mov ebx, [eax];
        bswap ebx;
        mov dword ptr[eax], ebx;
        add eax, 4;
        loop next;
    }
}
uint16 cq_ntohs(uint16 a)
{
    __asm{
        mov ax, a;
        ror ax, 8;
    }
}
// num must be greater than 0, for the same reason as above
void cq_ntohs_array(uint16* arr, uint32 num)
{
    __asm{
        mov eax, dword ptr [arr];
        mov ecx, num;
    next:
        mov bx, word ptr [eax];
        ror bx, 8;
        mov word ptr[eax], bx;
        add eax, 2;
        loop next;
    }
}
Visual Studio doesn't support inline ARM assembly, so you need to assemble the ec_arm.s file using armasm.exe:
armasm.exe ec_arm.s ec_arm.obj
Then link your exe against the .obj file just as you would an ordinary static library. Alternatively, you can add the .s file to your project and set up a Custom Build Step, but I won't explain the details here.
The 32bit endian conversion algorithm involves exclusive or, right rotate and right shift.
eor r1, r0, r0, ROR #16
bic r1, r1, #0xFF, 16
mov r0, r0, ror #8
eor r0, r0, r1, lsr #8
It's a little hard to understand, but we can verify it. Let:
r0 = 0xaabbccdd
r1 = r0 ^ (r0 rotate-right 16) = 0xaabbccdd ^ 0xccddaabb = 0x(aa^cc, bb^dd, cc^aa, dd^bb)
r1 = r1 & not(0xff0000) = r1 & 0xff00ffff = 0x(aa^cc, 0, cc^aa, dd^bb)
r0 = r0 rotate-right 8 = 0xddaabbcc
r0 = r0 ^ (r1 >> 8) = 0xddaabbcc ^ 0x(0, aa^cc, 0, cc^aa) = 0x(dd, aa^aa^cc, bb, cc^cc^aa) = 0x(dd, cc, bb, aa)
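The same check can be done mechanically by mimicking the four instructions in C, one statement per instruction. This is only a simulation for verifying the algorithm, not a replacement for the assembly; the names `ror32` and `arm_swap_sim` are invented here, and the operand `#0xFF, 16` is taken to mean 0x00ff0000, as in the hand verification above.

```c
#include <stdint.h>

/* Rotate right; only used with n = 8 and n = 16 below (valid for 0 < n < 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Mimics the four-instruction ARM byte swap, one C statement per instruction. */
static uint32_t arm_swap_sim(uint32_t r0)
{
    uint32_t r1;
    r1 = r0 ^ ror32(r0, 16);   /* eor r1, r0, r0, ROR #16 */
    r1 &= 0xff00ffffu;         /* bic r1, r1, #0xFF, 16   (clears bits 16..23) */
    r0 = ror32(r0, 8);         /* mov r0, r0, ror #8 */
    r0 ^= r1 >> 8;             /* eor r0, r0, r1, lsr #8 */
    return r0;
}
```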
It's amazing that these calculations can be packed into just four 32-bit ARM instructions.
arm-linux-gcc supports both inline assembly and .s files, so you can use either ec_arm_linux.c or ec_arm_linux.s.
I’m not experienced with assembly language or assembler. I only taught myself the things that are needed to do the conversion, nothing much else. So please excuse me if there is any error or omission.
From this experience, I learned that there is a huge gap between machine instructions and the C language, and so I feel more indebted to compiler writers. It's indeed a very hard job. I also learned that while C is platform independent and compiler independent, sometimes writing assembly does the job more quickly and directly.
Basic concepts on Endianness by Juan Carlos Cobas
Oct 1, 2008 - First Version.
Last Updated: 30 Sep 2008
Copyright 2008 by kingsimba0511