Introduction
In this article, I explore the endian conversion problem and give a set of assembly functions to solve it. They target x86 and ARM, and compile under VS and GCC.

Background
In a cross-platform project, we faced an age-old problem: endian conversion. If a file is generated on a little-endian machine, the integer 255 may be stored like this:

    ff 00 00 00

But when it is read into memory, the value differs between platforms, and this causes a porting problem.

    int a;
    fread(&a, sizeof(int), 1, file);
    // on a little-endian machine, a = 0xff;
    // but on a big-endian machine, a = 0xff000000;
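To make the two interpretations above concrete, here is a minimal sketch that loads the bytes ff 00 00 00 the way fread() would and reports which value the host sees. The helper names (host_is_little_endian, load_native_u32, show_interpretation) are hypothetical and are not part of the article's source code; the sketch also assumes a plain little- or big-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* look at the byte at the lowest address */
    return first == 1;
}

/* Hypothetical helper: copies four raw bytes into a native 32-bit integer,
   which is what fread(&a, 4, 1, file) effectively does. */
static uint32_t load_native_u32(const unsigned char bytes[4])
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Prints how this host interprets the stored bytes ff 00 00 00. */
static void show_interpretation(void)
{
    const unsigned char stored[4] = { 0xff, 0x00, 0x00, 0x00 };
    uint32_t a = load_native_u32(stored);

    /* Assumption: the host is either little- or big-endian. */
    assert(a == 0xffu || a == 0xff000000u);
    printf("host is %s-endian, a = 0x%08lx\n",
           host_is_little_endian() ? "little" : "big", (unsigned long)a);
}
```

Calling show_interpretation() from main() prints the host's byte order together with the value it reads, which is the mismatch the rest of the article sets out to fix.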
First approach
A very simple and effective way to solve it is to write a function called readInt():

    void readInt(void* p, FILE* file)
    {
        unsigned char buf[4];   /* unsigned, so bytes >= 0x80 are not sign-extended */
        fread(buf, 4, 1, file);
        /* assemble the value from the most significant byte down */
        *((uint32*)p) = (uint32)buf[0] << 24 | (uint32)buf[1] << 16
                      | (uint32)buf[2] << 8  | (uint32)buf[3];
    }

The function has the advantage of working on both big-endian and little-endian platforms. But it robs us of the convenience of reading a whole structure at once:

    fread(&header, sizeof(struct MyFileHeader), 1, file);

If MyFileHeader contains many integers, we have to break the read down into many smaller reads. That is not only cumbersome to code, but also runs slowly because of the increased number of I/O operations. So I propose another method: leave the code unchanged and use several macros to post-process the data.

    fread(&header, sizeof(struct MyFileHeader), 1, file);
    CQ_NTOHL(header.version);
    CQ_NTOHL_ARRAY((uint32*)&header.box, 4); // box is a RECT structure

If the endianness of the machine doesn't match that of the data file, these macros are defined to perform the conversion; otherwise they are defined empty:

    #if defined(ENDIAN_CONVERSION)
    /* a must be an unsigned 32-bit lvalue */
    # define CQ_NTOHL(a) {a = ((a) >> 24) | (((a) & 0xff0000) >> 8) | \
            (((a) & 0xff00) << 8) | ((a) << 24); }
    # define CQ_NTOHL_ARRAY(arr, num) {uint32 i; for(i = 0; i < (num); i++) {CQ_NTOHL((arr)[i]); }}
    #else
    # define CQ_NTOHL(a)
    # define CQ_NTOHL_ARRAY(arr, num)
    #endif

This approach has the advantage of wasting no CPU cycles if ENDIAN_CONVERSION is not defined, and the code keeps its natural form of reading a whole structure at a time.

The fall of C
This is the best we can achieve with the C language. But I recalled that the 80x86 series of CPUs has a dedicated instruction for endian conversion. A little googling confirmed that it is BSWAP. For the ARM series there is also an algorithm to accelerate it. Sadly, there is a limitation with the C language. Because the underlying machine architectures differ so widely, how can one high-level language harness all their power? Take the simple case of division: the 8086 has a DIV instruction which, for an 8-bit divisor, stores the quotient in AL and the remainder in AH. But in C, we have to write:

    Quotient = a / b;
    Remainder = a % b;

Seemingly, the calculation is made twice.
But a good compiler will be able to optimize it. As another example, almost every architecture has a rotate-right instruction, yet C has no operator for it. So in C, if we want to rotate a right by b bits (with 0 < b < 32), we have to write:

    uint32 mask = (1u << b) - 1;
    a = (a >> b) | ((a & mask) << (32 - b));

Instructions such as arithmetic shift right and rotate left/right have no direct counterpart in C. For RISC machines like ARM, with its unusual but brilliant design, there is a huge gap between the instruction set and C semantics, so a tremendous amount of effort must go into the compiler. Can we still trust the compiler to generate the desired code? No. I tried the above CQ_NTOHL() with VS2008, and it failed to use BSWAP. I debugged the function ntohl() from winsock.h, and it doesn't use BSWAP either.

I think there is a possible explanation for this: the time spent in endian conversion is dwarfed by the preceding I/O. So it seems I'm getting a little carried away. But that's what I am ;) I'm having fun pushing it to the limit.

How to use the code
I shall write assembly code for the 80486 and ARM CPUs, and explain how to compile it under VS and GCC in the following sections. But first, let's declare the functions we are going to write:

    uint32 cq_ntohl(uint32 value);
    void cq_ntohl_array(uint32* arr, uint32 num);
    uint16 cq_ntohs(uint16 value);
    void cq_ntohs_array(uint16* arr, uint32 num);

    #define CQ_NTOHL(a) a = cq_ntohl(a);
    #define CQ_NTOHL_ARRAY(p, n) cq_ntohl_array(p, n);
    #define CQ_NTOHS(a) a = cq_ntohs(a);
    #define CQ_NTOHS_ARRAY(p, n) cq_ntohs_array(p, n);

As for the implementations, I shall give you 4 versions of them.
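Before turning to the assembly versions, a plain C reference for the four functions declared above can be useful as a fallback or as a baseline to test the assembly against. The following is only a sketch: the ref_ names are hypothetical, it uses the <stdint.h> types in place of the article's uint32 and uint16 typedefs, and it is not taken from the zip file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverses the four bytes of a 32-bit value. */
static uint32_t ref_swap32(uint32_t v)
{
    return (v >> 24) | ((v & 0x00ff0000u) >> 8)
         | ((v & 0x0000ff00u) << 8) | (v << 24);
}

/* Exchanges the two bytes of a 16-bit value. */
static uint16_t ref_swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Converts num 32-bit values in place. */
static void ref_swap32_array(uint32_t *arr, uint32_t num)
{
    assert(arr != NULL || num == 0);
    for (uint32_t i = 0; i < num; i++)
        arr[i] = ref_swap32(arr[i]);
}

/* Converts num 16-bit values in place. */
static void ref_swap16_array(uint16_t *arr, uint32_t num)
{
    assert(arr != NULL || num == 0);
    for (uint32_t i = 0; i < num; i++)
        arr[i] = ref_swap16(arr[i]);
}
```

Because these versions are written without any platform-specific instructions, comparing their results with the assembly routines on a handful of values is a cheap sanity check when porting to a new toolchain.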
The source code is packed into a zip file. Here I shall only explain how to use it in your project and point out a few things that need your attention.

Visual Studio. X86
Visual Studio supports inline x86 assembly, so simply adding ec_x86.c to your VS project will suffice. A 32-bit integer is converted with the 80486 instruction BSWAP, and a 16-bit integer with a simple rotate right by 8 bits.

    uint32 cq_ntohl(uint32 a)
    {
        __asm{
            mov eax, a;
            bswap eax;          // the result is returned in eax
        }
    }

    void cq_ntohl_array(uint32* p, uint32 num)
    {
        if (num == 0) return;   // "loop" with ecx = 0 would iterate 2^32 times
        __asm{
            mov eax, dword ptr [p];
            mov ecx, num;
        next:
            mov ebx, [eax];
            bswap ebx;
            mov dword ptr[eax], ebx;
            add eax, 4;
            loop next;
        }
    }

    uint16 cq_ntohs(uint16 a)
    {
        __asm{
            mov ax, a;
            ror ax, 8;          // swapping two bytes is a rotate by 8 bits
        }
    }

    void cq_ntohs_array(uint16* arr, uint32 num)
    {
        if (num == 0) return;   // same guard as in cq_ntohl_array
        __asm{
            mov eax, dword ptr [arr];
            mov ecx, num;
        next:
            mov bx, word ptr [eax];
            ror bx, 8;
            mov word ptr[eax], bx;
            add eax, 2;
            loop next;
        }
    }

Visual Studio. Smart device
Visual Studio doesn't support inline ARM assembly, so you need to assemble the ec_arm.s file with armasm.exe:

    armasm.exe ec_arm.s ec_arm.obj

Then link your exe against the .obj file just like an ordinary static library. Alternatively, you can add the .s file to your project and set up Custom Build Steps, but I won't explain the details here.
The 32-bit conversion takes four ARM instructions, with the value held in r0:

    eor r1, r0, r0, ror #16
    bic r1, r1, #0xFF, 16
    mov r0, r0, ror #8
    eor r0, r0, r1, lsr #8

It's a little hard to understand, but we can verify it. Let r0 = 0xaabbccdd.

    r1 = r0 ^ (r0 rotate-right 16)
       = 0xaabbccdd ^ 0xccddaabb
       = 0x(aa^cc, bb^dd, cc^aa, dd^bb);
    r1 = r1 & not(0xff0000)
       = r1 & 0xff00ffff
       = 0x(aa^cc, 0, cc^aa, dd^bb);
    r0 = r0 rotate-right 8
       = 0xddaabbcc;
    r0 = r0 ^ (r1 >> 8)
       = 0xddaabbcc ^ 0x(0, aa^cc, 0, cc^aa)
       = 0x(dd, aa^aa^cc, bb, cc^cc^aa)
       = 0x(dd, cc, bb, aa);

It's amazing that these calculations can be packed into just four 32-bit ARM instructions.

GCC arm-linux-gcc
arm-linux-gcc supports both inline assembly and .s files, so you can use either ec_arm_linux.c or ec_arm_linux.s.

Points of Interest
From this experience, I learned that there is a huge gap between machine instructions and the C language. I feel all the more indebted to compiler writers; theirs is indeed a very hard job. I also learned that although C is platform independent and compiler independent, sometimes writing assembly does the job more quickly and directly.

References
Basic concepts on Endianness by Juan Carlos Cobas

History
Oct 1, 2008 - First Version.
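
As a supplement to the hand verification in the Smart device section, the four-instruction ARM sequence can be modeled step by step in C and checked against known values. This is only a sketch under stated assumptions: the function names ror32 and arm_style_bswap are hypothetical, and the C code mirrors the instructions one for one rather than reproducing anything from ec_arm.s.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; only valid for 0 < n < 32. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v >> n) | (v << (32 - n));
}

/* C model of the ARM byte swap; each line mirrors one instruction. */
static uint32_t arm_style_bswap(uint32_t r0)
{
    uint32_t r1 = r0 ^ ror32(r0, 16);   /* eor r1, r0, r0, ror #16 */
    r1 &= ~(uint32_t)0x00ff0000u;       /* bic r1, r1, #0xFF, 16   */
    r0 = ror32(r0, 8);                  /* mov r0, r0, ror #8      */
    r0 ^= r1 >> 8;                      /* eor r0, r0, r1, lsr #8  */
    return r0;
}
```

For the value used in the walkthrough, arm_style_bswap(0xaabbccdd) yields 0xddccbbaa, matching the result derived by hand.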