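I think you have manually modified the buffer pointer so that it points to an address aligned to a 16-byte boundary; this is not enough. fxsave requires its 512-byte memory operand to be aligned to a 16-byte boundary, but inside an __asm block the operand [buffer] does not refer to the memory that buffer points to: it refers to the buffer variable itself (a C variable name denotes its own memory location, with or without brackets). This means that:
- the block that buffer points to is not touched by fxsave [buffer] at all, so aligning it does not help
- fxsave [buffer] uses the address of the buffer variable (&buffer) as its operand, and that is the address that must be aligned to a 16-byte boundary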
At runtime you can add a watch on &buffer and see that, because of the function's stack frame layout, the buffer variable is allocated on the stack at an address of the form XXXXXXX4h (e.g. on my PC I see &buffer = 0x0010FA84 and buffer = 0x01B48AF0). The pointer value is aligned, but the address of the variable, which is what fxsave [buffer] uses, is not.
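The check you can do in the debugger is plain arithmetic: an address is 16-byte aligned when its low four bits are zero. Here is a small C++ sketch using the two values above; the helper name is_aligned16 is hypothetical, and everything is evaluated at compile time.

```cpp
#include <cstdint>

// Hypothetical helper: an address is 16-byte aligned when its low four bits
// are zero, i.e. when it is a multiple of 16.
constexpr bool is_aligned16(std::uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

// The two values observed in the debugger.
constexpr std::uintptr_t kAddressOfBuffer = 0x0010FA84;  // &buffer (stack slot)
constexpr std::uintptr_t kValueOfBuffer   = 0x01B48AF0;  // buffer (heap block)

static_assert(!is_aligned16(kAddressOfBuffer), "&buffer is 4 bytes past a boundary");
static_assert(is_aligned16(kValueOfBuffer), "the heap block itself is aligned");
static_assert(is_aligned16(kAddressOfBuffer - 4), "4 bytes lower (0x0010FA80) is aligned");
```

The last check is the 4-byte shift that the first workaround below relies on.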
You can fix the problem in two ways:
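- add a dummy DWORD local variable to the function, declared on the line that precedes the buffer variable declaration. With this frame layout, this shifts buffer down by 4 bytes on the stack and makes &buffer aligned to a 16-byte boundary, so the alignment fault goes away. Be aware, though, that fxsave [buffer] still stores its 512 bytes starting at &buffer, overwriting buffer itself and the rest of the stack frame above it, so this variant only hides the fault and is not safe to use: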
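int MXCSR_MASK()
{
    if (FXSAVE_SUPPORTED())
    {
        DWORD dummy = 0;  // only there to shift buffer 4 bytes lower on the stack
        unsigned char *buffer = new unsigned char[512];
        // Caution: [buffer] is the stack slot of buffer, so fxsave still writes
        // 512 bytes starting at &buffer, not into the allocated block
        __asm fxsave [buffer]
        return *(int*)(buffer + 28);
    }
    return 0;
}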
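- load the pointer stored in buffer into a register and execute fxsave through it; this way fxsave writes into the allocated block, so it is the block that must start on a 16-byte boundary. The pointer still has to be rounded up as you did in your code, because new only guarantees 8-byte alignment on 32-bit Windows: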
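int MXCSR_MASK()
{
    if (FXSAVE_SUPPORTED())
    {
        // fxsave [eax] faults unless this block starts on a 16-byte boundary
        unsigned char *buffer = new unsigned char[512];
        __asm
        {
            mov eax, [buffer]   ; load the pointer value stored in buffer
            fxsave [eax]        ; store the 512-byte FXSAVE area at that address
        }
        int mask = *(int*)(buffer + 28);  // MXCSR_MASK is at offset 28
        delete[] buffer;                  // release the block before returning
        return mask;
    }
    return 0;
}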
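If you keep the heap-based version, the pointer you load into eax has to be rounded up to a 16-byte boundary. Below is a minimal sketch of the usual over-allocate-and-round-up technique; align16 and allocate_fxsave_block are hypothetical names, and it assumes that converting a pointer to std::uintptr_t and back preserves the address, as it does with the mainstream x86 compilers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round p up to the next multiple of 16
// (p is returned unchanged if it is already aligned).
inline unsigned char *align16(unsigned char *p)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    addr = (addr + 15) & ~static_cast<std::uintptr_t>(15);
    return reinterpret_cast<unsigned char *>(addr);
}

// Hypothetical helper: allocates 15 spare bytes so that a 16-byte-aligned
// 512-byte FXSAVE area always fits inside the block. Returns the raw pointer,
// which is the one to pass to delete[]; the aligned one is what fxsave uses.
inline unsigned char *allocate_fxsave_block(unsigned char *&aligned)
{
    unsigned char *raw = new unsigned char[512 + 15];
    aligned = align16(raw);
    // Rounding up moves at most 15 bytes forward, so the area stays in bounds.
    assert(reinterpret_cast<std::uintptr_t>(aligned) + 512 <=
           reinterpret_cast<std::uintptr_t>(raw) + 512 + 15);
    return raw;
}
```

In MXCSR_MASK you would then load the aligned pointer into eax, read the result from aligned + 28, and delete[] the raw pointer.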
However, the best solution is the one I wrote in my previous answer: it does not depend on how the memory manager aligns your buffer, and it performs better, because it makes room on the stack for the 512 bytes stored by fxsave and releases them immediately afterwards, whereas a call to the new operator requires the C++ runtime to allocate that memory on the heap and then deallocate it when it is no longer needed.
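For comparison, here is a sketch of a stack-based approach in C++. It is not the code from the previous answer: it assumes GCC or Clang on x86/x86-64 and uses GCC extended inline assembly instead of the __asm syntax used above, and FxsaveArea and read_mxcsr_mask are hypothetical names. alignas(16) lets the compiler place the 512-byte area at an aligned stack address, so there is no heap allocation and no manual rounding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical type: the 512-byte FXSAVE area, 16-byte aligned by the compiler.
struct alignas(16) FxsaveArea
{
    unsigned char bytes[512];
};

// Hypothetical helper: reads MXCSR_MASK (bytes 28..31 of the FXSAVE area).
// Sketch assuming GCC/Clang on x86 or x86-64; on other targets the asm is
// skipped and the zero-filled area makes it return 0.
std::uint32_t read_mxcsr_mask()
{
    FxsaveArea area;  // lives on the stack and is released on return
    std::memset(&area, 0, sizeof area);
    assert(reinterpret_cast<std::uintptr_t>(&area) % 16 == 0);
#if defined(__i386__) || defined(__x86_64__)
    // "=m"(area) makes the memory operand the area itself, so fxsave
    // stores all 512 bytes into this aligned stack buffer.
    __asm__ volatile("fxsave %0" : "=m"(area));
#endif
    std::uint32_t mask;
    std::memcpy(&mask, area.bytes + 28, sizeof mask);
    return mask;
}
```

Because the operand is the aligned object itself, the distinction between a pointer and the memory it points to never comes up.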