How do I figure out if the data in a variable is a word or a byte in C++?

Question

I phrased my question badly earlier; allow me to explain.
I am struggling to write an assembler for x86 processors. The x86 instruction set has different opcodes for the same instruction depending on the operand sizes. Given a statement such as 'mov eax, 123', how do I figure out the size of the operands (byte, word, dword, qword...)? I'm using C++ as my implementation language.
Hope this makes a little more sense.
Thanks anyway.

Posted 20-Jun-15 6:37am

0xF4

Updated 20-Jun-15 11:33am

v2

Comments

CPallini 20-Jun-15 12:50pm

How (the fresh Hell) could 0xDEADBEEF be a byte? Does your computer have 32-bit bytes?

0xF4 20-Jun-15 13:02pm

It was just an example... and I did not notice.

Frankie-C 20-Jun-15 13:18pm

Assemblers also use DWORDs, QWORDs, etc.
But the point is a different one: what is the sense of checking something declared in 'C'?
In your sample you declared a variable of type 'int', whose size in 'C' is 4 bytes on typical x86 compilers, so you will always get 4.
If your question is how to detect the size of variables declared in an assembly module from another language, the answer is: you can't.
If the module is already compiled there is no way to get this information; if you have the sources, you have to declare the variable manually with the correct type in the other language.
If you want to check whether a value fits in a type, Carlo gave you an example of how to do that.

0xF4 20-Jun-15 13:51pm

Assemblers are written in C (at least some of 'em) and they do it... I guess...

Frankie-C 20-Jun-15 14:52pm

Yes, sometimes they are written in 'C', but this doesn't mean that an assembler transforms assembly source into 'C'. Often (or always) a 'C' compiler transforms 'C' code into assembly :)
The assembler takes a symbolic description of an instruction and turns it into machine code. If you reserve a memory operand in assembly and then use it, the assembler does the following:
1. It remembers the address where your variable will live and associates it with the variable name.
2. When it finds an operation that uses the variable, it emits the machine code for that operation, filling in the address saved before.
The machine code emitted is appropriate for the operand size (BYTE, WORD, DWORD, etc.).
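
The two steps above can be sketched in C++ as a small symbol table. This is only an illustration, not how any particular assembler does it: the names `Symbol`, `SymbolTable`, `define` and `lookup` are made up for this example, and addresses are simply handed out sequentially starting at 0.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical record kept for each named memory operand.
struct Symbol {
    std::uint32_t address;  // where the variable lives in the output image
    unsigned size_bytes;    // 1 = BYTE, 2 = WORD, 4 = DWORD, 8 = QWORD
};

class SymbolTable {
public:
    // Step 1: reserve storage at the next free address and remember it
    // under the given name. Returns false if the name is already defined.
    bool define(const std::string& name, unsigned size_bytes) {
        if (symbols_.count(name) != 0) return false;
        symbols_.emplace(name, Symbol{next_address_, size_bytes});
        next_address_ += size_bytes;
        return true;
    }

    // Step 2: when an instruction refers to the name, fetch the saved
    // address and size. Returns nullptr for an unknown name.
    const Symbol* lookup(const std::string& name) const {
        auto it = symbols_.find(name);
        return it == symbols_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Symbol> symbols_;
    std::uint32_t next_address_ = 0;
};
```

With this sketch, defining a 4-byte `counter` and then a 1-byte `flag` would place `flag` at address 4, and the stored size is what the encoder would use to pick the BYTE, WORD or DWORD form of an instruction.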

0xF4 20-Jun-15 14:57pm

I am not saying that an assembler transforms assembly into C/C++. I did not post my question right, let me edit it. Wait for 5 minutes please.

0xF4 20-Jun-15 17:21pm

What I meant to ask is: if I encounter a statement 'mov eax, 0123', how do I figure out the operand size of '0123' (in C++)?

Frankie-C 20-Jun-15 18:31pm

Ok makes some sense now.
I updated the answer.

PIEBALDconsult 20-Jun-15 13:21pm

Nope, memory is just "a bunch of bits".

Michael_Davies 20-Jun-15 13:33pm

Hard to know what your question is:

Obviously the sizeof operator will tell you the size in bytes of a type, but the declaration tells you the same thing: int vs. short, long, etc.

For Intel opcodes the operand size is usually determined by one or more prefix bytes; look at section 2.1.1 onwards in:

http://www.intel.co.uk/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
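
To make the prefix idea concrete, here is a rough C++ sketch that encodes `mov reg, imm` for 32-bit mode only. The function name `encode_mov_reg_imm` and its parameters are invented for this example. It relies on the documented forms B0+rb ib (8-bit), B8+rd id (32-bit), and the 66h operand-size prefix in front of B8+rw iw for the 16-bit case.

```cpp
#include <cstdint>
#include <vector>

// Encode `mov reg, imm` assuming a 32-bit default operand size.
// reg_code   : hardware register number 0..7 (eax/ax/al = 0, ecx/cx/cl = 1, ...)
// size_bytes : 1, 2 or 4
// Returns an empty vector for inputs this sketch does not handle.
inline std::vector<std::uint8_t> encode_mov_reg_imm(unsigned reg_code,
                                                    unsigned size_bytes,
                                                    std::uint32_t imm) {
    std::vector<std::uint8_t> out;
    if (reg_code > 7) return out;

    switch (size_bytes) {
    case 1:  // mov r8, imm8
        out.push_back(static_cast<std::uint8_t>(0xB0 + reg_code));
        break;
    case 2:  // operand-size prefix, then mov r16, imm16
        out.push_back(0x66);
        out.push_back(static_cast<std::uint8_t>(0xB8 + reg_code));
        break;
    case 4:  // mov r32, imm32
        out.push_back(static_cast<std::uint8_t>(0xB8 + reg_code));
        break;
    default:
        return out;
    }

    // The immediate follows in little-endian order, size_bytes long.
    for (unsigned i = 0; i < size_bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
    }
    return out;
}
```

Under these assumptions `mov eax, 123` comes out as B8 7B 00 00 00, `mov ax, 123` as 66 B8 7B 00, and `mov al, 123` as B0 7B, so the same mnemonic yields three different byte sequences depending only on the operand size.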

0xF4 20-Jun-15 18:10pm

I mean to say operand size in there, the darn thing is not letting me edit the answer again.

2 solutions

Solution 1

If you meant 'How could I establish whether an int value would fit in a byte?' then yes, you can do it. A simple approach is the round trip: cast the value to the narrower type and compare the result with the original.

C++

unsigned int a = 0xDEADBEEF; // see nv3's comment below
bool fit_in_a_byte = (a == static_cast<unsigned char>(a));
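
The same round trip can be extended to report the smallest size that holds a value, which is what an assembler might want for an immediate. The following is a sketch only; the helper names `min_unsigned_size`, `min_signed_size` and `fits_signed` are made up here. The signed variant uses range checks instead of a cast, because narrowing an out-of-range value to a signed type was implementation-defined before C++20.

```cpp
#include <cstdint>
#include <limits>

// Smallest number of bytes (1, 2, 4 or 8) that holds `value` as an
// unsigned quantity: narrow, widen back, and see whether anything was lost.
inline unsigned min_unsigned_size(std::uint64_t value) {
    if (value == static_cast<std::uint8_t>(value))  return 1;
    if (value == static_cast<std::uint16_t>(value)) return 2;
    if (value == static_cast<std::uint32_t>(value)) return 4;
    return 8;
}

// True if `value` lies within the range of the signed type T.
template <typename T>
inline bool fits_signed(std::int64_t value) {
    return value >= std::numeric_limits<T>::min() &&
           value <= std::numeric_limits<T>::max();
}

// Smallest number of bytes that holds `value` as a signed quantity.
inline unsigned min_signed_size(std::int64_t value) {
    if (fits_signed<std::int8_t>(value))  return 1;
    if (fits_signed<std::int16_t>(value)) return 2;
    if (fits_signed<std::int32_t>(value)) return 4;
    return 8;
}
```

Note that this only says how small a value could be stored. In `mov eax, 123` the immediate still has to be encoded as 32 bits because the destination register decides the operand size.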

Posted 20-Jun-15 6:57am

CPallini

Updated 21-Jun-15 21:11pm

v2

Comments

nv3 20-Jun-15 17:33pm

Or better cast to signed char, as int is signed as well? My 5.

CPallini 22-Jun-15 3:04am

Yes, that is a good observation (and my mistake). I would prefer to work with both unsigned, however.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Frankie-C · Accepted Answer · 2015-06-20T11:59:00

OK, now it makes a little more sense.
Writing an assembler is not such an easy task, at least when it comes to emitting correct machine code.
In some cases, even for a fully specified instruction, there can be more than one legal encoding.
Anyway, going back to your question: in assembly, operands and operand sizes can be explicit or implicit.
Take the instruction:

mov eax, 123

It is not fully qualified, but we can get the implicit operand size from the explicit operand. This instruction moves a value into a 32-bit register, which is the explicit operand. The value must be the same size, so it is a 32-bit integer. In 'C' it would be an 'int' (on typical x86 compilers).
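
In C++ this could be as simple as a lookup from register name to size. The sketch below is only one way to do it: `register_size` is a made-up name, the table covers just the classic general-purpose registers, and names are assumed to be already lower-cased by the tokenizer.

```cpp
#include <string>
#include <unordered_map>

// Size in bytes of a general-purpose x86 register, or 0 if the name is
// not in this deliberately incomplete table.
inline unsigned register_size(const std::string& name) {
    static const std::unordered_map<std::string, unsigned> sizes = {
        {"al", 1}, {"cl", 1}, {"dl", 1}, {"bl", 1},
        {"ah", 1}, {"ch", 1}, {"dh", 1}, {"bh", 1},
        {"ax", 2}, {"cx", 2}, {"dx", 2}, {"bx", 2},
        {"sp", 2}, {"bp", 2}, {"si", 2}, {"di", 2},
        {"eax", 4}, {"ecx", 4}, {"edx", 4}, {"ebx", 4},
        {"esp", 4}, {"ebp", 4}, {"esi", 4}, {"edi", 4},
        {"rax", 8}, {"rcx", 8}, {"rdx", 8}, {"rbx", 8},
        {"rsp", 8}, {"rbp", 8}, {"rsi", 8}, {"rdi", 8},
    };
    auto it = sizes.find(name);
    return it == sizes.end() ? 0 : it->second;
}
```

For `mov eax, 123` the parser would call `register_size("eax")`, get 4, and treat 123 as a 32-bit immediate. A result of 0 means the operand is not a known register and the size has to come from somewhere else.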
In almost all instructions you can infer the operand size from the explicit operand. In cases where you can't, the assembler uses size qualifiers to make it clear.
For example:

movsx eax, byte ptr[var];

The instruction movsx loads a register with the sign-extended value of a smaller operand; in this case the destination is the 32-bit register eax, but the source operand is a byte.
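
A parser can pick up such a qualifier by looking at the first word of the operand text. The following C++ sketch assumes the operand has already been trimmed and lower-cased; `explicit_operand_size` is a hypothetical helper name, and for brevity it does not verify that `ptr` actually follows the size keyword.

```cpp
#include <string>

// Size in bytes named by a leading qualifier such as "byte ptr[var]",
// or 0 if the operand does not start with a known size keyword.
inline unsigned explicit_operand_size(const std::string& operand) {
    struct Keyword { const char* text; unsigned size; };
    static const Keyword keywords[] = {
        {"byte", 1}, {"word", 2}, {"dword", 4}, {"qword", 8},
    };

    // The first token ends at the first blank, tab or '['.
    // If none is found, the whole string is the token.
    const std::string::size_type end = operand.find_first_of(" \t[");
    const std::string first = operand.substr(0, end);

    for (const Keyword& k : keywords) {
        if (first == k.text) return k.size;
    }
    return 0;
}
```

For `movsx eax, byte ptr[var]` this gives 1 for the source operand, while the destination size (4) comes from the register name. When it returns 0, the assembler has to fall back on the other operand or on its documented default.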

Some assemblers also allow unqualified operands where more than one operand size is possible, but in that case the manual must clearly state which operand size the assembler defaults to...