It's possible to write many pages about code optimization, how data is stored in the executable image and its different possible layouts, the trade-offs between code size and performance, and the numerous ways to produce CPU instructions from the same high-level code, with their advantages and disadvantages, and a lot more. But the truly reasonable answer stays the same: those file sizes are different because they don't have to be the same.
Answering your two follow-up questions:
1. Same answer: there are too many reasons for it to be that way. If you really need to get a feel for it, use a disassembler and look at the CPU-level code; please see: http://en.wikipedia.org/wiki/Disassembler
2. It's impossible to define which compiler is "best", because "better" is not defined. Some compilers can clearly be ranked against others, for example, those that are obviously bad, but most compilers cannot be compared this way. Moreover, a compiler that produces shorter code is not necessarily better. There are many quality criteria, and code length is the least significant of them, unless you are talking about an embedded system with severely limited resources (and such systems are hardly ever used to write "Hello world!").