G'day. I'm not sure I understand all of your points and questions, but I'll do what I can to address them.
- The Xbox 360 and PS3 actually have only a handful of cores, much like PCs:
  - PS3: the Cell design has 8 SPE cores (alongside its main PPE core), of which 6 are available to games on the PS3 as shipped.
  - Xbox 360: 3 cores in the PowerPC-based Xenon.
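As on PCs, the massive parallelism in these consoles lies in their GPUs. I believe you'll hit bottlenecks sooner when spawning many processes than when spawning many threads, because each process gets its own address space. Threads in one process share code and data, so they can share cached data; separate processes each bring their own, which means a greater number of cache misses.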
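To make the address-space point concrete, here's a minimal Python sketch (my own illustration, not tied to any console SDK): a thread appends to a module-level list and the parent sees the change, while a child process appends to its own copy and the parent's list is unaffected. It assumes the `fork` start method where the platform offers it (Linux, macOS) and falls back to the default otherwise. It only shows the separate address spaces, not the cache effects themselves.

```python
import multiprocessing as mp
import threading

# Module-level list: all threads in one process share this object,
# but each process has its own separate copy.
shared: list[str] = []


def append_item(tag: str) -> None:
    shared.append(tag)


def run_demo() -> tuple[list[str], list[str]]:
    shared.clear()

    # A thread writes into the parent's address space.
    t = threading.Thread(target=append_item, args=("from thread",))
    t.start()
    t.join()
    after_thread = list(shared)  # the thread's write is visible here

    # A child process writes into its own copy of `shared`.
    methods = mp.get_all_start_methods()
    ctx = mp.get_context("fork") if "fork" in methods else mp.get_context()
    p = ctx.Process(target=append_item, args=("from process",))
    p.start()
    p.join()
    after_process = list(shared)  # unchanged: the child's write stayed in the child
    return after_thread, after_process


if __name__ == "__main__":
    print(run_demo())
```

Both lists come back containing only the thread's entry: sharing one address space is what makes threads cheap to coordinate, and keeping separate ones is what makes processes isolated (and heavier on memory and cache).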
- Multi-threaded programs can run on both multi-core and single-core chips. On a single core the OS normally time-slices the threads; they only run truly simultaneously if the core supports simultaneous multithreading (SMT), which Intel calls Hyper-Threading. Even the lowly Intel Atom (at least the earlier models) supports this.
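With SMT enabled, the OS exposes each hardware thread as a separate logical processor. A minimal Python sketch of checking how many you have (my own illustration; note the standard library reports logical processors, not physical cores):

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors this process can use.

    With SMT/Hyper-Threading enabled, each hardware thread counts, so a
    2-core, 4-thread i3 would typically report 4.
    """
    # sched_getaffinity (Linux only) lists the CPUs this process may run on;
    # elsewhere fall back to os.cpu_count(), which may return None.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Logical processors available: {logical_processor_count()}")
```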
Here's an example from my own work: when I write an app that downloads RSS feeds simultaneously, the thread count may go from 1 to about 15. All I do is spawn new threads, and they execute on both physical cores of my i3 and on all 4 of its logical processors without any direction from me. So my point is that on an i7, whenever you've got multi-threading in place, you're almost certainly employing multi-core processing, because the OS scheduler spreads runnable threads across the available cores.
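A rough Python sketch of that kind of app, assuming a thread pool and a stand-in `fetch_feed` that sleeps instead of doing a real download (the feed URLs and function names are hypothetical placeholders). The code only spawns threads; which core or logical processor each one lands on is left to the OS. In CPython the GIL lets only one thread run Python bytecode at a time, but threads blocked on I/O (simulated here by `time.sleep`) release it, so the downloads still overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical feed URLs, for illustration only.
FEED_URLS = [f"https://example.com/feed{i}.xml" for i in range(15)]


def fetch_feed(url: str) -> str:
    """Stand-in for a real download: sleeps to simulate network I/O."""
    time.sleep(0.1)
    return f"contents of {url}"


def fetch_all(urls: list[str], max_workers: int = 15) -> list[str]:
    # The pool just spawns threads; the OS scheduler decides where they run.
    # pool.map returns results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_feed, urls))


if __name__ == "__main__":
    start = time.perf_counter()
    results = fetch_all(FEED_URLS)
    elapsed = time.perf_counter() - start
    print(f"Fetched {len(results)} feeds in {elapsed:.2f}s")
```

Run sequentially, 15 simulated downloads at 0.1 s each would take about 1.5 s; with one thread per feed they finish in roughly the time of one.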
Because main memory is now measured in GB while cache memory is still measured in single-digit MB, the resource most often taxed is the cache. It's my understanding that multiple processes will eat up the cache much faster than multiple threads. Cache misses are generally very expensive in terms of idle CPU cycles, hence the effort taken to reduce them: modern CPUs often have cache memory that runs at CPU speed, while main memory is nowhere near that.
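To put a rough number on "very expensive", here's a back-of-the-envelope sketch using the standard single-level average memory access time formula, AMAT = hit time + miss rate × miss penalty. The 4-cycle hit and 200-cycle miss penalty are illustrative assumptions, not measurements of any particular CPU.

```python
def average_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time (AMAT) for one cache level, in cycles."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Assumed, illustrative latencies: a 4-cycle cache hit and a
# 200-cycle trip to main memory on a miss.
HIT_CYCLES = 4.0
MISS_PENALTY_CYCLES = 200.0

if __name__ == "__main__":
    for rate in (0.01, 0.05, 0.10):
        amat = average_access_time(HIT_CYCLES, rate, MISS_PENALTY_CYCLES)
        print(f"miss rate {rate:.0%}: {amat:.1f} cycles per access on average")
```

Under these assumptions, going from a 1% to a 10% miss rate quadruples the average cost of a memory access (6 to 24 cycles), which is why anything that bloats the working set, such as many separate processes, hurts.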
It's a truly broad and interesting topic of study.