This looks like a somewhat artificial problem: you never look at 400 images at a time. However, the problem of reducing the latency caused by loading time is interesting.
You need to load only as many images as you can see on one screen. That should cost not much more than loading just one high-resolution image that has several times more pixels than the screen and has to be re-sampled. I mention this example only for speed comparison: loading such an image takes a quite acceptable amount of time, so rendering a whole screen covered by images should take an acceptable amount of time, too.
This latency can be improved by keeping in an image cache, in memory, an extra 100-200% of a screenful of images, provided that the images are accessed consecutively; the most typical way of viewing is scrolling by one row. Consider a typical example: the screen shows 8x4 images, that is, 4 rows of 8 images. The whole show may take many gigabytes, but you look at only some 32 images at a time. You can keep in memory some 32 * 3 = 96 images: 4 rows above the screen, 4 rows shown on the screen and 4 rows below it. If you scroll the show by one row or by 4 rows, you first render only images that are already loaded in memory. Once the images scrolled into view are shown, you can remove 1-4 rows from memory and load 1-4 new rows to keep 4 + 4 + 4 rows in memory. In this way, the image loading happens behind the scenes, and the user does not have to wait for it.
In this way, if the user tends to scroll the presentation mostly by 1-4 rows (where 4 rows is a full screen) in either direction, almost 100% of accesses are cache hits.
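The 4 + 4 + 4 scheme above can be sketched roughly like this. It is only a minimal illustration in Python, not a complete viewer: the class name `RowWindowCache`, its parameters and the `load_row` callback are all made up for the example, and the refill is done synchronously here, while a real application would do it in a background thread.

```python
class RowWindowCache:
    """Keeps the visible rows plus a margin of rows above and below in memory."""

    def __init__(self, total_rows, visible_rows=4, margin_rows=4, load_row=None):
        self.total_rows = total_rows
        self.visible_rows = visible_rows
        self.margin_rows = margin_rows
        # Hypothetical loader: takes a row index, returns the decoded images of that row.
        self.load_row = load_row or (lambda row: f"images of row {row}")
        self.rows = {}      # row index -> loaded images
        self.hits = 0
        self.misses = 0

    def window(self, top_row):
        """Rows that should be in memory when top_row is the first visible row."""
        first = max(0, top_row - self.margin_rows)
        last = min(self.total_rows, top_row + self.visible_rows + self.margin_rows)
        return range(first, last)

    def show(self, top_row):
        """Return the images of the visible rows, then refill the window."""
        last_visible = min(self.total_rows, top_row + self.visible_rows)
        shown = []
        for row in range(top_row, last_visible):
            if row in self.rows:
                self.hits += 1
            else:
                # Cache miss: the user has to wait for this row.
                self.misses += 1
                self.rows[row] = self.load_row(row)
            shown.append(self.rows[row])
        # In a real viewer this part would run behind the scenes.
        self.refill(top_row)
        return shown

    def refill(self, top_row):
        """Drop rows that left the window and load rows that entered it."""
        wanted = set(self.window(top_row))
        for row in list(self.rows):
            if row not in wanted:
                del self.rows[row]
        for row in sorted(wanted):
            if row not in self.rows:
                self.rows[row] = self.load_row(row)


cache = RowWindowCache(total_rows=100)
cache.show(0)   # first screen: nothing is cached yet, so 4 misses
cache.show(1)   # scroll by one row: all 4 visible rows are hits
cache.show(5)   # scroll by a full screen: still all hits
print(cache.hits, cache.misses)
```

With this sketch, only the very first screen and long jumps (more than 4 rows at once) cause misses; scrolling by 1-4 rows is always served from memory.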
Cache misses happen more rarely, and the latency is still acceptable if the images coming into view are loaded first, for the reasons explained above.
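Loading the images coming into view first could be arranged, for example, by ordering the rows to load. This is only a sketch under my own assumptions: the function name `load_order` and its parameters are invented for the illustration, and "nearest margin rows first" is just one reasonable choice for the rows outside the screen.

```python
def load_order(top_row, total_rows, visible_rows=4, margin_rows=4):
    """Order in which rows should be loaded after a jump to top_row.

    Visible rows come first (the user is waiting for them), then the
    margin rows above and below the screen, nearest to the screen first.
    """
    first = max(0, top_row - margin_rows)
    bottom = min(total_rows, top_row + visible_rows)   # first row below the screen
    last = min(total_rows, bottom + margin_rows)

    visible = list(range(top_row, bottom))
    margin = [r for r in range(first, last) if not top_row <= r < bottom]
    # Distance from the screen: 1 for the rows adjacent to it, and so on.
    margin.sort(key=lambda r: top_row - r if r < top_row else r - bottom + 1)
    return visible + margin


# After a jump to row 50, rows 50..53 are loaded first, then their neighbours.
print(load_order(50, 100))
```

The visible rows give the same latency as loading one screen from scratch; everything after them can be loaded while the user is already looking at the images.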
This strategy is typical for all uses of a cache, such as the CPU cache. The principles are pretty much the same and are well explained here:
http://en.wikipedia.org/wiki/Cache_memory
—SA