Terrain Rendering





Terrain Rendering is a game technology code sample that demonstrates how to render large-scale terrains in real time by efficiently distributing the tasks between the CPU and the GPU. This article provides an overview of the terrain-rendering application and includes a link to the free code.
Brought to you by the Intel® Visual Computing Source
Download the source code and watch the demo
Introduction
This sample demonstrates how to render large-scale terrains on Intel® microarchitecture codename Sandy Bridge in real time by efficiently distributing the tasks between the CPU cores and the processor graphics unit. The sample pre-processes an input height map into a hierarchical quadtree representation which is used to render the terrain with adaptively selected level of detail (LOD). The adaptive simplified triangulation calculated during the pre-processing is compactly encoded to save runtime processing and memory space. LOD construction is asynchronously performed by the CPU cores while rendering is done by the processor graphics unit.
Application
Terrain Rendering is an application using DXUT and Microsoft DirectX* 11 with D3D_FEATURE_LEVEL_10_0. The application handles all rendering, user interaction and GUI. Upon initialization, the application loads all models, allocates resources and compiles shaders. On the first run, the application pre-calculates triangulation, which can take some time (up to one minute), and stores it on the disk. On subsequent runs, the application loads the data from the disk.
Overview
Terrain rendering demonstrates how to render large-scale terrains on Intel microarchitecture codename Sandy Bridge in real time by efficiently distributing the tasks between the CPU cores and the processor graphics unit. Terrain rendering can be optimized by constructing a simplified triangulation that adapts to the terrain surface characteristics. Such a triangulation contains more primitives in sharp regions with high-frequency details and allocates a small number of large triangles to flat areas. This significantly reduces the total triangle count while providing almost the same visual quality (compare fig. 1 and 2).
The full-resolution triangulation (fig. 1) contains more than 5x as many triangles as the adaptive one (fig. 2), which results in nearly 3x lower performance with almost the same visual quality.
Computing an adaptive triangulation is a complex and computation-intensive task. Doing this at runtime could require a significant amount of time and would lead to perceptible stalls or delays. To solve this problem, the application pre-computes the adaptive triangulation for the whole terrain at the pre-process stage and stores the resulting data on the disk in a compact representation (see the Adaptive Triangulation section). For example, the whole encoded triangulation for an 8192×8192 terrain consumes approximately 6 MB (compare this to the 128 MB required to store the height map). At the runtime stage, this data is used to efficiently construct the triangulation.
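As a rough check of these figures, assuming 16-bit height samples (an assumption that is consistent with the quoted 128 MB, but is not stated in the text):

```latex
\begin{align*}
8192 \times 8192 \times 2\ \text{bytes} &= 2^{27}\ \text{bytes} = 128\ \text{MB} \\
\frac{6\ \text{MB}}{128\ \text{MB}} &\approx 4.7\% \\
\frac{6 \times 2^{20} \times 8\ \text{bits}}{8192 \times 8192\ \text{samples}} &= 0.75\ \text{bits per height map sample}
\end{align*}
```

Under that assumption, the encoded triangulation costs less than one bit per height map sample on average.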
The triangulation must be adaptive not only to the terrain surface features, but also to the camera position, because distant terrain regions can be rendered using a coarser representation without loss of visual fidelity. To support multiple levels of detail (LODs), the input height map is pre-processed and a patch quadtree data structure [3] is constructed (fig. 3). At runtime, the level of detail is selected per patch, not per triangle. If the adaptive triangulation were constructed every frame, it would require intensive data transfer between CPU and GPU memory, which is not efficient. With patch-based LOD selection, the data is uploaded to the GPU memory only when new patches are created, which happens only once every several frames.
Each patch in the hierarchy is a height map of a fixed size with additional vertices required to seamlessly connect neighboring patches. Each patch covers the same area as its four direct children, but approximates the terrain with lower accuracy. Each patch is assigned a unique adaptive triangulation, which is pre-computed and encoded as described in the Adaptive Triangulation section.
For each patch in the hierarchy, a world-space geometric error τ is pre-computed during pre-processing. This error is the maximum geometric deviation of the patch polygonal surface from the height map samples at the finest resolution covered by the patch. Given τ and the camera position, we can estimate the patch screen-space error ε, which is the maximum visible deviation of the simplified model from the samples of the original full-detail height map, using the following standard formula [3]:
$$\varepsilon = \gamma \, \frac{\tau}{\mathrm{dist}(V, c)}, \qquad \gamma = \frac{1}{2} \max\left( W \cot\frac{\varphi_h}{2},\; H \cot\frac{\varphi_v}{2} \right)$$
where W and H are the width and height of the viewport, φ_h and φ_v are the horizontal and vertical fields of view, and dist(V, c) is the distance from the camera c to the nearest point on the patch bounding box V.
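A minimal sketch of this estimate in C++, assuming the commonly used form in which the world-space error is divided by the distance and scaled by a perspective factor (half the viewport extent times the cotangent of half the field of view, taking the larger of the horizontal and vertical values). The function name ScreenSpaceError and its parameter names are hypothetical and do not come from the sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: estimates the screen-space error of a patch in pixels.
//   tau           - world-space geometric error of the patch
//   dist          - distance from the camera to the nearest point of the
//                   patch bounding box
//   width, height - viewport size in pixels
//   hfov, vfov    - horizontal and vertical fields of view in radians
inline double ScreenSpaceError(double tau, double dist,
                               double width, double height,
                               double hfov, double vfov)
{
    assert(dist > 0.0 && hfov > 0.0 && vfov > 0.0);
    // Perspective scaling factor (assumed form): 1/2 * max(W*cot, H*cot).
    const double gamma = 0.5 * std::max(width  / std::tan(0.5 * hfov),
                                        height / std::tan(0.5 * vfov));
    return gamma * tau / dist;
}
```

With a 1280x1024 viewport and 90-degree fields of view, a patch with a world-space error of 1 unit at a distance of 640 units projects to roughly one pixel of error; doubling the distance halves it.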
At the runtime stage, an unbalanced patch quadtree is maintained whose leaf nodes satisfy the given screen-space error bound. Each patch in this tree stores a height map, a normal map and adaptive triangulation indices. On each frame, a recursive procedure updates the tree with respect to the new camera location (fig. 4). The procedure creates new patches and allocates resources for terrain regions where additional accuracy is required (the screen-space error exceeds the bound), and coarsens the representation where the LOD has become unnecessarily high (the bound is already met by a coarser patch).
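The per-frame update can be sketched as follows. This is an illustration under simplifying assumptions, not the sample's implementation: patches are flat squares, the camera is a 2D point, the world-space error is assumed to halve at each level (the sample pre-computes it instead), and all names (Patch, UpdateTree, CountLeaves) are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical patch node: a square footprint plus a world-space error.
struct Patch {
    double cx, cy, half;                          // footprint centre and half-size
    double tau;                                   // world-space geometric error
    std::array<std::unique_ptr<Patch>, 4> child;  // all null for a leaf
    bool IsLeaf() const { return !child[0]; }
};

// Distance from the camera (x, y) to the patch footprint, kept above zero.
inline double DistToPatch(const Patch& p, double x, double y)
{
    const double dx = std::max(std::fabs(x - p.cx) - p.half, 0.0);
    const double dy = std::max(std::fabs(y - p.cy) - p.half, 0.0);
    return std::max(std::sqrt(dx * dx + dy * dy), 1e-3);
}

// Refines patches whose screen-space error exceeds maxErr, coarsens the rest.
inline void UpdateTree(Patch& p, double x, double y,
                       double gamma, double maxErr, int levelsLeft)
{
    assert(gamma > 0.0 && maxErr > 0.0);
    const double err = gamma * p.tau / DistToPatch(p, x, y);
    if (err > maxErr && levelsLeft > 0) {
        if (p.IsLeaf()) {
            const double h = 0.5 * p.half;
            const double ox[4] = {-h, h, -h, h};
            const double oy[4] = {-h, -h, h, h};
            for (int i = 0; i < 4; ++i) {
                p.child[i] = std::make_unique<Patch>();
                p.child[i]->cx = p.cx + ox[i];
                p.child[i]->cy = p.cy + oy[i];
                p.child[i]->half = h;
                p.child[i]->tau = 0.5 * p.tau;  // assumption: error halves per level
            }
        }
        for (auto& c : p.child)
            UpdateTree(*c, x, y, gamma, maxErr, levelsLeft - 1);
    } else {
        for (auto& c : p.child)
            c.reset();                          // drop finer patches
    }
}

inline int CountLeaves(const Patch& p)
{
    if (p.IsLeaf()) return 1;
    int n = 0;
    for (const auto& c : p.child) n += CountLeaves(*c);
    return n;
}
```

A production version would typically add hysteresis between the split and merge thresholds, and would create patch resources asynchronously rather than inside the traversal.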
Constructing a triangulation from the encoded representation is efficiently handled by the CPU cores while rendering is performed by the graphics unit. This technique is useful because it significantly reduces the GPU burden by constructing a simplified adaptive triangulation. The most useful application of this technique would be a system that has excess CPU computational power, but is utilizing all of its GPU power.
Adaptive Triangulation
As mentioned above, each patch in the quadtree is assigned its own unique triangulation, which is adaptive to local terrain surface features. This triangulation is computed at the pre-process stage and stored on the disk. The adaptive triangulation construction uses the method described in [1] and [2]. To build the triangulation, all samples of the patch are assigned to different levels as shown in figure 5. Note that a quadtree data structure is also used to construct the adaptive triangulation. To distinguish this quadtree from the patch quadtree described above, we refer to it as the vertices quadtree.
To guarantee that a triangulation constructed from a vertices quadtree does not contain cracks, the quadtree is restricted with the dependency graph shown in figure 6. Every vertex depends on two other vertices of the same or the next finer level in the vertices quadtree hierarchy. Border vertices depend on only one other vertex. This means that if a vertex is selected for triangulation, then the related ones must be selected too.
The coarsest possible patch triangulation is represented by two right triangles, which are the roots of two triangle binary trees. The vertex in the middle of a triangle’s longest edge is called its base vertex. A vertex that is included in the restricted quadtree is called enabled; otherwise it is called disabled. Now, if we have the correct set of enabled vertices (with all dependencies properly kept), a crack-free triangulation can be constructed at runtime using the following simple recursive procedure, which starts from the root:
Algorithm 1:
    if ( triangle base vertex is enabled )
    {
        bisect the triangle
        process two new triangles
    }
    else
        output current triangle to the list
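The recursive bisection can be sketched in C++ as follows. The layout is an assumption made for illustration: vertices are integer grid coordinates on a (2^n+1)×(2^n+1) patch, enabled flags are stored row-major, and a triangle at the finest level is recognized by its hypotenuse midpoint not falling on a grid sample. The names Vtx, Tri, EnabledGrid, BuildTriangles and Triangulate are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vtx { int x, y; };
struct Tri { Vtx apex, left, right; };  // the hypotenuse runs from left to right

struct EnabledGrid {
    int size;                 // samples per patch side, 2^n + 1
    std::vector<bool> flags;  // row-major enabled flags
    bool At(Vtx v) const { return flags[static_cast<std::size_t>(v.y) * size + v.x]; }
};

// Bisects the triangle while its base vertex is enabled, otherwise outputs it.
inline void BuildTriangles(const Tri& t, const EnabledGrid& g, std::vector<Tri>& out)
{
    const int sx = t.left.x + t.right.x;
    const int sy = t.left.y + t.right.y;
    // Finest-level triangles have no grid sample in the middle of the hypotenuse.
    const bool hasBase = (sx % 2 == 0) && (sy % 2 == 0);
    const Vtx base{sx / 2, sy / 2};
    if (hasBase && g.At(base)) {
        BuildTriangles({base, t.apex, t.left}, g, out);
        BuildTriangles({base, t.right, t.apex}, g, out);
    } else {
        out.push_back(t);
    }
}

// Starts from the two root triangles that share the patch diagonal.
inline std::vector<Tri> Triangulate(const EnabledGrid& g)
{
    assert(g.size >= 2);
    assert(g.flags.size() == static_cast<std::size_t>(g.size) * g.size);
    const int m = g.size - 1;
    std::vector<Tri> out;
    BuildTriangles({{0, m}, {0, 0}, {m, m}}, g, out);
    BuildTriangles({{m, 0}, {m, m}, {0, 0}}, g, out);
    return out;
}
```

On a 3×3 patch this produces 2 triangles when no vertex is enabled, 4 when only the centre vertex is enabled, and 8 when every vertex is enabled.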
To determine a set of enabled vertices, the following bottom-up algorithm is executed during the pre-process:
Algorithm 2:
    set every entry of enabled_array[] to false
    for ( quadtree level l = finest resolution to coarsest resolution )
        for ( each vertex v in level l )
        {
            if ( for all vertices d which v is dependent on: enabled_array[d] == false )
            {
                merge two triangles for which v is base vertex
                calculate the coalesced triangle world space approximation error e
                if ( e < threshold )
                    enabled_array[v] = false
                else
                    enabled_array[v] = true
            }
            else
                enabled_array[v] = true
        }
Triangle world-space approximation error is the maximum vertical distance of all vertices covered by the triangle to the triangle plane. Adaptive triangulation constructed with Algorithm 2 described above guarantees that the maximum geometric deviation of the simplified triangulation from the original height map is below the given threshold.
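The error measure can be illustrated with a short sketch. It assumes a row-major height array and integer grid coordinates, visits every sample inside the triangle's bounding box, and keeps those whose barycentric weights are non-negative. The names GridPoint and TriangleError are hypothetical, and a real pre-processor would likely traverse only the covered samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct GridPoint { int x, y; };

// Maximum vertical distance between the height samples covered by triangle
// (a, b, c) and the plane through its three corners.
inline double TriangleError(const std::vector<float>& h, int size,
                            GridPoint a, GridPoint b, GridPoint c)
{
    assert(h.size() == static_cast<std::size_t>(size) * size);
    auto H = [&](int x, int y) {
        return static_cast<double>(h[static_cast<std::size_t>(y) * size + x]);
    };
    const double det = double(b.x - a.x) * (c.y - a.y) - double(c.x - a.x) * (b.y - a.y);
    assert(det != 0.0);  // degenerate triangles are not supported

    const int x0 = std::min({a.x, b.x, c.x}), x1 = std::max({a.x, b.x, c.x});
    const int y0 = std::min({a.y, b.y, c.y}), y1 = std::max({a.y, b.y, c.y});
    double err = 0.0;
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            // Barycentric weights of (x, y) with respect to a, b, c.
            const double wb = (double(x - a.x) * (c.y - a.y) - double(c.x - a.x) * (y - a.y)) / det;
            const double wc = (double(b.x - a.x) * (y - a.y) - double(x - a.x) * (b.y - a.y)) / det;
            const double wa = 1.0 - wb - wc;
            if (wa < -1e-9 || wb < -1e-9 || wc < -1e-9) continue;  // outside the triangle
            const double plane = wa * H(a.x, a.y) + wb * H(b.x, b.y) + wc * H(c.x, c.y);
            err = std::max(err, std::fabs(H(x, y) - plane));
        }
    }
    return err;
}
```

For a 3×3 patch that is flat except for a bump of height 5 at the centre, the root triangle whose hypotenuse passes through the centre has an error of 5, so its base vertex stays enabled for any threshold of 5 or less.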
It is now clear that the whole patch triangulation is thoroughly described by the set of flags indicating whether or not the vertex is enabled. These flags can be efficiently encoded during the recursive traversal by outputting 1-bit flags. This is done with the following algorithm, very similar to Algorithm 1:
Algorithm 3:
    if ( triangle base vertex is enabled )
    {
        output 1
        bisect the triangle
        process two new triangles
    }
    else
        output 0
Thus during the pre-processing, Algorithm 2 is first executed to determine the set of enabled vertices, which is followed by Algorithm 3 encoding them. At runtime, Algorithm 1 is executed, which uses pre-computed data to construct the adaptive triangulation.
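The encode and decode passes share the same traversal, which the following sketch makes explicit. It is not the sample's code: the Codec struct, its members and the use of std::vector<bool> as a bit stream are assumptions for illustration, and no flag is stored for finest-level triangles because they cannot be bisected further.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct V2 { int x, y; };

// Hypothetical traversal state shared by the encoder and the decoder.
struct Codec {
    int size = 0;                // samples per patch side, 2^n + 1
    std::vector<bool> enabled;   // per-vertex flags, row-major
    std::vector<bool> bits;      // encoded 1-bit flags
    std::size_t readPos = 0;
    bool encoding = true;
    int triangles = 0;           // number of triangles produced

    void Visit(V2 apex, V2 left, V2 right)
    {
        const int sx = left.x + right.x, sy = left.y + right.y;
        if (sx % 2 != 0 || sy % 2 != 0) { ++triangles; return; }  // finest level
        const std::size_t base = static_cast<std::size_t>(sy / 2) * size + sx / 2;
        bool split = false;
        if (encoding) {
            split = enabled[base];
            bits.push_back(split);               // write one flag per base vertex
        } else {
            assert(readPos < bits.size());
            split = bits[readPos++];             // read the flag back
            if (split) enabled[base] = true;
        }
        if (split) {
            const V2 b{sx / 2, sy / 2};
            Visit(b, apex, left);
            Visit(b, right, apex);
        } else {
            ++triangles;
        }
    }

    void Run()
    {
        assert(size >= 2);
        assert(enabled.size() == static_cast<std::size_t>(size) * size);
        const int m = size - 1;
        triangles = 0;
        readPos = 0;
        if (encoding) bits.clear();
        Visit({0, m}, {0, 0}, {m, m});
        Visit({m, 0}, {m, m}, {0, 0});
    }
};
```

Running the encoder and then the decoder on the produced bits restores the same enabled flags and yields the same triangle count, which is the property the runtime stage relies on.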
Implementation Details
The terrain rendering system is built up as a collection of logically independent components. The principal architecture of the system is shown in figure 9.
The system contains an elevation data source (implemented by the CElevationDataSource class) and an encoded triangulation data source (implemented by the CTriangDataSource class). The former provides access to the elevation data through the GetElevData() method, which creates an instance of the CPatchElevationData class and returns a pointer to it. The CPatchElevationData class provides access to the stored height map data through the GetDataPtr() method.
The triangulation data source follows the same philosophy: when an adaptive triangulation must be created for a patch, it creates an instance of the CRQTTriangulation class with a call to DecodeTriangulation(). The triangulation indices are generated by a call to GenerateIndices(). The CRQTTriangulation class is also responsible for determining enabled vertices; Algorithm 2, described above, is implemented by the CTriangDataSource::CreateAdaptiveTriangulation() method.
Algorithms 1 and 3 are implemented by the CRQTTriangulation::RecursiveGenerateIndices() method. The method can operate in two modes, encoding and decoding (indicated by the m_bIsEncodingMode flag). In encoding mode, the method encodes the triangulation using the enabled flags (pre-computed by Algorithm 2) as input. Depending on the triangle level and orientation, the method determines which vertex is the triangle's base. It then reads that vertex's enabled/disabled status from the array and writes a corresponding 1-bit flag to the output bit stream.
In decoding mode, the method reads the flags from the bit stream (which is loaded from the disk) and sets appropriate enable/disable status in the enabled flags array. At the same time, the method generates the triangulation.
All Microsoft Direct3D* resources of a patch are stored in an instance of the CTerrainPatch class. Resource management is handled by the D3D resource cache. When a new texture is required, the cache attempts to find an appropriate unused resource. If there are no spare resources, the cache creates a new one. When a resource is no longer needed, it is not released, but placed into the cache. The cache is thread-safe, so several threads can access it simultaneously.
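The reuse policy described above can be sketched with a small generic cache. This is an illustration only: it uses a plain C++ type in place of Direct3D resources, matches any spare resource rather than searching for an "appropriate" one, and the class and method names (ResourceCache, Acquire, Release, CreatedCount) are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical thread-safe cache that recycles resources instead of freeing them.
template <typename Resource>
class ResourceCache {
public:
    // Returns a spare resource if one exists, otherwise creates a new one.
    std::unique_ptr<Resource> Acquire()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_free.empty()) {
            std::unique_ptr<Resource> r = std::move(m_free.back());
            m_free.pop_back();
            return r;
        }
        ++m_created;
        return std::make_unique<Resource>();
    }

    // Puts a resource that is no longer needed back into the cache.
    void Release(std::unique_ptr<Resource> r)
    {
        assert(r != nullptr);
        std::lock_guard<std::mutex> lock(m_mutex);
        m_free.push_back(std::move(r));
    }

    int CreatedCount() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_created;
    }

private:
    mutable std::mutex m_mutex;
    std::vector<std::unique_ptr<Resource>> m_free;
    int m_created = 0;
};
```

Releasing a resource and acquiring again returns the same object without a new allocation, which is the behavior that avoids repeated resource creation while patches are created and destroyed.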
The quadtree construction is implemented by the CBlockBasedAdaptiveModel class, while Microsoft DirectX* 11-specific methods are separated into CAdaptiveModelDX11Render.
Asynchronous Task Execution
At the runtime stage, a quadtree-based adaptive view-dependent terrain model is maintained. For this purpose, the required patch elevation data and adaptive triangulation are extracted from the corresponding data sources, and terrain patches are created. To hide the processing time required to perform these tasks and eliminate stalls, the level-of-detail processing is done asynchronously. Since the LOD can be either increased or decreased, there are two types of tasks that can be performed by the system, which are implemented by the CIncreaseLODTask and CDecreaseLODTask classes. These classes are derived from the base class CTaskBase, which exposes the Execute() virtual method. To manage the tasks, the system uses the TaskMgrTbb component.
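The shape of such a task interface can be sketched as follows. The sketch does not use TaskMgrTbb or reproduce the sample's classes: a mutex-protected queue stands in for the task manager, the task bodies only adjust a counter, and all names (PatchState, TaskBase, IncreaseLodTask, DecreaseLodTask, TaskQueue) are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

struct PatchState { int lod = 0; };  // stand-in for real patch data

class TaskBase {
public:
    virtual ~TaskBase() = default;
    virtual void Execute() = 0;
};

class IncreaseLodTask : public TaskBase {
public:
    explicit IncreaseLodTask(PatchState& p) : m_patch(p) {}
    // Stands in for decoding the triangulation and building finer patches.
    void Execute() override { ++m_patch.lod; }
private:
    PatchState& m_patch;
};

class DecreaseLodTask : public TaskBase {
public:
    explicit DecreaseLodTask(PatchState& p) : m_patch(p) {}
    void Execute() override { assert(m_patch.lod > 0); --m_patch.lod; }
private:
    PatchState& m_patch;
};

// Minimal queue; RunPending() is meant to be called from a worker thread.
class TaskQueue {
public:
    void Submit(std::unique_ptr<TaskBase> task)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tasks.push(std::move(task));
    }

    // Executes queued tasks in submission order and returns how many ran.
    int RunPending()
    {
        int executed = 0;
        for (;;) {
            std::unique_ptr<TaskBase> task;
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                if (m_tasks.empty()) break;
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task->Execute();  // run outside the lock
            ++executed;
        }
        return executed;
    }

private:
    std::mutex m_mutex;
    std::queue<std::unique_ptr<TaskBase>> m_tasks;
};
```

The render thread only submits tasks and keeps drawing the current model; a worker that calls RunPending() applies the LOD changes when it gets to them.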
Normal Map Compression
A normal map is used to shade the terrain surface. There is no need to store the normal map on the disk because it can be calculated at runtime from the patch height map. This saves disk space but requires a moderate amount of additional computation. Besides, if the terrain is dynamic and can be deformed, the normal map cannot be statically pre-computed. To improve visual quality, the normal map resolution can be higher than that of the height map. For instance, if the height map has a resolution of (2^n+1)×(2^n+1) samples, the normal map could have 4x higher resolution (4∙(2^n+1)×4∙(2^n+1) samples). To reduce memory requirements, normal maps are kept in compressed form. The BC3 compression format is used, which stores the normal map using 1 byte per normal. The compression is done asynchronously by the DXTCompressor component.
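Deriving a normal from the height map can be sketched with finite differences. The sketch assumes a row-major height array with heights along the z axis and uniform sample spacing; it covers neither the up-sampling nor the BC3 compression step, and the names Normal and HeightMapNormal are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Normal { float x, y, z; };

// Unit normal at sample (x, y), computed from height differences between
// neighboring samples (one-sided at the patch border).
inline Normal HeightMapNormal(const std::vector<float>& h, int size,
                              int x, int y, float spacing)
{
    assert(size >= 2 && h.size() == static_cast<std::size_t>(size) * size);
    assert(x >= 0 && x < size && y >= 0 && y < size && spacing > 0.0f);
    auto H = [&](int i, int j) { return h[static_cast<std::size_t>(j) * size + i]; };

    const int x0 = std::max(x - 1, 0), x1 = std::min(x + 1, size - 1);
    const int y0 = std::max(y - 1, 0), y1 = std::min(y + 1, size - 1);
    const float dhdx = (H(x1, y) - H(x0, y)) / ((x1 - x0) * spacing);
    const float dhdy = (H(x, y1) - H(x, y0)) / ((y1 - y0) * spacing);

    // The surface z = h(x, y) has the (unnormalized) normal (-dh/dx, -dh/dy, 1).
    const float len = std::sqrt(dhdx * dhdx + dhdy * dhdy + 1.0f);
    return Normal{-dhdx / len, -dhdy / len, 1.0f / len};
}
```

A flat patch yields (0, 0, 1) everywhere, while a ramp that rises by one unit per sample along x yields a normal tilted 45 degrees against the slope.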
Performance
Figure 10 shows the performance during the recorded fly-over captured on an Intel microarchitecture codename Sandy Bridge-based machine at 1280x1024 screen resolution.
With a three-pixel error threshold, the LOD changes are not distracting, while the average performance is more than 100 fps on the Intel microarchitecture codename Sandy Bridge-based machine. With a five-pixel threshold, geometry changes become much more apparent, but the performance increases by a factor of 1.5.
The time required to increase the LOD depends primarily on the selected normal map level-of-detail bias. With a 4x normal map up-sample factor, it takes approximately 44 ms to process one patch on one core. Thus, for a model consisting of 150 patches, it takes approximately 6.6 seconds to construct the whole model on one core. Since all patches are processed independently, the workload can be evenly distributed across the available cores, and as a result the time scales well. Note also that during a typical fly-over, only a few patches are updated each second (usually fewer than 10). If the camera position changes drastically, the model is updated asynchronously over a number of frames, which causes no stalls. Note that until the model is updated, it is rendered at a coarser resolution.
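The figures above work out as follows. The four-core line assumes ideal scaling and is an illustration, not a measurement from the sample:

```latex
\begin{align*}
T_1 &= 150 \times 44\ \text{ms} = 6600\ \text{ms} = 6.6\ \text{s} \\
T_N &\approx \frac{T_1}{N}, \qquad T_4 \approx \frac{6.6\ \text{s}}{4} = 1.65\ \text{s} \\
\text{fly-over load} &\le 10\ \tfrac{\text{patches}}{\text{s}} \times 44\ \text{ms} = 0.44\ \text{s of one core per second}
\end{align*}
```

So even the upper bound of 10 patch updates per second keeps a single core less than half busy.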
References
[1] P. Lindstrom, D. Koller, W. Ribarsky, L. F. Hodges, N. Faust, and G. A. Turner. Real-time, continuous level of detail rendering of height fields. In Proc. SIGGRAPH 96, pages 109-118. ACM SIGGRAPH, 1996.
[2] Renato Pajarola. Large scale terrain visualization using the restricted quadtree triangulation. In Proceedings Visualization 98, pages 19-26 and 515. IEEE Computer Society Press, 1998.
[3] Thatcher Ulrich. Rendering massive terrains using chunked level of detail control. SIGGRAPH Course Notes (2002). Volume 3, Issue 5.