This article will cover the process of debugging and evaluating Android-based games and apps for performance hotspots using Intel’s Graphics Performance Analyzers. It gives an overview of the tool and shows it in actual use with a live 3D game from the app store, detecting that game’s real, current performance bottlenecks. The motivation behind this article is to encourage you to use performance analysis tools to methodically target the code you optimize, rather than optimizing blindly.
Intel Graphics Performance Analyzers (Intel GPA) is a collection of tools that let you analyze desktop and Android-based apps. For this article, we will install and run the Android version of the tool on a Mac.
Download the Android
version of the tools, unzip, and run the System Analyzer.dmg installer.
Running Intel GPA
When you first run the app, you’ll be presented with a screen that lets you type in an IP address or connect to any auto-detected Android device. Note that the Intel GPA System Analyzer can only analyze Android devices with Intel-based architectures; for this walkthrough, I’m using a Samsung Galaxy Tab 3.
Once connected, you’ll be offered a list of the apps installed on the device that can be analyzed. Now simply select the app you’d like to analyze. It will be launched automatically and the analysis will begin.
Once you have the Intel GPA tool running, you’re immediately greeted with a list of performance metrics you can monitor and state overrides you can toggle.
In the CPU sub-group, you can view the performance load of each individual CPU, but the Aggregate and Target App CPU Load metrics are more useful. By dragging these two metrics onto the right panel, you can see how much of the CPU your app is using compared with overall system usage. This lets you detect whether any performance issues are being caused by background operations.
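To make that comparison concrete, here is a minimal sketch that subtracts the Target App CPU Load from the Aggregate CPU Load to estimate how much of the CPU is going to background work. It assumes you note both readings by hand from the right panel at the same moment, and that both are percentages on the same 0-100 scale. The function names and the 25-point threshold are hypothetical, and nothing here talks to Intel GPA itself.

```python
def background_cpu_load(aggregate_load: float, target_app_load: float) -> float:
    """Estimate CPU load (percentage points) not used by the app under test."""
    for load in (aggregate_load, target_app_load):
        if not 0.0 <= load <= 100.0:
            raise ValueError("CPU loads must be percentages between 0 and 100")
    # Anything the whole system uses beyond our app is background work.
    # Clamp at zero in case the two readings were taken slightly apart.
    return max(aggregate_load - target_app_load, 0.0)


def background_is_suspect(aggregate_load: float, target_app_load: float,
                          threshold: float = 25.0) -> bool:
    """Flag readings where background work could explain a slowdown.

    The 25-point default is an arbitrary, hypothetical cut-off.
    """
    return background_cpu_load(aggregate_load, target_app_load) > threshold
```

For example, an aggregate load of 70% with the app itself at 30% leaves 40 points of background work. That is a hint to check what else is running before touching your own code.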
In the Device IO sub-group, you can view disk and network activity. This gives you a quick indication of whether your app’s performance is being affected by IO operations.
The GPU sub-section gives you direct metrics on the actual load on the GPU, ranging from tiling operations and vertices per second to texture and shader processing loads.
- TA Load and USSE Vertex Load: TA Load tells you how much of the time the Tile Accelerator is in use. USSE Vertex Load tells you the percentage of time the shader engine is processing vertex instructions. Ideally, these loads should be balanced for best performance. However, if the vertex load is low and the TA load is high, it means the scene is too complex. On the other hand, if the vertex load is high and the TA load is low, the vertex shader should be targeted for optimization.
- PB Primitives/Second: If this value is high, it indicates that your bottleneck is your vertex format size.
- PB Vertices/Second: If this value is high, it’s worth inspecting the amount of vertex data you’re passing between the vertex and fragment shader.
- PB Vertices/Primitive: If this value is high, it’s worth looking into reducing the LOD of your models or sharing vertices with index buffers.
- ISP Load: The Image Synthesis Processor is responsible for hidden surface removal. If this metric is high, it’s worth implementing a software culling technique or ordering your draw calls. It’s also worth checking whether a single Z-buffer is being used to manage several render targets, and if so, creating a Z-buffer per render target.
- TSP Load, Texture Unit Load and USSE Pixel Load: The TSP metric gives you the percentage of time that the Texture and Shading Processor is busy. The Texture Unit Load tells you how busy the texture units are, and the USSE Pixel Load tells you how much of the time the shader units are processing pixel instructions. Using these metrics in combination will help you gauge which area to optimize. If your TSP Load is high, the Texture Unit Load and USSE Pixel Load let you deduce whether the load comes from texturing or from shader complexity. If the Texture Unit Load is high, it’s worth optimizing your textures, either by using compression or by reducing their resolution. However, if the USSE Pixel Load is high, it’s worth investigating the complexity of your fragment shaders.
- USSE Total Load: Tells you the percentage of time that the shader units are being used. It’s worth using alongside the Pixel and Vertex Loads to deduce which area is the bottleneck.
- USSE Cycles/Vertex, Cycles/Pixel and Stall Load: Give you immediate stats on the processing efficiency of your fragment and vertex shaders.
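The rules of thumb in the list above can be folded into a small triage helper. This is a sketch under stated assumptions. The readings are percentages copied by hand from the GPU sub-section. The 70/30 "high"/"low" cut-offs are hypothetical. Deciding between texturing and shading by simply comparing Texture Unit Load with USSE Pixel Load is a simplification of the judgement the list describes.

```python
def gpu_hints(ta_load: float, usse_vertex_load: float, tsp_load: float,
              texture_unit_load: float, usse_pixel_load: float,
              high: float = 70.0, low: float = 30.0) -> list[str]:
    """Turn GPU sub-section readings (percentages) into optimization hints.

    high/low are hypothetical thresholds, not values defined by Intel GPA.
    """
    hints = []
    # Vertex side: compare Tile Accelerator load with USSE vertex load.
    if ta_load >= high and usse_vertex_load <= low:
        hints.append("scene too complex")
    elif usse_vertex_load >= high and ta_load <= low:
        hints.append("optimize vertex shaders")
    # Pixel side: when the TSP is busy, see whether texturing or shading dominates.
    if tsp_load >= high:
        if texture_unit_load >= usse_pixel_load:
            hints.append("compress or downsize textures")
        else:
            hints.append("simplify fragment shaders")
    return hints
```

An empty result means none of these heuristics fired, not that the GPU is idle. In that case, the OpenGL metrics and state overrides are the next place to look.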
The Memory sub-section gives you an instant overview of your app’s memory usage and the available system memory. This can be a very quick and useful check for whether your app is leaking memory, or whether your performance issues are related to a lack of available system memory.
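One way to turn the memory readings into a quick leak check is to fit a straight line through samples taken while the app repeats the same workload, then look at the slope. This is a minimal sketch, assuming you record the app’s memory usage in megabytes at regular intervals. The function names and the 0.5 MB-per-sample threshold are hypothetical, and a rising slope only tells you where to start looking.

```python
def memory_slope(samples_mb: list[float]) -> float:
    """Least-squares slope of memory usage, in MB per sample interval."""
    n = len(samples_mb)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2  # sample indices are 0, 1, ..., n - 1
    mean_y = sum(samples_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_mb))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def looks_like_leak(samples_mb: list[float], threshold_mb: float = 0.5) -> bool:
    """Flag steady growth; a fitted slope tolerates garbage-collection dips."""
    return memory_slope(samples_mb) > threshold_mb
```

A fitted slope is used instead of "every sample is higher than the last" because memory on Android typically drops briefly when the garbage collector runs, even while the overall trend climbs.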
The OpenGL sub-section gives you really useful metrics for 3D apps and games, with instant access to the FPS count, frame times, number of buffer creations, draw calls and state changes. The best way to explain these metrics is to go through each one individually:
- Draw Calls & Indexed Draw Calls: Draw calls tend to be expensive operations in the world of 3D graphics. This metric helps you decide whether you’re making too many individual calls per frame and need to combine your vertices into one draw call.
- Vertex Count & Indexed Vertex Count: Gives you immediate stats on the number of vertices you’re pumping through.
- FPS & Frame Time: The ever-critical frames per second metric. Anything below 60 fps, or a frame time above roughly 16.7 ms, means your app is no longer butter smooth.
- Buffer Creations: Buffer creation tends to be a slow operation, so depending on your application, it should usually happen during the setup phase. If you find that you’re creating buffers at run-time, there’s a good chance you can optimize this area.
- Error Gets: Another slow operation, though useful for debugging; you should really be aiming for zero.
- State Metrics: This section gives you immediate metrics on the number of state changes you’re making, both in total and broken down into texture, shader and buffer changes. If you find that you’re GPU bound, reducing state changes by batching similar draw calls together is a direction you’d be well advised to take.
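The FPS and frame-time targets are two views of the same budget: at 60 fps, each frame gets 1000 / 60 ≈ 16.7 ms. Here is a small sketch of the conversion, with hypothetical helper names, for when a panel reports one figure and you think in the other.

```python
def frame_time_ms(fps: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def fps_from_frame_time(ms: float) -> float:
    """Frame rate implied by an average frame time in milliseconds."""
    if ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / ms


def within_budget(measured_ms: float, target_fps: float = 60.0) -> bool:
    """True if a measured frame time fits the target frame rate's budget."""
    return measured_ms <= frame_time_ms(target_fps)
```

For example, `frame_time_ms(8)` returns `125.0`, so a game running at 8 fps spends seven and a half times the 60 fps budget on every frame.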
The battery metrics give you immediate stats on the battery. Though they aren’t really useful for performance work, they’re a neat way to gain an understanding of your device.
The state overrides sub-section gives you a list of live experiments you can run to test the possible causes of your performance bottlenecks.
1x1 Scissor Rect & Simple Fragment Shader
The 1x1 scissor rect
disables pixel rendering and the simple fragment shader replaces your current
fragment shader with a simple colour output. Both these overrides let you test
if your app is fragment shader bound. If performance doesn’t increase, you’re
best served not attempting to optimize any fragment shaders, but instead
checking your vertex counts and draw calls.
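Reading an override experiment comes down to comparing the frame rate before and after toggling it. This is a minimal sketch, assuming you note the baseline FPS, and the FPS with each override enabled on its own, by hand from the FPS metric. The helper name and the 10% noise margin are hypothetical. An override that produces a clear speedup points at the stage it removes, and one that changes nothing rules that stage out.

```python
def rank_overrides(baseline_fps: float, override_fps: dict[str, float],
                   min_speedup: float = 1.1) -> list[str]:
    """Return the overrides that sped the app up, largest speedup first.

    override_fps maps an override's name to the FPS measured with only that
    override enabled. min_speedup (10% by default) is a hypothetical margin
    that keeps frame-to-frame noise from counting as a real improvement.
    """
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    speedups = {name: fps / baseline_fps for name, fps in override_fps.items()}
    helped = [name for name, s in speedups.items() if s >= min_speedup]
    return sorted(helped, key=lambda name: speedups[name], reverse=True)
```

With made-up readings of 30 fps at baseline, 58 fps under 1x1 Scissor Rect, 50 fps under Simple Fragment Shader and 31 fps under Disable Draw Calls, the helper returns `["1x1 Scissor Rect", "Simple Fragment Shader"]`. That points at fragment processing rather than draw-call overhead.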
Alpha blending operations tend to be among the biggest performance killers for graphics-intensive apps such as games. With this override, you can quickly see how much disabling transparency affects your app’s performance.
Disable Draw Calls is a really simple but useful override that lets you check whether your app is making too many draw calls. If so, you should look into batching the vertex data of those calls together into one draw call.
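Batching, in its simplest form, means concatenating the vertex data of several meshes into one buffer. Each mesh’s indices are then shifted by the number of vertices that precede it, so a single indexed draw call can render them all. The sketch below shows only that CPU-side merge. It assumes the meshes share the same shader, textures and render state and already have their transforms applied (static scenery). The `Mesh` type and `batch_meshes` name are hypothetical. Uploading the result and issuing the draw (for example with `glBufferData` and `glDrawElements` in OpenGL ES) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Mesh:
    """Hypothetical mesh: flat vertex floats plus triangle indices."""
    vertices: list[float] = field(default_factory=list)  # e.g. x, y, z per vertex
    indices: list[int] = field(default_factory=list)     # into this mesh's vertices


def batch_meshes(meshes: list[Mesh], floats_per_vertex: int = 3,
                 max_vertices: int = 65536) -> Mesh:
    """Merge meshes that share render state into one indexed mesh."""
    merged = Mesh()
    for mesh in meshes:
        # Index of the first vertex this mesh will occupy in the merged buffer.
        base = len(merged.vertices) // floats_per_vertex
        merged.vertices.extend(mesh.vertices)
        merged.indices.extend(i + base for i in mesh.indices)
    vertex_count = len(merged.vertices) // floats_per_vertex
    # 16-bit (GL_UNSIGNED_SHORT) indices can address at most 65,536 vertices.
    if vertex_count > max_vertices:
        raise ValueError("batch too large for 16-bit indices; split it")
    return merged
```

The vertex limit reflects that OpenGL ES 2.0 only guarantees 16-bit index buffers; larger batches need to be split, or need 32-bit indices where the device supports them.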
The Z-buffer is typically used to reject the pixels of objects that are hidden behind other objects. Disabling the Z-test with this override should slow down your rendering. If it doesn’t, you’re probably drawing too many objects in back-to-front order, meaning that the renderer is constantly drawing new objects on top of background objects. Implementing a front-to-back sort or manual occlusion code could improve your frame rate.
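A front-to-back sort for opaque objects can be as simple as ordering draws by their distance from the camera. Nearer objects then fill the Z-buffer first, and hidden pixels behind them can be rejected. This is a minimal sketch, assuming each object is summarized by a single position; real engines often use bounding-volume centres. `DrawItem` and `sort_front_to_back` are hypothetical names. Transparent objects are the exception: they generally still need to be drawn back-to-front after the opaque pass so that blending comes out right.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class DrawItem:
    """Hypothetical opaque draw: a label and a representative world position."""
    name: str
    position: Vec3


def sort_front_to_back(items: list[DrawItem], camera: Vec3) -> list[DrawItem]:
    """Order opaque draws nearest-first to make the most of the depth test."""
    def distance_sq(item: DrawItem) -> float:
        # Squared distance gives the same ordering and skips the square root.
        return sum((p - c) ** 2 for p, c in zip(item.position, camera))
    return sorted(items, key=distance_sq)
```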
The wireframe override shows your render in wireframe mode, which helps you debug what your meshes look like and how they’re laid out in your scene.
The texture override gives you a quick and easy test to find out whether your app is texture bound. If your app’s performance increases when you turn this override on, you should try to optimize your textures. You can do this by reducing the number of textures, their resolution or their filter settings, or by changing how they’re used in your shaders.
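To see why compression and lower resolution help, it’s worth estimating how much memory a texture occupies. This is a rough sketch, assuming 32 bits per pixel for uncompressed RGBA8888, 16 for RGB565 and 4 for ETC1. The function name is hypothetical. The estimate ignores driver overhead and the 4x4 block padding that compressed formats apply to their smallest mip levels.

```python
# Bits per pixel for a few common Android texture formats.
BITS_PER_PIXEL = {"RGBA8888": 32, "RGB565": 16, "ETC1": 4}


def texture_bytes(width: int, height: int, bits_per_pixel: int,
                  mipmapped: bool = True) -> int:
    """Approximate memory footprint of a 2D texture in bytes."""
    total_bits = 0
    w, h = width, height
    while True:
        total_bits += w * h * bits_per_pixel
        # Stop after the base level, or once the 1x1 mip level is counted.
        if not mipmapped or (w == 1 and h == 1):
            break
        w, h = max(w // 2, 1), max(h // 2, 1)
    return total_bits // 8
```

For a 1024x1024 texture without mipmaps, this gives 4,194,304 bytes (4 MiB) as RGBA8888 and 524,288 bytes (512 KiB) as ETC1, an eight-fold saving. Halving the resolution to 512x512 cuts either figure by a further factor of four, while a full mip chain adds roughly a third.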
Use Case: Finding the performance bottleneck in an app
The app we’ll be testing here is called PLAYIR, a 3D multiplayer game designer app that lets you create and publish games across mobile and web devices using a drag-and-drop UX and real-time source code editing. For this use case, we’re going to load up one of the games inside it, called World of Fighters, and experiment with how introducing more characters into the game affects our performance bottleneck.
In the default gameplay scene, there are a few trees and two characters roaming around the level. If we inspect our app using the GPA tools, we can see the frame rate hitting around 60 fps.
Now, when we ramp up the number of enemies to 50, the frame rate drops to 8 fps.
Without using analysis tools, our first thought may be that the frame rate has dropped because more characters have to be drawn on the screen, and we’d go ahead and start prematurely optimizing the rendering code.
However, if we actually test this assumption by enabling the 1x1 Scissor Rect and Disable Draw Calls state overrides, we notice that the frame rate stays the same. This means that optimizing the rendering would yield no actual performance benefit. Instead, our app is CPU bound, and we should investigate other areas of the code (e.g. the AI or physics subroutines).
Having read this guide, you should now be motivated to at least fire up your app using Intel’s GPA tool and check out some of the metrics. That might mean simply checking your memory and battery usage, or going in and finding out what your actual performance bottleneck is, as we did in the case of PLAYIR. In the world of 3D graphics, it’s very easy to assume that drawing more things means slower performance. But it’s always a good idea to test your assumptions first, as you may find that your time would be better invested optimizing other areas of the code.
To learn more about Intel tools for the Android developer, visit Intel® Developer Zone for Android.