Docs: New document on interpreting Profile GPU tool results
am: b65b81a074
Change-Id: I1eebc84f5bf857dedcd140b6e4dc991be17c11e8
docs/html/topic/performance/images/bars.png (new binary file, not shown; 330 KiB)
docs/html/topic/performance/images/s-profiler-legend.png (new binary file, not shown; 81 KiB)
docs/html/topic/performance/profile-gpu.jd (new file, 406 lines)
@@ -0,0 +1,406 @@
page.title=Analyzing Rendering with Profile GPU
page.metaDescription=Use the Profile GPU tool to help you optimize your app's rendering performance.

meta.tags="power"
page.tags="power"

@jd:body

<div id="qv-wrapper">
<div id="qv">

<h2>In this document</h2>
<ol>
  <li>
    <a href="#visrep">Visual Representation</a>
  </li>

  <li>
    <a href="#sam">Stages and Their Meanings</a>

    <ul>
      <li>
        <a href="#ih">Input Handling</a>
      </li>
      <li>
        <a href="#at">Animation</a>
      </li>
      <li>
        <a href="#ml">Measurement/Layout</a>
      </li>
      <li>
        <a href="#draw">Drawing</a>
      </li>
      <li>
        <a href="#su">Sync/Upload</a>
      </li>
      <li>
        <a href="#ic">Issuing Commands</a>
      </li>
      <li>
        <a href="#psb">Processing/Swapping Buffers</a>
      </li>
      <li>
        <a href="#mt">Miscellaneous</a>
      </li>
    </ul>
  </li>
</ol>
</div>
</div>

<p>
The <a href="/studio/profile/dev-options-rendering.html">Profile GPU
Rendering</a> tool indicates the relative time that each stage of the
rendering pipeline takes to render the previous frame. This knowledge can help
you identify bottlenecks in the pipeline, so that you know what to optimize to
improve your app's rendering performance.
</p>

<p>
This page briefly explains what happens during each pipeline stage, and
discusses issues that can cause bottlenecks there. Before reading this page,
you should be familiar with the information presented in the
<a href="/studio/profile/dev-options-rendering.html">Profile GPU Rendering
Walkthrough</a>. In addition, to understand how all of the stages fit
together, it may be helpful to review
<a href="https://www.youtube.com/watch?v=we6poP0kw6E&index=64&list=PLWz5rJ2EKKc9CBxr3BVjPTPoDPLdPIFCE">how
the rendering pipeline works</a>.
</p>

<h2 id="visrep">Visual Representation</h2>

<p>
The Profile GPU Rendering tool displays stages and their relative times in the
form of a graph: a color-coded histogram. Figure 1 shows an example of such a
display.
</p>

<img src="{@docRoot}topic/performance/images/bars.png">
<p class="img-caption">
<strong>Figure 1.</strong> Profile GPU Rendering Graph
</p>

<p>
Each segment of each vertical bar in the Profile GPU Rendering graph
represents a stage of the pipeline and is drawn in a specific color. Figure 2
shows a key to the meaning of each displayed color.
</p>

<img src="{@docRoot}topic/performance/images/s-profiler-legend.png">
<p class="img-caption">
<strong>Figure 2.</strong> Profile GPU Rendering Graph Legend
</p>

<p>
Once you understand what each color signifies, you can target specific aspects
of your app to try to optimize its rendering performance.
</p>
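
<p>
As a rough illustration of that targeting step, the sketch below picks the
stage that consumed the most time in a single frame. The stage names follow
Figure 2; the class name <code>StageTimings</code>, its methods, and the
millisecond values are hypothetical and are not produced by the tool itself.
</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StageTimings {
    /** Returns the stage with the largest duration, or null if the map is empty. */
    static String dominantStage(Map<String, Double> millisByStage) {
        String worst = null;
        double worstMillis = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : millisByStage.entrySet()) {
            if (entry.getValue() > worstMillis) {
                worst = entry.getKey();
                worstMillis = entry.getValue();
            }
        }
        return worst;
    }

    /** Hypothetical per-stage timings for one frame, in milliseconds. */
    static Map<String, Double> sampleFrame() {
        Map<String, Double> frame = new LinkedHashMap<>();
        frame.put("Input Handling", 0.4);
        frame.put("Animation", 0.6);
        frame.put("Measurement/Layout", 2.1);
        frame.put("Drawing", 3.0);
        frame.put("Sync/Upload", 7.5);
        frame.put("Issuing Commands", 2.2);
        frame.put("Processing/Swapping Buffers", 1.0);
        frame.put("Miscellaneous", 0.3);
        return frame;
    }
}
```

<p>
With these made-up numbers, <code>dominantStage()</code> would point at
Sync/Upload, which suggests starting with the bitmap-related advice in that
section.
</p>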

<h2 id="sam">Stages and Their Meanings</h2>

<p>
This section explains what happens during each stage corresponding to a color
in Figure 2, as well as bottleneck causes to look out for.
</p>

<h3 id="ih">Input Handling</h3>

<p>
The input handling stage of the pipeline measures how long the app spent
handling input events, that is, how long the app spent executing code called
as a result of input event callbacks.
</p>

<h4>When this segment is large</h4>

<p>
High values in this area are typically a result of too much work, or
too-complex work, occurring inside the input-handler event callbacks. Since
these callbacks always occur on the main thread, solutions to this problem
focus on optimizing the work directly, or offloading the work to a different
thread.
</p>
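
<p>
The offloading approach can be sketched in plain Java as follows. This is a
minimal sketch, not a prescribed pattern: the class
<code>InputWorkOffloader</code>, the method <code>onInputEvent()</code>, and
the stand-in <code>expensiveChecksum()</code> are hypothetical names, and the
single-thread executor is just one possible choice of worker.
</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class InputWorkOffloader {
    // One background thread; the input callback never runs the heavy work itself.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Hypothetical stand-in for work that is too slow for an input callback. */
    static long expensiveChecksum(String text) {
        long sum = 0;
        for (int i = 0; i < text.length(); i++) {
            sum = sum * 31 + text.charAt(i);
        }
        return sum;
    }

    /** Called from the input callback: schedules the work and returns immediately. */
    CompletableFuture<Long> onInputEvent(String payload) {
        return CompletableFuture.supplyAsync(() -> expensiveChecksum(payload), worker);
    }

    /** Stops the worker thread once no more events are expected. */
    void shutdown() {
        worker.shutdown();
    }
}
```

<p>
In an Android app, the result would then need to be delivered back to the main
thread before it touches any views, for example with
<code>View.post()</code>; that step is omitted here to keep the sketch free of
platform dependencies.
</p>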

<p>
It’s also worth noting that {@link android.support.v7.widget.RecyclerView}
scrolling can appear in this phase.
{@link android.support.v7.widget.RecyclerView} scrolls immediately when it
consumes the touch event. As a result, it can inflate or populate new item
views. For this reason, it’s important to make this operation as fast as
possible. Profiling tools like Traceview or Systrace can help you investigate
further.
</p>

<h3 id="at">Animation</h3>

<p>
The Animation phase shows how long it took to evaluate all the animators that
were running in that frame. The most common animators are
{@link android.animation.ObjectAnimator},
{@link android.view.ViewPropertyAnimator}, and
<a href="/training/transitions/overview.html">Transitions</a>.
</p>

<h4>When this segment is large</h4>

<p>
High values in this area are typically a result of work that’s executing due
to some property change of the animation. For example, a fling animation,
which scrolls your {@link android.widget.ListView} or
{@link android.support.v7.widget.RecyclerView}, causes large amounts of view
inflation and population.
</p>

<h3 id="ml">Measurement/Layout</h3>

<p>
In order for Android to draw your view items on the screen, it executes two
specific operations across layouts and views in your view hierarchy.
</p>

<p>
First, the system measures the view items. Every view and layout has specific
data that describes the size of the object on the screen. Some views can have
a specific size; others have a size that adapts to the size of the parent
layout container.
</p>

<p>
Second, the system lays out the view items. Once the system calculates the
sizes of child views, the system can proceed with layout, sizing and
positioning the views on the screen.
</p>

<p>
The system performs measurement and layout not only for the views to be drawn,
but also for the parent hierarchies of those views, all the way up to the root
view.
</p>

<h4>When this segment is large</h4>

<p>
If your app spends a lot of time per frame in this area, it is usually either
because of the sheer volume of views that need to be laid out, or problems
such as
<a href="/topic/performance/optimizing-view-hierarchies.html#double">double
taxation</a> at the wrong spot in your hierarchy. In either of these cases,
addressing performance involves
<a href="/topic/performance/optimizing-view-hierarchies.html">improving the
performance of your view hierarchies</a>.
</p>
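
<p>
To see why the position of double taxation in the hierarchy matters, consider
the toy model below. It is only a sketch under a simplifying assumption: a
chain of nested containers in which every container measures its single child
the same number of times. The class <code>MeasureCostModel</code> and its
method are hypothetical and do not mirror the framework's actual measure code.
</p>

```java
final class MeasureCostModel {
    /**
     * Counts how often the innermost view is measured when it sits below
     * `depth` nested containers, each of which measures its child
     * `passesPerContainer` times for every time it is measured itself.
     */
    static long leafMeasureCalls(int depth, int passesPerContainer) {
        if (depth == 0) {
            return 1; // no container above: a single measure request reaches the view
        }
        return passesPerContainer * leafMeasureCalls(depth - 1, passesPerContainer);
    }
}
```

<p>
Under this assumption the cost multiplies at every level: three nested
containers that each measure once reach the innermost view once, while three
that each measure twice reach it eight times.
</p>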

<p>
Code that you’ve added to
{@link android.view.View#onLayout(boolean, int, int, int, int)} or
{@link android.view.View#onMeasure(int, int)} can also cause performance
issues. <a href="/studio/profile/traceview.html">Traceview</a> and
<a href="/studio/profile/systrace.html">Systrace</a> can help you examine the
call stacks to identify problems your code may have.
</p>

<h3 id="draw">Drawing</h3>

<p>
The draw stage translates a view’s rendering operations, such as drawing a
background or drawing text, into a sequence of native drawing commands. The
system captures these commands into a display list.
</p>

<p>
The Draw bar records how much time it takes to complete capturing the commands
into the display list, for all the views that needed to be updated on the
screen this frame. The measured time applies to any code that you have added
to the UI objects in your app. Examples of such code include the
{@link android.view.View#onDraw(android.graphics.Canvas) onDraw()} and
{@link android.view.View#dispatchDraw(android.graphics.Canvas) dispatchDraw()}
methods, and the various <code>draw()</code> methods belonging to the
subclasses of the {@link android.graphics.drawable.Drawable} class.
</p>

<h4>When this segment is large</h4>

<p>
In simplified terms, you can understand this metric as showing how long it took
to run all of the calls to
{@link android.view.View#onDraw(android.graphics.Canvas) onDraw()} for each
invalidated view. This measurement includes any time spent dispatching draw
commands to children and drawables that may be present. For this reason, when
you see this bar spike, the cause could be that many views suddenly became
invalidated. Invalidation makes it necessary to regenerate views' display
lists. Alternatively, a lengthy time may be the result of a few custom views
that have some extremely complex logic in their
{@link android.view.View#onDraw(android.graphics.Canvas) onDraw()} methods.
</p>

<h3 id="su">Sync/Upload</h3>

<p>
The Sync &amp; Upload metric represents the time it takes to transfer bitmap
objects from CPU memory to GPU memory during the current frame.
</p>

<p>
As different processors, the CPU and the GPU have different RAM areas
dedicated to processing. When you draw a bitmap on Android, the system
transfers the bitmap to GPU memory before the GPU can render it to the screen.
Then, the GPU caches the bitmap so that the system doesn’t need to transfer
the data again unless the texture gets evicted from the GPU texture cache.
</p>

<p class="note"><strong>Note:</strong> On Lollipop devices, this stage is
purple.
</p>

<h4>When this segment is large</h4>

<p>
All resources for a frame need to reside in GPU memory before they can be used
to draw a frame. A high value for this metric could therefore indicate either
a large number of small resource loads or a small number of very large
resources. A common case is when an app displays a single bitmap that’s close
to the size of the screen. Another case is when an app displays a large number
of thumbnails.
</p>

<p>
To shrink this bar, you can employ techniques such as:
</p>

<ul>
<li>
Ensuring your bitmap resolutions are not much larger than the size at which
they will be displayed. For example, your app should avoid displaying a
1024x1024 image as a 48x48 image.
</li>

<li>
Taking advantage of {@link android.graphics.Bitmap#prepareToDraw()} to
asynchronously pre-upload a bitmap before the next sync phase.
</li>
</ul>
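
<p>
The first technique can be made concrete with a little arithmetic. The sketch
below picks a power-of-two downsampling factor that keeps the decoded image at
least as large as the requested size, and estimates uncompressed bitmap size
at an assumed 4 bytes per pixel. The class <code>BitmapSizing</code> and its
methods are hypothetical helpers; in an app, a factor like this could be
assigned to <code>BitmapFactory.Options.inSampleSize</code> before decoding.
</p>

```java
final class BitmapSizing {
    /**
     * Returns the largest power-of-two factor by which the source can be
     * shrunk while both dimensions stay at or above the requested size.
     */
    static int sampleSize(int srcWidth, int srcHeight, int reqWidth, int reqHeight) {
        if (reqWidth <= 0 || reqHeight <= 0) {
            throw new IllegalArgumentException("requested size must be positive");
        }
        int sample = 1;
        while (srcWidth / (sample * 2) >= reqWidth
                && srcHeight / (sample * 2) >= reqHeight) {
            sample *= 2;
        }
        return sample;
    }

    /** Uncompressed size in bytes, assuming 4 bytes per pixel. */
    static long bytes(int width, int height) {
        return 4L * width * height;
    }
}
```

<p>
For the 1024x1024 image displayed at 48x48, this yields a factor of 16, so the
decoded bitmap is 64x64: about 16 KiB to upload instead of 4 MiB.
</p>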

<h3 id="ic">Issuing Commands</h3>

<p>
The <em>Issue Commands</em> segment represents the time it takes to issue all
of the commands necessary for drawing display lists to the screen.
</p>

<p>
For the system to draw display lists to the screen, it sends the necessary
commands to the GPU. Typically, it performs this action through the
<a href="/guide/topics/graphics/opengl.html">OpenGL ES</a> API.
</p>

<p>
This process takes some time, as the system performs final transformation and
clipping for each command before sending the command to the GPU. Additional
overhead then arises on the GPU side, which computes the final commands. These
commands include final transformations and additional clipping.
</p>

<h4>When this segment is large</h4>

<p>
The time spent in this stage is a direct measure of the complexity and
quantity of display lists that the system renders in a given frame. For
example, having many draw operations, especially in cases where there's a
small inherent cost to each draw primitive, could inflate this time. The
following loop:
</p>
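
<pre>
for (int i = 0; i &lt; 1000; i++) {
    // mPaint is an assumed Paint field; points are interleaved as x0, y0, x1, y1, ...
    canvas.drawPoint(mThousandPointArray[2 * i], mThousandPointArray[2 * i + 1], mPaint);
}
</pre>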

<p>
is a lot more expensive to issue than:
</p>

<pre>
// One batched call; mPaint is an assumed Paint field.
canvas.drawPoints(mThousandPointArray, mPaint);
</pre>
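
<p>
The batched call expects its coordinates in a single interleaved array of the
form x0, y0, x1, y1, and so on. A minimal sketch of building such an array
from separate coordinate arrays follows; the class <code>PointBatch</code> and
the method <code>pack()</code> are hypothetical helpers, not framework APIs.
</p>

```java
final class PointBatch {
    /** Packs separate x and y arrays into one interleaved array: [x0, y0, x1, y1, ...]. */
    static float[] pack(float[] xs, float[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        float[] points = new float[xs.length * 2];
        for (int i = 0; i < xs.length; i++) {
            points[2 * i] = xs[i];
            points[2 * i + 1] = ys[i];
        }
        return points;
    }
}
```

<p>
Building the array once, outside of the draw callback, keeps the per-frame
work down to the single batched call.
</p>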

<p>
There isn’t always a 1:1 correlation between issuing commands and actually
drawing display lists. Unlike <em>Issue Commands</em>, which captures the time
it takes to send drawing commands to the GPU, the <em>Draw</em> metric
represents the time that it took to capture the drawing commands into the
display list.
</p>

<p>
This difference arises because the system caches display lists wherever
possible. As a result, there are situations where a scroll, transform, or
animation requires the system to re-send a display list without having to
rebuild it (that is, recapture the drawing commands) from scratch. In such
cases, you can see a high <em>Issue Commands</em> bar without seeing a high
<em>Draw</em> bar.
</p>

<h3 id="psb">Processing/Swapping Buffers</h3>

<p>
Once Android finishes submitting all of its display lists to the GPU, the
system issues one final command to tell the graphics driver that it's done
with the current frame. At this point, the driver can finally present the
updated image to the screen.
</p>

<h4>When this segment is large</h4>

<p>
It’s important to understand that the GPU executes work in parallel with the
CPU. The Android system issues draw commands to the GPU, and then moves on to
the next task. The GPU reads those draw commands from a queue and processes
them.
</p>

<p>
In situations where the CPU issues commands faster than the GPU consumes them,
the communications queue between the processors can become full. When this
occurs, the CPU blocks, and waits until there is space in the queue to place
the next command. This full-queue state arises often during the
<em>Swap Buffers</em> stage, because at that point, a whole frame’s worth of
commands has been submitted.
</p>
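
<p>
The full-queue behavior can be modeled with an ordinary bounded queue. The
sketch below is only an analogy, assuming a consumer that drains nothing while
the producer submits; the class <code>CommandQueueModel</code> and its method
are hypothetical and do not represent how the graphics driver implements its
queue.
</p>

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class CommandQueueModel {
    /**
     * Returns how many of `commands` submissions find the queue full when the
     * queue holds at most `capacity` entries and nothing is consumed meanwhile.
     */
    static int stalledSubmissions(int capacity, int commands) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int stalled = 0;
        for (int i = 0; i < commands; i++) {
            // offer() returns false when the queue is full instead of waiting;
            // a producer that called put() would block at this point.
            if (!queue.offer(i)) {
                stalled++;
            }
        }
        return stalled;
    }
}
```

<p>
With room for three commands and five submitted, two submissions find the
queue full. In the real pipeline, each such submission is a point where the
CPU waits for the GPU to catch up.
</p>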

<p>
The key to mitigating this problem is to reduce the complexity of work
occurring on the GPU, in similar fashion to what you would do for the
<em>Issue Commands</em> phase.
</p>

<h3 id="mt">Miscellaneous</h3>

<p>
In addition to the time it takes the rendering system to perform its work,
there’s an additional set of work that occurs on the main thread and has
nothing to do with rendering. Time that this work consumes is reported as
<em>misc time</em>. Misc time generally represents work that might be
occurring on the UI thread between two consecutive frames of rendering.
</p>

<h4>When this segment is large</h4>

<p>
If this value is high, it is likely that your app has callbacks, intents, or
other work that should be happening on another thread. Tools such as
<a href="/studio/profile/traceview.html">Method Tracing</a> or
<a href="/studio/profile/systrace.html">Systrace</a> can provide visibility
into the tasks that are running on the main thread. This information can help
you target performance improvements.
</p>