550 lines
22 KiB
Plaintext
550 lines
22 KiB
Plaintext
page.title=Designing for Performance
|
|
@jd:body
|
|
|
|
<p>An Android application should be fast. Well, it's probably more accurate to
|
|
say that it should be <em>efficient</em>. That is, it should execute as
|
|
efficiently as possible in the mobile device environment, with its limited
|
|
computing power and data storage, smaller screen, and constrained battery life.
|
|
|
|
<p>As you develop your application, keep in mind that, while the application may
|
|
perform well enough in your emulator, running on your dual-core development
|
|
computer, it will not perform that well when run a mobile device — even
|
|
the most powerful mobile device can't match the capabilities of a typical
|
|
desktop system. For that reason, you should strive to write efficient code, to
|
|
ensure the best possible performance on a variety of mobile devices.</p>
|
|
|
|
<p>Generally speaking, writing fast or efficient code means keeping memory
|
|
allocations to a minimum, writing tight code, and avoiding certain language and
|
|
programming idioms that can subtly cripple performance. In object-oriented
|
|
terms, most of this work takes place at the <em>method</em> level, on the order of
|
|
actual lines of code, loops, and so on.</p>
|
|
|
|
<p>This document covers these topics: </p>
|
|
<ul>
|
|
<li><a href="#intro">Introduction</a></li>
|
|
<li><a href="#object_creation">Avoid Creating Objects</a></li>
|
|
<li><a href="#native_methods">Use Native Methods</a></li>
|
|
<li><a href="#prefer_virtual">Prefer Virtual Over Interface</a></li>
|
|
<li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
|
|
<li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
|
|
<li><a href="#cache_fields">Cache Field Lookups</a></li>
|
|
<li><a href="#use_final">Declare Constants Final</a></li>
|
|
<li><a href="#foreach">Use Enhanced For Loop Syntax With Caution</a></li>
|
|
<li><a href="#avoid_enums">Avoid Enums</a></li>
|
|
<li><a href="#package_inner">Use Package Scope with Inner Classes</a></li>
|
|
<li><a href="#avoidfloat">Avoid Float</a> </li>
|
|
<li><a href="#samples">Some Sample Performance Numbers</a> </li>
|
|
<li><a href="#closing_notes">Closing Notes</a></li>
|
|
</ul>
|
|
|
|
<a name="intro" id="intro"></a>
|
|
<h2>Introduction</h2>
|
|
|
|
<p>There are two basic rules for resource-constrained systems:</p>
|
|
|
|
<ul>
|
|
<li>Don't do work that you don't need to do.</li>
|
|
<li>Don't allocate memory if you can avoid it.</li>
|
|
</ul>
|
|
|
|
<p>All the tips below follow from these two basic tenets.</p>
|
|
|
|
<p>Some would argue that much of the advice on this page amounts to "premature
|
|
optimization." While it's true that micro-optimizations sometimes make it
|
|
harder to develop efficient data structures and algorithms, on embedded
|
|
devices like handsets you often simply have no choice. For instance, if you
|
|
bring your assumptions about VM performance on desktop machines to Android,
|
|
you're quite likely to write code that exhausts system memory. This will bring
|
|
your application to a crawl — let alone what it will do to other programs
|
|
running on the system!</p>
|
|
|
|
<p>That's why these guidelines are important. Android's success depends on
|
|
the user experience that your applications provide, and that user experience
|
|
depends in part on whether your code is responsive and snappy, or slow and
|
|
aggravating. Since all our applications will run on the same devices, we're
|
|
all in this together, in a way. Think of this document as like the rules of
|
|
the road you had to learn when you got your driver's license: things run
|
|
smoothly when everybody follows them, but when you don't, you get your car
|
|
smashed up.</p>
|
|
|
|
<p>Before we get down to brass tacks, a brief observation: nearly all issues
|
|
described below are valid whether or not the VM features a JIT compiler. If I
|
|
have two methods that accomplish the same thing, and the interpreted execution
|
|
of foo() is faster than bar(), then the compiled version of foo() will
|
|
probably be as fast or faster than compiled bar(). It is unwise to rely on a
|
|
compiler to "save" you and make your code fast enough.</p>
|
|
|
|
<a name="object_creation"></a>
|
|
<h2>Avoid Creating Objects</h2>
|
|
|
|
<p>Object creation is never free. A generational GC with per-thread allocation
|
|
pools for temporary objects can make allocation cheaper, but allocating memory
|
|
is always more expensive than not allocating memory.</p>
|
|
|
|
<p>If you allocate objects in a user interface loop, you will force a periodic
|
|
garbage collection, creating little "hiccups" in the user experience.</p>
|
|
|
|
<p>Thus, you should avoid creating object instances you don't need to. Some
|
|
examples of things that can help:</p>
|
|
|
|
<ul>
|
|
<li>When extracting strings from a set of input data, try
|
|
to return a substring of the original data, instead of creating a copy.
|
|
You will create a new String object, but it will share the char[]
|
|
with the data.</li>
|
|
<li>If you have a method returning a string, and you know that its result
|
|
will always be appended to a StringBuffer anyway, change your signature
|
|
and implementation so that the function does the append directly,
|
|
instead of creating a short-lived temporary object.</li>
|
|
</ul>
|
|
|
|
<p>A somewhat more radical idea is to slice up multidimensional arrays into parallel
|
|
single one-dimension arrays:</p>
|
|
|
|
<ul>
|
|
<li>An array of ints is a much better than an array of Integers,
|
|
but this also generalizes to the fact that two parallel arrays of ints
|
|
are also a <strong>lot</strong> more efficient than an array of (int,int)
|
|
objects. The same goes for any combination of primitive types.</li>
|
|
<li>If you need to implement a container that stores tuples of (Foo,Bar)
|
|
objects, try to remember that two parallel Foo[] and Bar[] arrays are
|
|
generally much better than a single array of custom (Foo,Bar) objects.
|
|
(The exception to this, of course, is when you're designing an API for
|
|
other code to access; in those cases, it's usually better to trade
|
|
correct API design for a small hit in speed. But in your own internal
|
|
code, you should try and be as efficient as possible.)</li>
|
|
</ul>
|
|
|
|
<p>Generally speaking, avoid creating short-term temporary objects if you
|
|
can. Fewer objects created mean less-frequent garbage collection, which has
|
|
a direct impact on user experience.</p>
|
|
|
|
<a name="native_methods" id="native_methods"></a>
|
|
<h2>Use Native Methods</h2>
|
|
|
|
<p>When processing strings, don't hesitate to use specialty methods like
|
|
String.indexOf(), String.lastIndexOf(), and their cousins. These are typically
|
|
implemented in C/C++ code that easily runs 10-100x faster than doing the same
|
|
thing in a Java loop.</p>
|
|
|
|
<p>The flip side of that advice is that punching through to a native
|
|
method is more expensive than calling an interpreted method. Don't use native
|
|
methods for trivial computation, if you can avoid it.</p>
|
|
|
|
<a name="prefer_virtual" id="prefer_virtual"></a>
|
|
<h2>Prefer Virtual Over Interface</h2>
|
|
|
|
<p>Suppose you have a HashMap object. You can declare it as a HashMap or as
|
|
a generic Map:</p>
|
|
|
|
<pre>Map myMap1 = new HashMap();
|
|
HashMap myMap2 = new HashMap();</pre>
|
|
|
|
<p>Which is better?</p>
|
|
|
|
<p>Conventional wisdom says that you should prefer Map, because it
|
|
allows you to change the underlying implementation to anything that
|
|
implements the Map interface. Conventional wisdom is correct for
|
|
conventional programming, but isn't so great for embedded systems. Calling
|
|
through an interface reference can take 2x longer than a virtual
|
|
method call through a concrete reference.</p>
|
|
|
|
<p>If you have chosen a HashMap because it fits what you're doing, there
|
|
is little value in calling it a Map. Given the availability of
|
|
IDEs that refactor your code for you, there's not much value in calling
|
|
it a Map even if you're not sure where the code is headed. (Again, though,
|
|
public APIs are an exception: a good API usually trumps small performance
|
|
concerns.)</p>
|
|
|
|
<a name="prefer_static" id="prefer_static"></a>
|
|
<h2>Prefer Static Over Virtual</h2>
|
|
|
|
<p>If you don't need to access an object's fields, make your method static. It can
|
|
be called faster, because it doesn't require a virtual method table
|
|
indirection. It's also good practice, because you can tell from the method
|
|
signature that calling the method can't alter the object's state.</p>
|
|
|
|
<a name="internal_get_set" id="internal_get_set"></a>
|
|
<h2>Avoid Internal Getters/Setters</h2>
|
|
|
|
<p>In native languages like C++ it's common practice to use getters (e.g.
|
|
<code>i = getCount()</code>) instead of accessing the field directly (<code>i
|
|
= mCount</code>). This is an excellent habit for C++, because the compiler can
|
|
usually inline the access, and if you need to restrict or debug field access
|
|
you can add the code at any time.</p>
|
|
|
|
<p>On Android, this is a bad idea. Virtual method calls are expensive,
|
|
much more so than instance field lookups. It's reasonable to follow
|
|
common object-oriented programming practices and have getters and setters
|
|
in the public interface, but within a class you should always access
|
|
fields directly.</p>
|
|
|
|
<a name="cache_fields" id="cache_fields"></a>
|
|
<h2>Cache Field Lookups</h2>
|
|
|
|
<p>Accessing object fields is much slower than accessing local variables.
|
|
Instead of writing:</p>
|
|
<pre>for (int i = 0; i < this.mCount; i++)
|
|
dumpItem(this.mItems[i]);</pre>
|
|
|
|
<p>You should write:</p>
|
|
<pre> int count = this.mCount;
|
|
Item[] items = this.mItems;
|
|
|
|
for (int i = 0; i < count; i++)
|
|
dumpItems(items[i]);
|
|
</pre>
|
|
|
|
<p>(We're using an explicit "this" to make it clear that these are
|
|
member variables.)</p>
|
|
|
|
<p>A similar guideline is never call a method in the second clause of a "for"
|
|
statement. For example, the following code will execute the getCount() method
|
|
once per iteration, which is a huge waste when you could have simply cached
|
|
the value as an int:</p>
|
|
|
|
<pre>for (int i = 0; i < this.getCount(); i++)
|
|
dumpItems(this.getItem(i));
|
|
</pre>
|
|
|
|
<p>It's also usually a good idea to create a local variable if you're going to be
|
|
accessing an instance field more than once. For example:</p>
|
|
|
|
<pre>
|
|
protected void drawHorizontalScrollBar(Canvas canvas, int width, int height) {
|
|
if (isHorizontalScrollBarEnabled()) {
|
|
int size = <strong>mScrollBar</strong>.getSize(<em>false</em>);
|
|
if (size <= 0) {
|
|
size = mScrollBarSize;
|
|
}
|
|
<strong>mScrollBar</strong>.setBounds(0, <em>height</em> - size, width, height);
|
|
<strong>mScrollBar</strong>.setParams(
|
|
computeHorizontalScrollRange(),
|
|
computeHorizontalScrollOffset(),
|
|
computeHorizontalScrollExtent(), <em>false</em>);
|
|
<strong>mScrollBar</strong>.draw(canvas);
|
|
}
|
|
}</pre>
|
|
|
|
<p>That's four separate lookups of the member field <code>mScrollBar</code>.
|
|
By caching mScrollBar in a local stack variable, the four member field lookups
|
|
become four stack variable references, which are much more efficient.</p>
|
|
|
|
<p>Incidentally, method arguments have the same performance characteristics
|
|
as local variables.</p>
|
|
|
|
<a name="use_final" id="use_final"></a>
|
|
<h2>Declare Constants Final</h2>
|
|
|
|
<p>Consider the following declaration at the top of a class:</p>
|
|
|
|
<pre>static int intVal = 42;
|
|
static String strVal = "Hello, world!";</pre>
|
|
|
|
<p>The compiler generates a class initializer method, called
|
|
<code><clinit></code>, that is executed when the class is first used.
|
|
The method stores the value 42 into <code>intVal</code>, and extracts a
|
|
reference from the classfile string constant table for <code>strVal</code>.
|
|
When these values are referenced later on, they are accessed with field
|
|
lookups.</p>
|
|
|
|
<p>We can improve matters with the "final" keyword:</p>
|
|
|
|
<pre>static final int intVal = 42;
|
|
static final String strVal = "Hello, world!";</pre>
|
|
|
|
<p>The class no longer requires a <code><clinit></code> method,
|
|
because the constants go into classfile static field initializers, which are
|
|
handled directly by the VM. Code accessing <code>intVal</code> will use
|
|
the integer value 42 directly, and accesses to <code>strVal</code> will
|
|
use a relatively inexpensive "string constant" instruction instead of a
|
|
field lookup.</p>
|
|
|
|
<p>Declaring a method or class "final" does not confer any immediate
|
|
performance benefits, but it does allow certain optimizations. For example, if
|
|
the compiler knows that a "getter" method can't be overridden by a sub-class,
|
|
it can inline the method call.</p>
|
|
|
|
<p>You can also declare local variables final. However, this has no definitive
|
|
performance benefits. For local variables, only use "final" if it makes the
|
|
code clearer (or you have to, e.g. for use in an anonymous inner class).</p>
|
|
|
|
<a name="foreach" id="foreach"></a>
|
|
<h2>Use Enhanced For Loop Syntax With Caution</h2>
|
|
|
|
<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface.
|
|
With these objects, an iterator is allocated to make interface calls
|
|
to hasNext() and next(). With an ArrayList, you're better off walking through
|
|
it directly, but for other collections the enhanced for loop syntax will be equivalent
|
|
to explicit iterator usage.</p>
|
|
|
|
<p>Nevertheless, the following code shows an acceptable use of the enhanced for loop:</p>
|
|
|
|
<pre>public class Foo {
|
|
int mSplat;
|
|
static Foo mArray[] = new Foo[27];
|
|
|
|
public static void zero() {
|
|
int sum = 0;
|
|
for (int i = 0; i < mArray.length; i++) {
|
|
sum += mArray[i].mSplat;
|
|
}
|
|
}
|
|
|
|
public static void one() {
|
|
int sum = 0;
|
|
Foo[] localArray = mArray;
|
|
int len = localArray.length;
|
|
|
|
for (int i = 0; i < len; i++) {
|
|
sum += localArray[i].mSplat;
|
|
}
|
|
}
|
|
|
|
public static void two() {
|
|
int sum = 0;
|
|
for (Foo a: mArray) {
|
|
sum += a.mSplat;
|
|
}
|
|
}
|
|
}</pre>
|
|
|
|
<p><strong>zero()</strong> retrieves the static field twice and gets the array
|
|
length once for every iteration through the loop.</p>
|
|
|
|
<p><strong>one()</strong> pulls everything out into local variables, avoiding
|
|
the lookups.</p>
|
|
|
|
<p><strong>two()</strong> uses the enhanced for loop syntax introduced in version 1.5 of
|
|
the Java programming language. The code generated by the compiler takes care
|
|
of copying the array reference and the array length to local variables, making
|
|
it a good choice for walking through all elements of an array. It does
|
|
generate an extra local load/store in the main loop (apparently preserving
|
|
"a"), making it a teensy bit slower and 4 bytes longer than one().</p>
|
|
|
|
<p>To summarize all that a bit more clearly: enhanced for loop syntax performs well
|
|
with arrays, but be cautious when using it with Iterable objects since there is
|
|
additional object creation.</p>
|
|
|
|
<a name="avoid_enums" id="avoid_enums"></a>
|
|
<h2>Avoid Enums</h2>
|
|
|
|
<p>Enums are very convenient, but unfortunately can be painful when size
|
|
and speed matter. For example, this:</p>
|
|
|
|
<pre>public class Foo {
|
|
public enum Shrubbery { GROUND, CRAWLING, HANGING }
|
|
}</pre>
|
|
|
|
<p>turns into a 900 byte .class file (Foo$Shrubbery.class). On first use, the
|
|
class initializer invokes the <init> method on objects representing each
|
|
of the enumerated values. Each object gets its own static field, and the full
|
|
set is stored in an array (a static field called "$VALUES"). That's a lot of
|
|
code and data, just for three integers.</p>
|
|
|
|
<p>This:</p>
|
|
|
|
<pre>Shrubbery shrub = Shrubbery.GROUND;</pre>
|
|
|
|
<p>causes a static field lookup. If "GROUND" were a static final int,
|
|
the compiler would treat it as a known constant and inline it.</p>
|
|
|
|
<p>The flip side, of course, is that with enums you get nicer APIs and
|
|
some compile-time value checking. So, the usual trade-off applies: you should
|
|
by all means use enums for public APIs, but try to avoid them when performance
|
|
matters.</p>
|
|
|
|
<p>In some circumstances it can be helpful to get enum integer values
|
|
through the <code>ordinal()</code> method. For example, replace:</p>
|
|
|
|
<pre>for (int n = 0; n < list.size(); n++) {
|
|
if (list.items[n].e == MyEnum.VAL_X)
|
|
// do stuff 1
|
|
else if (list.items[n].e == MyEnum.VAL_Y)
|
|
// do stuff 2
|
|
}</pre>
|
|
|
|
<p>with:</p>
|
|
|
|
<pre> int valX = MyEnum.VAL_X.ordinal();
|
|
int valY = MyEnum.VAL_Y.ordinal();
|
|
int count = list.size();
|
|
MyItem items = list.items();
|
|
|
|
for (int n = 0; n < count; n++)
|
|
{
|
|
int valItem = items[n].e.ordinal();
|
|
|
|
if (valItem == valX)
|
|
// do stuff 1
|
|
else if (valItem == valY)
|
|
// do stuff 2
|
|
}</pre>
|
|
|
|
<p>In some cases, this will be faster, though this is not guaranteed.</p>
|
|
|
|
<a name="package_inner" id="package_inner"></a>
|
|
<h2>Use Package Scope with Inner Classes</h2>
|
|
|
|
<p>Consider the following class definition:</p>
|
|
|
|
<pre>public class Foo {
|
|
private int mValue;
|
|
|
|
public void run() {
|
|
Inner in = new Inner();
|
|
mValue = 27;
|
|
in.stuff();
|
|
}
|
|
|
|
private void doStuff(int value) {
|
|
System.out.println("Value is " + value);
|
|
}
|
|
|
|
private class Inner {
|
|
void stuff() {
|
|
Foo.this.doStuff(Foo.this.mValue);
|
|
}
|
|
}
|
|
}</pre>
|
|
|
|
<p>The key things to note here are that we define an inner class (Foo$Inner)
|
|
that directly accesses a private method and a private instance field
|
|
in the outer class. This is legal, and the code prints "Value is 27" as
|
|
expected.</p>
|
|
|
|
<p>The problem is that Foo$Inner is technically (behind the scenes) a totally
|
|
separate class, which makes direct access to Foo's private
|
|
members illegal. To bridge that gap, the compiler generates a
|
|
couple of synthetic methods:</p>
|
|
|
|
<pre>/*package*/ static int Foo.access$100(Foo foo) {
|
|
return foo.mValue;
|
|
}
|
|
/*package*/ static void Foo.access$200(Foo foo, int value) {
|
|
foo.doStuff(value);
|
|
}</pre>
|
|
|
|
<p>The inner-class code calls these static methods whenever it needs to
|
|
access the "mValue" field or invoke the "doStuff" method in the outer
|
|
class. What this means is that the code above really boils down to a case
|
|
where you're accessing member fields through accessor methods instead of
|
|
directly. Earlier we talked about how accessors are slower than direct field
|
|
accesses, so this is an example of a certain language idiom resulting in an
|
|
"invisible" performance hit.</p>
|
|
|
|
<p>We can avoid this problem by declaring fields and methods accessed
|
|
by inner classes to have package scope, rather than private scope.
|
|
This runs faster and removes the overhead of the generated methods.
|
|
(Unfortunately it also means the fields could be accessed directly by other
|
|
classes in the same package, which runs counter to the standard OO
|
|
practice of making all fields private. Once again, if you're
|
|
designing a public API you might want to carefully consider using this
|
|
optimization.)</p>
|
|
|
|
<a name="avoidfloat" id="avoidfloat"></a>
|
|
<h2>Avoid Float</h2>
|
|
|
|
<p>Before the release of the Pentium CPU, it was common for game authors to do
|
|
as much as possible with integer math. With the Pentium, the floating point
|
|
math co-processor became a built-in feature, and by interleaving integer and
|
|
floating-point operations your game would actually go faster than it would
|
|
with purely integer math. The common practice on desktop systems is to use
|
|
floating point freely.</p>
|
|
|
|
<p>Unfortunately, embedded processors frequently do not have hardware floating
|
|
point support, so all operations on "float" and "double" are performed in
|
|
software. Some basic floating point operations can take on the order of a
|
|
millisecond to complete.</p>
|
|
|
|
<p>Also, even for integers, some chips have hardware multiply but lack
|
|
hardware divide. In such cases, integer division and modulus operations are
|
|
performed in software — something to think about if you're designing a
|
|
hash table or doing lots of math.</p>
|
|
|
|
<a name="samples" id="samples"></a>
|
|
<h2>Some Sample Performance Numbers</h2>
|
|
|
|
<p>To illustrate some of our ideas, here is a table listing the approximate
|
|
run times for a few basic actions. Note that these values should NOT be taken
|
|
as absolute numbers: they are a combination of CPU and wall clock time, and
|
|
will change as improvements are made to the system. However, it is worth
|
|
noting how these values apply relative to each other — for example,
|
|
adding a member variable currently takes about four times as long as adding a
|
|
local variable.</p>
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Action</th>
|
|
<th>Time</th>
|
|
</tr>
|
|
<tr>
|
|
<td>Add a local variable </td>
|
|
<td>1</td>
|
|
</tr>
|
|
<tr class="alt">
|
|
<td>Add a member variable </td>
|
|
<td>4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Call String.length()</td>
|
|
<td>5</td>
|
|
</tr>
|
|
<tr class="alt">
|
|
<td>Call empty static native method</td>
|
|
<td>5</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Call empty static method </td>
|
|
<td>12</td>
|
|
</tr>
|
|
<tr class="alt">
|
|
<td>Call empty virtual method </td>
|
|
<td>12.5</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Call empty interface method </td>
|
|
<td>15</td>
|
|
</tr>
|
|
<tr class="alt">
|
|
<td>Call Iterator:next() on a HashMap </td>
|
|
<td>165</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Call put() on a HashMap</td>
|
|
<td>600</td>
|
|
</tr>
|
|
<tr class="alt">
|
|
<td>Inflate 1 View from XML </td>
|
|
<td>22,000</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Inflate 1 LinearLayout containing 1 TextView </td>
|
|
<td>25,000</td>
|
|
</tr>
|
|
<tr class="alt">
|
|
<td>Inflate 1 LinearLayout containing 6 View objects </td>
|
|
<td>100,000</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Inflate 1 LinearLayout containing 6 TextView objects </td>
|
|
<td>135,000</td>
|
|
</tr>
|
|
<tr class="alt">
|
|
<td>Launch an empty activity </td>
|
|
<td>3,000,000</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<a name="closing_notes" id="closing_notes"></a>
|
|
<h2>Closing Notes</h2>
|
|
|
|
<p>The best way to write good, efficient code for embedded systems is to
|
|
understand what the code you write really does. If you really want to allocate
|
|
an iterator, by all means use enhanced for loop syntax on a List; just make it a
|
|
deliberate choice, not an inadvertent side effect.</p>
|
|
|
|
<p>Forewarned is forearmed! Know what you're getting into! Insert your
|
|
favorite maxim here, but always think carefully about what your code is doing,
|
|
and be on the lookout for ways to speed it up.</p>
|