Android Profiling Tools
Checked with version: 2018.1
Android supports a vast variety of devices, which comes with some constraints, such as specific tools for specific chipsets. Many platform-specific tools yield useful information about both the target device and app performance.
Some profiling tools, such as the Memory Monitor in Android Studio, profile the Java managed environment. The Memory Monitor shows how an app allocates managed memory over the course of a single session. However, this tool only provides data for the Java environment and does not offer sufficient data on the native systems, which are essential for most applications made with Unity. Only a small portion of Unity’s code runs in a Java managed environment; the majority is driven by systems in the native environment.
Android native profiling tools depend on the target chipset, and each chipset manufacturer has its own suite of tools for gathering CPU, GPU, and memory data. There are also tools that provide chipset-spanning performance analysis, such as Android Studio.
The following table includes a collection of tools available to profile Android devices.
|Systrace||Any chipset||Yes||Yes||Systrace is not a sampling profiler. The tracing tool captures system-wide traces, and Android Studio embeds it in 3.0.||Android|
|Adreno Profiler||Adreno GPU||Yes||Replaced by Snapdragon Profiler. Requires old Mono framework 2.10.5 and XQuartz 2.7.11.||Android||DirectX OpenGL ES OpenCL|
|Snapdragon Profiler||Snapdragon Chipset||Yes||Yes||Yes||Requires latest mono framework and Android 6.0+ (frame capture)||Android||OpenGL ES OpenCL Vulkan|
|Mali Graphics Debugger||Mali GPU||Yes||Additional .so file added (via Plugin in Unity) otherwise requires source changes or a rooted device.||Android Linux||OpenGL ES OpenCL|
|Simpleperf||Any chipset||Command Line only, but Android Studio integrates it in 3.0.||Android|
|Android Studio GPU Debugger||Any chipset||Yes||Create Android Studio Project and build/run.||Android||OpenGL ES|
|DS-5 Streamline||ARM Chipset (Cortex-M, Cortex-A)||Yes||Yes||Streamline requires a custom kernel and a rooted device.||Linux, Android|
|Tegra System Profiler||Tegra K1, X1||Yes||Yes||A system trace and multi-core CPU call stack sampling profiler.||Android Linux|
|Tegra Graphics Debugger||Tegra K1, X1||Yes||A tool for debugging and profiling applications.||Android Linux||OpenGL ES OpenGL|
|Intel VTune Amplifier Performance Analyzer||Intel x86 Desktop||Yes||Yes||Yes||A tool for analysing and profiling processor performance.||Windows Android Linux||DirectX OpenGL ES OpenGL|
|Intel System Analyzer||Intel x86, ARM||Yes||Yes||A tool for analysing CPU live traces.||Windows Android Linux||DirectX OpenGL ES OpenGL|
|Intel Graphics Frame Analyzer||Intel x86||Yes||A tool for single-frame analysis and optimization.||Windows Android Linux||DirectX OpenGL ES OpenGL|
|Intel Graphics Trace Analyzer||Intel x86||Yes||Yes||A trace tool for analyzing workload performance across the CPU and GPU.||Windows Android Linux||DirectX OpenGL ES OpenGL|
|PVRTrace||Any chipset||Yes||Used for older GPUs and imgtec is not producing new GPUs right now.||Android Linux||OpenGL ES OpenGL|
|gapid||Any chipset||Yes||You need to build it manually, but Google also offers pre-built executables.||Android Windows Linux||OpenGL ES Vulkan|
|vkTrace||Any chipset||Yes||Command Line only.||Linux Windows||Vulkan|
For other platforms, similar tools are available. It’s good practice to verify and compare results on different platforms, especially when they share similar hardware or the same graphics APIs. Systems often act similarly, even across different platforms. Profiling on non-target platforms is not ideal, but it is better than not profiling at all.
Each tool helps you identify bottlenecks and take appropriate action to resolve their causes. The following sections share tips on profiling Unity applications using specific tools.
The Snapdragon Profiler is for devices using Snapdragon chipsets, and it can show a significant amount of performance data. The volume of data can be overwhelming at first, but you can filter it by bundle ID.
There is an excellent talk from Unite 2015 called Uncover Your Game's Power and Performance Profile that covers the basics of Snapdragon Profiling and which filters are a great choice for profiling.
If the Snapdragon Profiler cannot find a device, go to the Snapdragon Settings and check whether the auto locator has set the adb path to the correct adb location. When a device is visible via the adb devices command, it should be visible to the Snapdragon Profiler as well.
Between profiling sessions, it is useful to shut down and restart the Snapdragon Profiler. Before you unplug a device, quit the Snapdragon Profiler so that it can clean up memory and network connections and reconnect to the device the next time without issues.
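The visibility check can also be scripted. The following is a minimal sketch, assuming adb is on the PATH; the function name list_ready_devices is made up for this example. It prints only the serial numbers that adb reports in the "device" state, so entries such as "unauthorized" or "offline" are filtered out.

```shell
# List the serial numbers of devices that adb reports as ready.
# `adb devices` prints a header line, then "<serial><TAB><state>" per device.
# list_ready_devices is a hypothetical helper name, not part of adb.
list_ready_devices() {
    adb devices | tr -d '\r' | awk 'NR > 1 && $2 == "device" { print $1 }'
}
```

If list_ready_devices prints nothing, the Snapdragon Profiler is unlikely to find a device either, so check the cable and the USB debugging authorization first.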
Simpleperf is a sampling profiler that provides access to multiple CPU counters. On its own, it only exists as a command-line tool.
Modern CPUs have a hardware component called the performance monitoring unit (PMU). The PMU has several hardware counters that count events like CPU cycles, executed instructions, and the number of cache misses.
On Android, Simpleperf uses the perf_event_open system call to read hardware perf events; the Linux kernel wraps the hardware counters into these events. The Linux kernel also provides hardware-independent software events and tracepoint events, and exposes all these events to user space via the perf_event_open system call.
The best way to get the sampling profiler working with call graphs is to use DWARF debug info instead of frame pointers. For more information, see the Simpleperf readme file.
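Which of these events a particular device exposes varies, so it can help to ask Simpleperf directly. The sketch below assumes Simpleperf has already been copied into the data directory of a debuggable app, as described in the installation steps in this section. The function name list_perf_events is made up for this example; it relies on the list command of Simpleperf, which accepts an optional event class.

```shell
# Print the events Simpleperf can record on the connected device.
# $1: package name of a debuggable app that already has ./simpleperf
#     in its data directory.
# $2: optional event class such as hw, sw, cache or tracepoint.
#     With no class, every available event is listed.
# list_perf_events is a hypothetical helper name.
list_perf_events() {
    adb shell run-as "$1" ./simpleperf list ${2:+"$2"}
}
```

For example, `list_perf_events com.unity.androidtest hw` should show the hardware events, such as cpu-cycles, that the PMU of the device supports.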
You can run Simpleperf from any host development platform that the NDK supports. You can get the Simpleperf tool from the Android NDK r13b and higher (under the ndk-location/simpleperf/ directory).
Note: You cannot use the run-as command on some Samsung devices. If you do, you will receive a Could not set capabilities: Operation not permitted error.
Install Simpleperf from the NDK location onto the device into a temp data folder:
~$ adb push ndk-location/.../simpleperf /data/local/tmp
The run-as command gives access to the internal storage of debuggable packages. Use it to copy Simpleperf into the app’s data directory:
~$ adb shell run-as com.unity.androidtest cp /data/local/tmp/simpleperf .
Make Simpleperf executable for all users:
~$ adb shell run-as com.unity.androidtest chmod a+x simpleperf
Since Android Nougat, access to perf events is blocked by default, so you need to clear the security.perf_harden flag before profiling:
~$ adb shell setprop security.perf_harden 0
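To confirm that the flag took effect, the property can be read back with getprop. This is a minimal sketch; the function name perf_unlocked is made up for this example, and it treats any value other than 0 (including an empty value) as "not unlocked".

```shell
# Succeed when perf events are unlocked, i.e. security.perf_harden is 0.
# perf_unlocked is a hypothetical helper name.
perf_unlocked() {
    # Strip the carriage return that some adb versions append to output.
    value=$(adb shell getprop security.perf_harden | tr -d '\r')
    [ "$value" = "0" ]
}
```

A possible use is `perf_unlocked || adb shell setprop security.perf_harden 0` at the start of a profiling script.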
Now the device is ready to record data. In the following example, we:
Record on the process (-p) with an event (-e) for 20 seconds (--duration). You don’t need to add cpu-cycles explicitly because it is also the default event.
Record a DWARF-based call graph (-g) and use the --symfs argument to set the directory in which Simpleperf looks for symbol files.
Set the frequency to dump records (-f) with approximately 2000 records every second when the monitored thread runs.
Return the pid of the given bundle-id using `adb shell pidof com.unity.androidtest`.
~$ adb shell run-as com.unity.androidtest ./simpleperf record -p `adb shell pidof com.unity.androidtest` -e cpu-cycles:u -f 2000 -g --symfs . --duration 20
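The nested backticks in the command above are evaluated on the host, and the command fails in a confusing way when the app is not running. The sketch below resolves the pid in a separate step and reports that case explicitly. The function name record_app and its argument order are made up for this example; the Simpleperf arguments are the ones used above.

```shell
# Record CPU cycles for a running app, resolving the pid in a separate step.
# $1: package name, $2: duration in seconds (defaults to 20).
# record_app is a hypothetical helper name.
record_app() {
    pkg=$1
    duration=${2:-20}
    # pidof prints nothing when the app is not running; strip any carriage
    # return that some adb versions append to shell output.
    pid=$(adb shell pidof "$pkg" | tr -d '\r')
    if [ -z "$pid" ]; then
        echo "error: $pkg is not running" >&2
        return 1
    fi
    adb shell run-as "$pkg" ./simpleperf record -p "$pid" \
        -e cpu-cycles:u -f 2000 -g --symfs . --duration "$duration"
}
```

For example, `record_app com.unity.androidtest 20` runs the same recording as the command above.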
Alternatively, you can record stack frame-based call graphs:
~$ adb shell run-as com.unity.androidtest ./simpleperf record -p `adb shell pidof com.unity.androidtest` --call-graph fp --dump-symbols --symfs . --duration 20
You can also select which processes (-p) or threads (-t) to monitor. Monitoring a process is the same as monitoring all threads in the process:
~$ adb shell run-as com.unity.androidtest ./simpleperf record -t 7146,6471,7148,7147 --call-graph fp --symfs . --duration 20
Copy the perf.data file onto the SD card:
~$ adb shell run-as com.unity.androidtest cp perf.data /sdcard/
Fetch the data from the SD card to your local directory:
~$ adb pull /sdcard/perf.data
Write caller information from the perf.data on the device into the local perf.caller.report log file:
~$ adb shell run-as com.unity.androidtest ./simpleperf report -g caller -n --symfs . > perf.caller.report
Write callee information from the perf.data on the device into the local perf.callee.report log file:
~$ adb shell run-as com.unity.androidtest ./simpleperf report -g callee -n --symfs . > perf.callee.report
You can also flatten any of the reports:
~$ adb shell run-as com.unity.androidtest ./simpleperf report -n --symfs . > perf.flat.report
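Because the report is plain text, standard tools can narrow it down to the most expensive entries. The sketch below makes one assumption about the layout: each data row starts with the overhead as a percentage (for example 59.25%), while header lines do not. The function name hot_rows is made up for this example.

```shell
# Print report rows whose leading overhead value is at least a given percentage.
# $1: report file, $2: minimum overhead in percent (defaults to 1).
# hot_rows is a hypothetical helper name; it assumes the first
# whitespace-separated field of each data row looks like "12.34%".
hot_rows() {
    awk -v min="${2:-1}" '$1 ~ /^[0-9.]+%$/ && ($1 + 0) >= (min + 0)' "$1"
}
```

For example, `hot_rows perf.flat.report 5` keeps only the rows that account for at least 5% of the samples.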
A flat report looks similar to the image below:
Android Studio 3.1 integrates Simpleperf into its Performance Profiler and offers a simple UI for capturing data, so you do not have to use the command line. For details, see the Android CPU Profiler documentation.
The Android Studio CPU Profiler works in a similar way to Native Debugging and works out of the box for many phones, such as the Samsung S8. To get native traces, follow these steps:
Export your Unity Project as Gradle project and open it in Android Studio. For more information, see Gradle for Android.
Select the Sampled (Native) profile in the CPU Profiler.
Select the thread you want to profile, for instance, UnityMain.
Press the record button and stop recording when done.
When the Profiler stops recording, inspect the sampled data. Note: For very long sessions, you should increase Android Studio’s memory limit.
The Call Chart is similar to the data in the Unity Timeline Profiler.
The Top Down view shows the native stack trace to identify bottlenecks. For more information on interpreting traces, see the dissecting native stack traces section.
The Profiler also offers a memory tab, which provides data on memory consumption. If this is relevant for you, read more about it in our guide on Memory Management in Unity.
Note: It is not possible to symbolicate user code and display function names when compiled with IL2CPP, although you can set the path of the symbols in the Android Studio Project Settings (Menu: Run > Edit Configurations > Debugger).
Dissecting traces is often useful, for example, when you examine a trace of the start-up process to evaluate startup time. Read more about dissecting stack traces in the Profiling section of the Understanding optimization in Unity Best Practice Guide.
You can use the NDK utility ndk-stack to symbolicate the call stack of an adb logcat output. You can find the tool in the root directory of your NDK installation. You need to ensure that your symbol files are called libunity.so instead of libunity.dbg.so, otherwise the ndk-stack tool will fail to locate them; simply rename the files if needed. You can call the tool using the following command in the terminal:
~$ ndk-stack -sym unity_path/build/AndroidPlayer/Variations/mono/Development/Symbols/armeabi-v7a -dump path_to_your_logcat
For more details and instructions, please visit the official ndk-stack documentation.
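The renaming step can be scripted so that the original symbol files stay untouched. This is a minimal sketch; the function name prepare_symbols is made up for this example. It copies every file named "<name>.dbg.so" in the given directory to "<name>.so" and skips files that already exist.

```shell
# Copy each "<name>.dbg.so" in a symbols directory to "<name>.so" so that
# ndk-stack can match it by library name. Existing files are left alone.
# $1: directory that holds the symbol files.
# prepare_symbols is a hypothetical helper name.
prepare_symbols() {
    for src in "$1"/*.dbg.so; do
        [ -e "$src" ] || continue      # the glob matched nothing
        dst="${src%.dbg.so}.so"
        [ -e "$dst" ] || cp "$src" "$dst"
    done
}
```

After running prepare_symbols on the symbols directory, ndk-stack can also read a live log from standard input when -dump is omitted, for example `adb logcat | ndk-stack -sym path_to_symbols`, where path_to_symbols is a placeholder for that directory.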
As an alternative to the ndk-stack tool, a tool available on Bitbucket allows you to symbolicate the release crash dumps available on the Developer Console from the Play Store.
It does not provide the full line numbers from the call stack, but it lets you see the method names and gives you a much better idea of what could be going wrong. Full instructions on using the tool are in the tool’s readme file on Bitbucket.