As the UI shader is generally standardized, the most common problem is simply excessive fill-rate usage. This is most commonly due to a large number of overlapping UI elements and/or having multiple UI elements that occupy significant portions of the screen. Both of these problems can lead to extremely high levels of overdraw.
In order to alleviate fill-rate overutilization and reduce overdraw, consider the following possible remediations.
Eliminating invisible UI
The method that requires the least redesigning of existing UI elements is to simply disable elements that are not visible to the player. The most common case where this is applicable is opening full-screen UIs with opaque backgrounds. In this case, any UI elements placed beneath the full-screen UI can be disabled.
The simplest way to do this is to disable the root GameObject or GameObjects containing the invisible UI elements. For an alternate solution, see the Disabling Canvases section. Finally, make sure that no UI elements are hidden by setting their alpha to 0: such elements are still sent to the GPU and may take precious rendering time. If a UI element doesn’t need to display anything, remove its Graphic component where possible; if it must still receive raycasts, replace the Graphic with one that generates no geometry, as in the sketch below, and raycasting will still work.
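A minimal sketch of such a non-drawing Graphic, assuming Unity 5.2 or later (the class name is illustrative, not part of the Unity API):

```csharp
using UnityEngine;
using UnityEngine.UI;

// Illustrative helper: a Graphic that registers as a raycast target but
// emits no geometry, so it costs no fill rate.
public class RaycastTargetGraphic : Graphic
{
    // Produce an empty mesh: nothing is drawn, but the RectTransform's
    // area still receives pointer events via the Graphic Raycaster.
    protected override void OnPopulateMesh(VertexHelper vh)
    {
        vh.Clear();
    }
}
```

Attach this in place of a fully transparent Image: the Graphic Raycaster still hits the element’s RectTransform, but nothing is submitted to the GPU.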
Simplify UI structure
To reduce the time required to rebuild and render the UI, keep the number of UI objects as low as possible, and bake as much as you can into as few elements as possible. For example, don’t layer a blended GameObject over an element just to change its hue; apply the tint via the element’s color or material properties instead. Also, don’t create GameObjects that act like folders, with no purpose other than organizing your Scenes.
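For instance, rather than layering a semi-transparent tinted GameObject over an element, the tint can be applied through the element’s own vertex color, which adds no extra draw call or overdraw (an illustrative sketch; the component and field names are placeholders):

```csharp
using UnityEngine;
using UnityEngine.UI;

// Illustrative sketch: tint an Image through its vertex color instead of
// layering a second, blended GameObject over it. Setting Graphic.color
// only updates vertex data; the element still draws in a single pass.
public class HueTint : MonoBehaviour
{
    [SerializeField] Color tint = Color.red;  // placeholder tint value

    void Start()
    {
        GetComponent<Image>().color = tint;
    }
}
```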
Disabling invisible camera output
If a full-screen UI with an opaque background is opened, the world-space camera will still render the standard 3D scene behind the UI. The renderer is not aware that the full-screen Unity UI will obscure the entire 3D scene.
Therefore, if a completely full-screen UI is opened, disabling any and all of the obscured world-space cameras will help reduce GPU stress by simply eliminating the useless work of rendering the 3D world.
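A minimal sketch of this, assuming a single world-space camera and a fully opaque full-screen UI (the field names are placeholders):

```csharp
using UnityEngine;

// Illustrative controller: stop rendering the 3D world while an opaque
// full-screen UI completely obscures it.
public class FullScreenUIController : MonoBehaviour
{
    [SerializeField] Camera worldCamera;      // placeholder reference
    [SerializeField] GameObject fullScreenUI; // placeholder reference

    public void OpenFullScreenUI()
    {
        fullScreenUI.SetActive(true);
        worldCamera.enabled = false;   // eliminate the useless world rendering
    }

    public void CloseFullScreenUI()
    {
        worldCamera.enabled = true;
        fullScreenUI.SetActive(false);
    }
}
```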
If the UI doesn’t cover the whole 3D scene, you may want to render the scene to a texture once and display that texture instead of continuously rendering the scene. You will lose any animated content in the 3D scene, but that should be acceptable most of the time.
Note: If a Canvas is set as “Screen Space – Overlay”, then it will be drawn irrespective of the number of cameras active in the scene.
Majority-obscured cameras
Many “full-screen” UIs do not actually obscure the entire 3D world, but leave a small portion of the world visible. In these cases, it may be more optimal to capture just the portions of the world that are visible into a render texture. If the visible portion of the world is “cached” in a render texture, then the actual world-space camera can be disabled, and the cached render texture can be drawn behind the UI screen to provide an impostor version of the 3D world.
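A hedged sketch of this impostor technique (the field names are placeholders; a RawImage displays the cached texture behind the UI):

```csharp
using UnityEngine;
using UnityEngine.UI;

// Illustrative sketch: render the world once into a RenderTexture, show it
// through a RawImage behind the UI, then disable the world camera.
public class WorldImpostor : MonoBehaviour
{
    [SerializeField] Camera worldCamera;     // placeholder reference
    [SerializeField] RawImage impostorImage; // drawn behind the UI screen
    RenderTexture cache;

    public void CaptureAndDisable()
    {
        cache = new RenderTexture(Screen.width, Screen.height, 24);
        worldCamera.targetTexture = cache;
        worldCamera.Render();                // render one frame into the cache
        worldCamera.targetTexture = null;
        worldCamera.enabled = false;         // stop per-frame world rendering

        impostorImage.texture = cache;
        impostorImage.gameObject.SetActive(true);
    }

    public void Restore()
    {
        impostorImage.gameObject.SetActive(false);
        worldCamera.enabled = true;
        cache.Release();
    }
}
```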
Composition-based UIs
It is very common for designers to create UIs via composition – combining and layering standard backgrounds and elements to create the final UI. While this is relatively simple to do, and very friendly to iteration, it is non-performant due to Unity UI’s use of the transparent rendering queue.
Consider a simple UI with a background, a button and some text on the button. Because objects in the transparent queue are sorted from back to front, in the case that a pixel falls within a text glyph, the GPU must sample the background’s texture, then the button’s texture, and finally the text atlas’ texture, for a total of three samples. As the complexity of the UI grows, and more decorative elements are layered onto the background, the number of samples can rise rapidly.
If a large UI is discovered to be fill-rate bound, the best recourse is to create specialized UI sprites that merge as many of the decorative/invariant elements of the UI into its background texture. This reduces the number of elements that must be layered atop one another to achieve the desired design, but is labor-intensive and increases the size of the project’s texture atlases.
This principle of condensing the number of layered elements necessary to create a given UI onto specialized UI sprites can also be used for sub-elements. Consider a store UI with a scrolling pane of products. Each product UI element has a border, a background, and some icons to denote price, name and other information.
The store UI will need a background, but because its products scroll across the background, the product elements cannot be merged onto the store UI’s background texture. However, the border, price, name and other elements of the product’s UI element could be merged onto the product’s background. Depending on the size and number of icons, the fill-rate savings can be considerable.
There are several drawbacks to combining layered elements. Specialized elements can no longer be reused, and require additional artist resources to create. The addition of large new textures may significantly increase the amount of memory needed to hold the UI textures, particularly if the UI textures are not loaded and unloaded on demand.
UI shaders and low-spec devices
The built-in shader used by Unity UI incorporates support for masking, clipping and numerous other complex operations. Because of this added complexity, the UI shader performs poorly compared to the simpler Unity 2D shader on low-end devices such as the iPhone 4.
If masking, clipping and other “fancy” features are unneeded for an application targeted at low-end devices, it is possible to create a custom shader that omits the unused operations, such as this minimal UI shader:
<pre> Shader "UI/Fast-Default" { Properties { [PerRendererData] _MainTex ("Sprite Texture", 2D) = "white" {} _Color ("Tint", Color) = (1,1,1,1) }
Both of these problems, the cost of rebuilding a Canvas’s batches and the cost of the resulting draw calls, tend to become acute as the number of elements on a Canvas increases.
Important reminder: Whenever any drawable UI element on a given Canvas changes, the Canvas must re-run the batch building process. This process re-analyzes every drawable UI element on the Canvas, regardless of whether it has changed or not. Note that a “change” is any change which affects a UI object’s appearance, including the sprite assigned to a sprite renderer, transform position & scale, the text contained in a text mesh, etc.
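For example, a countdown readout can be written so that it only touches its Text component when the displayed value actually changes, keeping the Canvas clean on all other frames (an illustrative sketch; names are placeholders):

```csharp
using UnityEngine;
using UnityEngine.UI;

// Illustrative sketch: update the label once per second, not every frame.
// Each real change dirties the whole Canvas and forces a rebatch, so
// minimizing the frequency of changes minimizes rebuild work.
public class CountdownLabel : MonoBehaviour
{
    [SerializeField] Text label;          // placeholder reference
    [SerializeField] float endTime = 60f; // placeholder end time
    int lastShownSeconds = -1;

    void Update()
    {
        int seconds = Mathf.Max(0, Mathf.CeilToInt(endTime - Time.time));
        if (seconds == lastShownSeconds)
            return;                       // nothing visible changed this frame

        lastShownSeconds = seconds;
        label.text = seconds.ToString(); // dirties this Canvas once per second
    }
}
```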
Child order
Unity UIs are constructed back-to-front, with objects’ order in the hierarchy determining their sort order. Objects earlier in the hierarchy are considered behind objects later in the hierarchy. Batches are built by walking the hierarchy top-to-bottom and collecting all objects which use the same material, the same texture and do not have intermediate layers. An “intermediate layer” is a graphical object with a different material, whose bounding box overlaps two otherwise-batchable objects and is placed in the hierarchy between the two batchable objects. Intermediate layers force batches to be broken.
As discussed in the Unity UI Profiling Tools step, the UI profiler and frame debugger can be used to inspect a UI for intermediate layers. This is the situation where one drawable object interposes itself between two other drawable objects that are otherwise batchable.
This problem most commonly occurs when text and sprites are located near one another: the text’s bounding box can invisibly overlap nearby sprites, because much of a text glyph’s polygon is transparent. This can be solved in two ways:
- Reorder the elements in the hierarchy so that the batchable objects are adjacent to one another and the unbatchable object no longer sits between them.
- Adjust the positions of the elements to eliminate the invisible overlap.
Both of these operations can be carried out in the Unity Editor with the Unity Frame Debugger open and enabled. By simply observing the number of draw calls visible in the Unity Frame Debugger, it is possible to find an order and position that minimizes the number of draw calls wasted due to overlapping UI elements.
Splitting Canvases
In all but the most trivial cases, it is generally a good idea to split up a Canvas, either by moving elements to a Sub-canvas or to a sibling Canvas.
Sibling Canvases are best used in cases where certain portions of a UI must have their draw depth controlled separately from the rest of the UI, to be always above or below other layers (e.g. tutorial arrows).
In most other cases, Sub-canvases are more convenient as they inherit their display settings from their parent Canvas.
While it may seem at first glance that it is a best practice to subdivide a UI into many Sub-canvases, remember that the Canvas system also does not combine batches across separate Canvases. Performant UI design requires a balance between minimizing the cost of rebuilds and minimizing wasted draw calls.
General guidelines
Because a Canvas rebatches any time any of its constituent drawable components changes, it is generally best to split any non-trivial Canvas into at least two parts. Further, it is best to try to co-locate elements on the same Canvas if the elements are expected to change simultaneously. An example might be a progress bar and a countdown timer. These both rely on the same underlying data and therefore will require updates at the same time, and so they should be placed on the same Canvas.
On one Canvas, place all elements that are static and unchanging, such as backgrounds and labels. These will batch once, when the Canvas is first displayed, and then will no longer need to rebatch afterwards.
On the second Canvas, place all of the “dynamic” elements – the ones that change frequently. This will ensure that this Canvas is rebatching primarily dirty elements. If the number of dynamic elements grows very large, it may be necessary to further subdivide the dynamic elements into a set of elements that are constantly changing (e.g. progress bars, timer readouts, anything animated) and a set of elements that change only occasionally.
This is actually rather difficult in practice, especially when encapsulating UI controls into prefabs. Many UIs instead elect to subdivide a Canvas by splitting out the costlier controls onto a Sub-canvas.
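As a sketch, a costly control can be isolated by giving it its own Canvas component, since a Canvas placed on a child of another Canvas acts as a Sub-canvas (the component name is illustrative):

```csharp
using UnityEngine;

// Illustrative sketch: turn a frequently-changing panel into a Sub-canvas
// at runtime so its rebuilds stay local instead of dirtying the parent.
public class IsolateOnSubCanvas : MonoBehaviour
{
    void Awake()
    {
        // A Canvas component on a child of another Canvas acts as a
        // Sub-canvas: it inherits display settings from its parent but
        // batches (and rebatches) its contents separately.
        if (GetComponent<Canvas>() == null)
            gameObject.AddComponent<Canvas>();
    }
}
```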
Unity 5.2 and Optimized Batching
In Unity 5.2, the batching code was substantially rewritten, and is considerably more performant than in Unity 4.6, 5.0 and 5.1. Further, on devices with more than one core, the Unity UI system will move most of the processing to worker threads. In general, Unity 5.2 reduces the need for aggressively splitting a UI into dozens of Sub-canvases. Many UIs on mobile devices can now be made performant with as few as two or three Canvases.
More information on the optimizations in Unity 5.2 can be found in this blog post.
Input and raycasting in Unity UI
By default, Unity UI uses the Graphic Raycaster component to handle input events, such as touch events and pointer-hover events. Input itself is generally handled by the Standalone Input Module component. Despite the name, the Standalone Input Module is meant to be a “universal” input module, and handles both pointer and touch input.
Erroneous mouse input detection on mobile (5.3)
Prior to Unity 5.4, each active Canvas with a Graphic Raycaster attached runs a raycast once per frame to check the position of the pointer, so long as there is no touch input available. This occurs regardless of platform: iOS and Android devices without mice still query the mouse’s position and attempt to discover which UI elements are beneath that position, in order to determine whether any hover events need to be sent.
This is a waste of CPU time, and has been witnessed consuming 5% or more of a Unity application’s CPU frame time.
This issue is resolved in Unity 5.4. From 5.4 onward, devices without mice will not query for the mouse position and will not perform unnecessary raycasts.
If using a version of Unity older than 5.4, it is strongly recommended that mobile developers create their own Input Module class. This can be as simple as copying Unity’s Standalone Input Module from the Unity UI source and commenting out the ProcessMouseEvent method as well as all calls to that method.
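As a rough sketch, the patched Process method in the copied module might look like the following; the method names match the 5.3-era StandaloneInputModule source, but verify them against the exact source version being copied:

```csharp
using UnityEngine.EventSystems;

// Hypothetical copy of the 5.3-era StandaloneInputModule with mouse
// processing removed; all other copied methods are omitted here.
public class TouchOnlyInputModule : StandaloneInputModule
{
    public override void Process()
    {
        bool usedEvent = SendUpdateEventToSelectedObject();

        if (eventSystem.sendNavigationEvents)
        {
            if (!usedEvent)
                usedEvent |= SendMoveEventToSelectedObject();

            if (!usedEvent)
                SendSubmitEventToSelectedObject();
        }

        // ProcessMouseEvent();  // removed: no mouse exists on this device
    }
}
```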
Raycast optimization
The Graphic Raycaster is a relatively straightforward implementation that iterates over all Graphic components that have the ‘Raycast Target’ setting set to true. For each Raycast Target, the Raycaster performs a set of tests. If a Raycast Target passes all of its tests, then it is added to the list of hits.
Raycast implementation details
The list of hit Raycast Targets is then sorted by depth, filtered to remove reversed targets, and filtered to ensure that elements rendered behind the camera (i.e. not visible on screen) are removed.
The Graphic Raycaster may also cast a ray into the 3D or 2D physics system if the respective flag is set on the Graphic Raycaster’s “Blocking Objects” property. (From script, the property is named blockingObjects.) If 2D or 3D blocking objects are enabled, then any Raycast Targets that draw beneath a 2D or 3D object on a raycast-blocking physics layer are also eliminated from the list of hits.
The final list of hits is then returned.
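When blocking is wanted, it can be configured from script as well as in the Inspector (a hedged sketch; the layer name is a placeholder):

```csharp
using UnityEngine;
using UnityEngine.UI;

// Illustrative configuration: let 3D scenery block UI raycasts.
public class RaycastBlockingSetup : MonoBehaviour
{
    void Awake()
    {
        var raycaster = GetComponent<GraphicRaycaster>();
        raycaster.blockingObjects = GraphicRaycaster.BlockingObjects.ThreeD;
        // Only objects on this (placeholder) layer will block UI raycasts.
        raycaster.blockingMask = LayerMask.GetMask("WorldBlockers");
    }
}
```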
Raycasting optimization tips
Given that all Raycast Targets must be tested by the Graphic Raycaster, it is a best practice to only enable the ‘Raycast Target’ setting on UI components that must receive pointer events. The smaller the list of Raycast Targets, and the shallower the hierarchy that must be traversed, the faster each Raycast test will be.
For composite UI controls that have multiple drawable UI objects that must respond to pointer events, such as a button that wishes to have its background and text both change colors, it is generally better to place a single Raycast Target at the root of the composite UI control. When that single Raycast Target receives a pointer event, it can then forward the event to each interested component within the composite control.
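A hedged sketch of such a composite control: the root object keeps the only enabled ‘Raycast Target’ (on its background Graphic) and forwards the pointer events’ effects to its children, whose own raycast targets are disabled so the Graphic Raycaster never tests them:

```csharp
using UnityEngine;
using UnityEngine.EventSystems;
using UnityEngine.UI;

// Illustrative composite button: one raycast target at the root, children
// tinted in response to forwarded pointer events.
public class CompositeButton : MonoBehaviour, IPointerEnterHandler, IPointerExitHandler
{
    [SerializeField] Graphic[] tintTargets;       // background, label, icon, ...
    [SerializeField] Color hoverColor = Color.yellow;

    void Awake()
    {
        // Children should not be tested individually by the raycaster.
        foreach (var g in tintTargets)
            g.raycastTarget = false;
    }

    public void OnPointerEnter(PointerEventData eventData)
    {
        foreach (var g in tintTargets)
            g.color = hoverColor;                 // forward the event's effect
    }

    public void OnPointerExit(PointerEventData eventData)
    {
        foreach (var g in tintTargets)
            g.color = Color.white;
    }
}
```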
Hierarchy depth and raycast filters
Each Graphic Raycast traverses the Transform hierarchy all the way to the root when searching for raycast filters. The cost of this operation grows linearly in proportion to the depth of the hierarchy. All components found attached to each Transform in the hierarchy must be tested to see if they implement ICanvasRaycastFilter, so this is not a cheap operation.
Sub-canvases and the OverrideSorting property
The overrideSorting property on a Sub-canvas will cause a Graphic Raycast test to stop climbing the transform hierarchy. If it can be enabled without causing sorting or raycast detection issues, then it should be used to decrease the cost of raycast hierarchy traversals.
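A minimal sketch, assuming the Sub-canvas’s sorting can safely match its parent:

```csharp
using UnityEngine;

// Illustrative sketch: enable overrideSorting on a Sub-canvas so Graphic
// Raycasts over its contents stop climbing the Transform hierarchy here.
public class StopRaycastClimb : MonoBehaviour
{
    void Awake()
    {
        var subCanvas = GetComponent<Canvas>();
        subCanvas.overrideSorting = true;
        // overrideSorting also detaches this Canvas's draw order from its
        // parent; 0 is a placeholder, so match the parent's order if needed.
        subCanvas.sortingOrder = 0;
    }
}
```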