Checked with version: 2018.1
The Camera is a core component, and every Unity application heavily relies upon it. This means that there are many options that, if not managed appropriately, can lead to poor performance, such as clear, culling and skybox options.
On mobile tile-based renderers, the clear command is particularly important. Unity takes care of the details, so you only have to set the clear flags on the Camera and avoid using the "Don’t Clear" flag when targeting mobile devices. The underlying behavior of the clear command depends on the platform and graphics driver. The clear flag you choose can impact performance significantly, because Unity has to either clear the previous content, set flags to ignore the previous content, or read the previous content back from the buffer. However, do not perform unnecessary clears on streaming GPUs, that is, the sort usually found in desktops and consoles.
On mobile, avoid Unity’s default Skybox (appropriately named Default-Skybox), which is computationally expensive and which is enabled by default in all new Scenes. To disable Skybox rendering completely, set the Camera.clearFlags to SolidColor. Then go to the Lighting Settings (menu: Window > Lighting > Settings) window, remove the Skybox Material, and set the Ambient Source to Color.
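The same setup can also be applied from a script. The following is a minimal sketch, assuming a hypothetical component named SolidColorClear attached to the Camera; the two color values are placeholders, not recommendations.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper: attach to a Camera to replace Skybox rendering
// with a solid-color clear.
[RequireComponent(typeof(Camera))]
public class SolidColorClear : MonoBehaviour
{
    // Assumed placeholder values; pick colors that suit your Scene.
    public Color backgroundColor = Color.black;
    public Color ambientColor = Color.gray;

    void Awake()
    {
        Camera cam = GetComponent<Camera>();

        // Clear to a flat color instead of drawing the Skybox.
        cam.clearFlags = CameraClearFlags.SolidColor;
        cam.backgroundColor = backgroundColor;

        // Script equivalent of removing the Skybox Material and
        // setting the Ambient Source to Color in the Lighting window.
        RenderSettings.skybox = null;
        RenderSettings.ambientMode = AmbientMode.Flat;
        RenderSettings.ambientLight = ambientColor;
    }
}
```

Changing the values in the Lighting window is usually preferable for Scenes you author by hand; a script like this is mainly useful when Scenes are generated or configured at runtime.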
When using OpenGLES on Adreno GPUs, Unity only discards the framebuffer to avoid a framebuffer restore. On PVR and Mali GPUs, Unity clears to prevent a framebuffer restore.
Moving things in or out of graphics memory is resource-intensive on mobile devices, because the devices use a shared memory architecture, meaning CPU and GPU share the same physical memory. On tile-based GPUs like Adreno, PowerVR or the Apple A-series, loading or storing data in the logical buffer uses significant system time and battery power. Transferring content from shared memory to the portion of the framebuffer for each tile (or from the framebuffer to shared memory) is the main source of resource-heavy activity.
Tile-based rendering divides the viewport into smaller tiles with a typical size of 32x32px, and keeps these tiles in faster memory closer to the GPU. The copy operations between this smaller memory and the real framebuffer can take some time, because memory operations are a lot slower than arithmetic operations.
These slow memory operations are the main reason to issue a glClear (OpenGLES) call at the start of each new frame on tile-based GPUs, rather than letting the hardware load the previous framebuffer. By issuing a glClear command, you are telling the hardware that you do not need previous buffer content, so it does not need to copy the color buffer, depth buffer, and stencil buffer from the framebuffer to the smaller tile memory.
Note: Viewports smaller than 16x16 pixels can be very slow on certain chipsets due to the way chips fetch information; setting the viewport to 2x2 pixels, for instance, can actually be slower than setting it to 16x16 pixels. This slowdown is device-specific and not something over which Unity has control, so it’s vital that you profile it.
The graphics driver executes load and store operations on the framebuffer when you switch rendering targets. For example, if you render to a view’s color buffer and to a Texture in two consecutive frames, the system repeatedly transfers (loads and stores) the Texture’s content between shared memory and the GPU.
The clear command also has an effect on the compression of the frame buffer, including the color, depth, and stencil buffers. Clearing the entire buffer allows it to compress more tightly, reducing the amount of data the driver has to transfer between the GPU and memory, therefore allowing for higher frame rates due to improved throughput. On tile-based architecture, clearing tiles is a small task that involves setting a few bits in each tile. When complete, this makes the tile very cheap to fetch from memory. Note: These optimizations apply to tile-based deferred rendering GPUs and streaming GPUs.
Culling happens per-camera and can have a severe impact on performance, especially when multiple cameras are enabled concurrently. The two types of culling are frustum and occlusion culling:
Frustum Culling is performed automatically on every Unity Camera
Occlusion culling is controlled by the developer
Frustum Culling makes sure that GameObjects outside of the Camera frustum are not rendered to save rendering performance.
An example of Frustum Culling.
Note: Frustum culling is jobified in 2017.1 and later, and Unity now also culls by layer first. Culling by layer means that Unity only culls the GameObjects on layers the Camera uses, and ignores GameObjects on other layers. Afterwards, Unity uses jobs on threads to cull GameObjects based on the camera frustum.
When you enable Occlusion Culling, Unity does not render GameObjects if the Camera cannot see them. For example, rendering another room is unnecessary if a door is closed and the Camera cannot see the room.
An example of Occlusion Culling.
Enabling Occlusion Culling can significantly increase performance, but it occupies more disk space and RAM, because the Unity Umbra integration bakes the occlusion data during the build and Unity needs to load it from disk to RAM while loading a Scene.
When you use many active Cameras in your Scene, there is a significant fixed culling and render overhead per Camera. Unity 2017.1 reduced the culling overhead by introducing layer culling, but this has no effect unless your Cameras use different layers to separate the content they render.
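To benefit from layer culling, each Camera needs a culling mask that excludes the layers it does not render. The sketch below assumes a hypothetical two-Camera setup (one world Camera, one UI Camera) and uses the built-in "UI" layer; the component and field names are illustrative.

```csharp
using UnityEngine;

// Hypothetical setup for a Scene with one world Camera and one UI Camera.
public class CameraLayerSetup : MonoBehaviour
{
    public Camera worldCamera;
    public Camera uiCamera;

    void Awake()
    {
        // "UI" is one of Unity's built-in layers.
        int uiMask = LayerMask.GetMask("UI");

        // The UI Camera only considers GameObjects on the UI layer.
        uiCamera.cullingMask = uiMask;

        // The world Camera considers every layer except UI.
        worldCamera.cullingMask = ~uiMask;
    }
}
```

With masks like these, each Camera skips the GameObjects on layers it ignores before frustum culling runs.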
The Unity CPU Profiler shows the main thread in the timeline view. It indicates that there are multiple Cameras, and you can see that Unity performs culling for each Camera.
You can set per-layer culling distances manually on the Camera via Scripts. Setting the cull distance is useful for culling small GameObjects that do not contribute to the Scene when the Camera views them from a given distance.
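A minimal sketch of per-layer cull distances follows. It assumes a hypothetical layer named "SmallProps" that you have created in the Tags and Layers settings, and a placeholder distance of 30 units.

```csharp
using UnityEngine;

[RequireComponent(typeof(Camera))]
public class SmallObjectCulling : MonoBehaviour
{
    // Hypothetical layer name; create it under Tags and Layers first.
    public string smallObjectLayer = "SmallProps";

    // Assumed placeholder distance; it should not exceed the far clip plane.
    public float smallObjectDistance = 30f;

    void Start()
    {
        Camera cam = GetComponent<Camera>();

        int layer = LayerMask.NameToLayer(smallObjectLayer);
        if (layer < 0)
        {
            Debug.LogWarning("Layer '" + smallObjectLayer + "' does not exist.");
            return;
        }

        // One entry per layer; 0 means "use the Camera's far clip plane".
        float[] distances = new float[32];
        distances[layer] = smallObjectDistance;
        cam.layerCullDistances = distances;

        // Optional: cull by spherical distance so that objects do not
        // appear or disappear when the Camera only rotates.
        cam.layerCullSpherical = true;
    }
}
```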
You can enable Skinned Motion Vectors on the Camera. If you do not activate them, Unity does not use Motion Vectors, and there is no performance implication. See the scripting docs for Camera.depthTextureMode for more information. Post-processing effects such as Temporal Anti-aliasing (TAA) require and activate Motion Vectors.
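As a rough sketch, motion vector generation can be requested or released through Camera.depthTextureMode. The component name and the useMotionVectors field are hypothetical, and a post-processing effect such as TAA may enable the flag again on its own.

```csharp
using UnityEngine;

[RequireComponent(typeof(Camera))]
public class MotionVectorToggle : MonoBehaviour
{
    // Hypothetical switch; leave it off unless an effect needs motion vectors.
    public bool useMotionVectors = false;

    void OnEnable()
    {
        Camera cam = GetComponent<Camera>();

        if (useMotionVectors)
        {
            // Ask the Camera to generate the motion vector texture.
            cam.depthTextureMode |= DepthTextureMode.MotionVectors;
        }
        else
        {
            // Clear the flag so this script does not request motion vectors.
            cam.depthTextureMode &= ~DepthTextureMode.MotionVectors;
        }
    }
}
```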
Decreased pixel fillrate is a result of overdraw and fragment shader complexity. Unity often implements shaders as multiple passes (draw diffuse, draw specular, and so forth). Using multiple passes leads to overdraw, where the different Shaders touch (read/write) the same pixels multiple times. For further information, read the Fill Rate section in the Optimizing graphics rendering in Unity games tutorial.
Unity's Frame Debugger is very useful for getting a sense of how Unity draws your Scene. Watch out for situations where you cover large sections of the screen with GameObjects, as Unity continues to draw everything behind the GameObject even though it is hidden. A common example of such a scenario is opening a menu screen (such as settings or the player’s inventory) over an active 3D Scene, but leaving the 3D Scene rendering behind it. You should also beware of GameObjects that Unity draws multiple times; this happens, for example, when multiple lights touch a single GameObject, because Unity then draws it for each pass (see documentation on Forward Rendering Path).
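One way to handle the menu scenario is to stop rendering the 3D Scene while a full-screen menu is open. The sketch below is an assumption-laden example: FullScreenMenu, worldCamera and menuRoot are hypothetical names, and it presumes the menu is opaque, covers the whole screen, and is drawn by a Screen Space - Overlay Canvas or by a separate UI Camera.

```csharp
using UnityEngine;

public class FullScreenMenu : MonoBehaviour
{
    public Camera worldCamera;   // the Camera that renders the 3D Scene
    public GameObject menuRoot;  // root of an opaque, full-screen menu

    public void Open()
    {
        menuRoot.SetActive(true);

        // Stop culling and rendering the 3D Scene while it is hidden.
        worldCamera.enabled = false;
    }

    public void Close()
    {
        worldCamera.enabled = true;
        menuRoot.SetActive(false);
    }
}
```

If the menu is partly transparent, the 3D Scene still has to be drawn, so this approach does not apply.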
As mentioned above, UI is often the cause of overdraw and fillrate issues. Avoid these by following the tips in the Optimizing Unity UI guide.
The Overdraw view allows you to see the objects that Unity draws on top of one another. You can view overdraw in the Scene View by using the Scene View Control Bar.
Overdraw in the Scene View Control Bar.
A Scene in standard Shaded view.
The same Scene in Overdraw view.
The overdraw view works best when you adjust the Scene View to your target resolution. Unity renders objects as transparent silhouettes. As the transparencies accumulate, it becomes easier to spot places where GameObjects draw over one another. White is the least optimal, because a pixel is overdrawn multiple times, while black means no overdraw is occurring.
Transparency also adds to overdraw. In the optimal case, every pixel on the screen is touched only once per frame.
You should avoid overlapping alpha-blended geometry (such as dense particle effects and full-screen post-processing effects) to keep fillrate usage low.
Unity renders objects in the opaque queue in front-to-back order, sorting them by the center coordinates of their axis-aligned bounding boxes (AABB) and using depth testing to minimize overdraw. However, Unity renders objects in the transparent queue in back-to-front order, also sorted by the center position of their bounding boxes. Transparent shaders typically do not write to the depth buffer, so transparent objects cannot reject the pixels of other transparent objects behind them, which makes the transparent queue subject to overdraw.
Z-testing is faster than drawing a pixel. Unity performs culling and opaque sorting via bounding box. Therefore, Unity may draw large background objects first, such as the Skybox or a ground plane, because the bounding box is large and fills a large number of pixels that end up not being visible later after being overdrawn with other objects. If you see this happen, move those objects to the end of the queue manually. See Material.renderQueue in the Scripting API Reference for more information.
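A minimal sketch of moving a large background object to the end of the opaque queue follows. The component name and the offset of 100 are assumptions; the value is clamped to RenderQueue.GeometryLast so that the Material remains in the opaque range.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

[RequireComponent(typeof(Renderer))]
public class DrawLastInOpaqueQueue : MonoBehaviour
{
    // Assumed offset; queue values up to GeometryLast still count as opaque.
    public int offset = 100;

    void Start()
    {
        // sharedMaterial affects every object that uses this Material,
        // which is usually acceptable for a ground plane or backdrop.
        Material mat = GetComponent<Renderer>().sharedMaterial;

        int queue = (int)RenderQueue.Geometry + offset;
        mat.renderQueue = Mathf.Min(queue, (int)RenderQueue.GeometryLast);
    }
}
```

Note that changing a shared Material from a script in the Editor also changes the Material asset, so you may prefer to set the render queue on the Material itself.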
PC hardware can push a lot of draw calls, but the overhead of each call is still high enough to warrant trying to reduce them. On mobile devices, however, draw call optimization is vital, and you can achieve it with draw call batching.
You can maximize batching by following these simple rules:
Use as few Textures in a Scene as possible. Fewer Textures require fewer unique Materials, making them easier to batch. Additionally, use Texture atlases wherever possible.
Mark all Meshes that never move as Static in the Inspector. Unity combines all Meshes marked as Static into one large Mesh at build time. You can also generate static batches yourself at runtime (for example, after generating a procedural level of static parts) using StaticBatchingUtility.
Always bake lightmaps at the largest atlas size possible. Fewer lightmaps require fewer Material state changes. For instance, a Samsung S6/S8 can handle 4096x4096 lightmaps without too much trouble, but keep an eye on the memory footprint. Note: You don’t need to include every last GameObject in the lightmap (which happens when you set GameObjects to lightmap-static). While you should generally mark all non-moving GameObjects as Static, omit small objects (gravel, cups, books), because adding them forces Unity to create another lightmap if there is not enough space. Small objects can look great when you light them with Light Probes.
Be careful not to accidentally instance Materials. Accessing Renderer.material automatically creates an instance and opts that object out of batching. Use Renderer.sharedMaterial instead whenever possible.
Watch out for multi-pass shaders. Add noforwardadd to your shaders whenever you can to prevent Unity from applying more than one directional light, because multiple directional lights break batching (see documentation on Writing Surface Shaders for more details).
Keep an eye on the static and dynamic batch counts versus the total draw call count by using the Profiler, the internal profiler log, or the Stats overlay in the Game view during optimizations.
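The rule about accidentally instancing Materials can be illustrated with a short sketch. ApplySharedMaterial and sharedLook are hypothetical names; the point is only the difference between Renderer.sharedMaterial and Renderer.material.

```csharp
using UnityEngine;

public class ApplySharedMaterial : MonoBehaviour
{
    // One Material asset used by every child Renderer.
    public Material sharedLook;

    void Start()
    {
        foreach (Renderer r in GetComponentsInChildren<Renderer>())
        {
            // Assigning sharedMaterial does not create a per-object copy,
            // so the Renderers can still be batched together.
            r.sharedMaterial = sharedLook;

            // By contrast, accessing r.material here would clone the
            // Material for this Renderer and opt it out of batching.
        }
    }
}
```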
For additional information and tips, read the section about draw call batching in the Unity documentation.
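For the runtime static batching mentioned in the rules above, a minimal sketch might look as follows. It assumes a hypothetical procedural generator that parents all generated static parts under this GameObject and then calls BatchGeneratedLevel; imported Meshes may need Read/Write Enabled in their import settings for this to work.

```csharp
using UnityEngine;

public class RuntimeStaticBatcher : MonoBehaviour
{
    // Hypothetical entry point: call this once procedural generation has
    // finished placing static parts as children of this GameObject.
    public void BatchGeneratedLevel()
    {
        // Prepares all children of this GameObject for static batching.
        // The children must not move, rotate, or scale afterwards.
        StaticBatchingUtility.Combine(gameObject);
    }
}
```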
Instancing forces Unity to use constant buffers, which work well on desktop GPUs but are slow on mobile devices. Instancing only starts to become useful at around 50-100 Meshes, depending on the underlying hardware.
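The following sketch shows one way to apply that threshold in code. It is an illustration under assumptions: InstancedDrawer is a hypothetical component, the grid layout is arbitrary, the minimum of 50 instances is taken from the lower end of the range above, and the Material is assumed to use a shader that supports instancing.

```csharp
using UnityEngine;

public class InstancedDrawer : MonoBehaviour
{
    public Mesh mesh;
    public Material material;      // assumed to use an instancing-capable shader
    public int minInstances = 50;  // assumed threshold from the guideline above

    Matrix4x4[] matrices;

    void Start()
    {
        // Hypothetical layout: a 10x10 grid with 2 units of spacing.
        matrices = new Matrix4x4[100];
        for (int i = 0; i < matrices.Length; i++)
        {
            Vector3 position = new Vector3(i % 10, 0f, i / 10) * 2f;
            matrices[i] = Matrix4x4.TRS(position, Quaternion.identity, Vector3.one);
        }

        // DrawMeshInstanced requires instancing to be enabled on the Material.
        material.enableInstancing = true;
    }

    void Update()
    {
        if (SystemInfo.supportsInstancing && matrices.Length >= minInstances)
        {
            // A single call can draw up to 1023 instances.
            Graphics.DrawMeshInstanced(mesh, 0, material, matrices, matrices.Length);
        }
        else
        {
            // Fallback: one regular draw per Mesh.
            for (int i = 0; i < matrices.Length; i++)
            {
                Graphics.DrawMesh(mesh, matrices[i], material, 0);
            }
        }
    }
}
```

Profile both paths on your target devices, because the break-even point depends on the underlying hardware.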
It is essential to keep the geometric complexity of GameObjects in your Scenes to a minimum, otherwise Unity has to push a lot of vertex data to the graphics card. 200k static triangles is a conservative target for low-end mobile. However, this also depends on whether your GameObjects are animated or static.
|Platform||Static Geometry [million triangles]||Animated Skinned Meshes [million triangles]|
In Unity, you get better render performance by having few GameObjects with high polygon counts, rather than many GameObjects with low polygon counts.
Remove faces from geometry that you cannot see, and don’t render things the player never sees. For example, if you never see the back of a cupboard resting against a wall, the cupboard model should not have any faces on its posterior side.
Simplify Meshes as much as possible. Depending on the target platform (especially on mobile), consider compensating for low-poly geometry by adding detail through high-resolution Textures, and potentially through parallax mapping and tessellation. Profile regularly, because these techniques can impact performance and may not be suitable or available on the target platform.
Reduce pixel complexity (per-pixel calculations) by baking as much detail into the Textures as possible. For example, bake specular highlights into the Texture to avoid having to compute the highlight in the fragment shader.
Level Of Detail (LOD) rendering allows you to reduce the number of triangles rendered for an object as its distance from the Camera increases. As long as objects aren’t all close to the Camera at the same time, LOD reduces the load on the hardware and improves rendering performance. To use it, add a LOD Group component and provide lower-detail Meshes for the distance groups further from the Camera. There are 8 LOD levels in total. Tools like Simplygon can automate much of the Asset preparation process for LOD.
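LOD levels are normally configured in the Inspector, but they can also be set up from a script. The sketch below assumes three hypothetical Renderers (high, medium and low detail) that already exist as children; the screen-height thresholds are placeholder values.

```csharp
using UnityEngine;

public class LodSetup : MonoBehaviour
{
    // Hypothetical child Renderers, one per detail level.
    public Renderer highDetail;
    public Renderer mediumDetail;
    public Renderer lowDetail;

    void Start()
    {
        LODGroup group = gameObject.AddComponent<LODGroup>();

        // Each threshold is the fraction of screen height below which the
        // next level takes over. The values here are assumptions.
        LOD[] lods = new LOD[]
        {
            new LOD(0.6f, new Renderer[] { highDetail }),
            new LOD(0.3f, new Renderer[] { mediumDetail }),
            new LOD(0.1f, new Renderer[] { lowDetail })
        };

        group.SetLODs(lods);
        group.RecalculateBounds();
    }
}
```

With these values, the object is culled entirely once it covers less than 10% of the screen height.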
When you use a static camera setup where the user cannot move (such as in some VR experiences) it is better to use a Mesh which you model with the correct details for the distance in mind, rather than storing multiple LODs per object. You can apply a similar concept for Textures by using a proper resolution for a Texture instead of mipmapping the right resolution at runtime. Applying the right details saves a lot of disk space and some run-time memory.
If you can afford the memory, you can use Mesh combination and then LOD the result. For example, a bookcase consists of unique pieces up close, but you merge them into a single Mesh and LOD them in the distance. It takes time and effort to maintain and generate high-quality LODs, even if you already have the optimal geometry, Materials, and Shaders.
If creating high-quality LODs is not possible, you can still get good results with the run-time Mesh combination (baking), which you can run when the Scene changes or during loading. However, the framerate might be lower while you run the combination, so it’s usually not recommended for mobile.
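A minimal sketch of run-time Mesh combination follows. It assumes a hypothetical component on a parent GameObject, that all child parts can share a single Material, that each part uses only its first sub-mesh, and that the combined result stays under the default 16-bit index limit of 65,535 vertices.

```csharp
using System.Collections.Generic;
using UnityEngine;

[RequireComponent(typeof(MeshFilter), typeof(MeshRenderer))]
public class RuntimeMeshCombiner : MonoBehaviour
{
    // Assumption: every combined part can be drawn with this one Material.
    public Material combinedMaterial;

    // Call during loading or when the Scene changes, not every frame.
    public void Combine()
    {
        MeshFilter[] parts = GetComponentsInChildren<MeshFilter>();
        List<CombineInstance> instances = new List<CombineInstance>();
        Matrix4x4 toLocal = transform.worldToLocalMatrix;

        foreach (MeshFilter part in parts)
        {
            // Skip this GameObject's own MeshFilter and empty filters.
            if (part.gameObject == gameObject || part.sharedMesh == null)
            {
                continue;
            }

            CombineInstance instance = new CombineInstance();
            instance.mesh = part.sharedMesh;
            // Express each part in the local space of this GameObject.
            instance.transform = toLocal * part.transform.localToWorldMatrix;
            instances.Add(instance);

            // Hide the original part; the combined Mesh replaces it.
            part.gameObject.SetActive(false);
        }

        Mesh combined = new Mesh();
        combined.CombineMeshes(instances.ToArray(), true, true);

        GetComponent<MeshFilter>().sharedMesh = combined;
        GetComponent<MeshRenderer>().sharedMaterial = combinedMaterial;
    }
}
```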
When you want LODs for animations, you must manually set them up via masking.
Here’s an example: you have a human character model which does not animate fingers in lower LODs and needs no rig.
Put only the fingers in one mask without the rest of the hand or body
Create another mask without the fingers and add the rest of the body (including the hand)
Set up two layers in the Animator. The base layer uses the lower LOD (animations without the fingers). Next, create a new layer, and in its settings, enable the Sync checkbox and choose the Base Layer as the source layer. This second layer contains only the mask with fingers.
This setup doesn’t read all the animation’s curves, but it makes sure Unity only loads the masks which are needed. Using the sync layer makes it possible to use LODs, although you need to set them up manually. Using LODs in Animation layers also saves CPU time, because Unity does not evaluate animation clips that have zero weight.
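Once the layers exist, a script can switch the finger layer on and off by setting its weight. The sketch below is one possible approach rather than a built-in feature: the layer name "Fingers", the viewer reference and the 5-unit distance are all assumptions.

```csharp
using UnityEngine;

[RequireComponent(typeof(Animator))]
public class FingerAnimationLod : MonoBehaviour
{
    public Transform viewer;                    // usually the main Camera's transform
    public string fingerLayerName = "Fingers";  // hypothetical Animator layer name
    public float fingerDistance = 5f;           // assumed switch distance

    Animator animator;
    int fingerLayer;

    void Start()
    {
        animator = GetComponent<Animator>();

        // Returns -1 if the Animator has no layer with this name.
        fingerLayer = animator.GetLayerIndex(fingerLayerName);
    }

    void Update()
    {
        if (fingerLayer < 0 || viewer == null)
        {
            return;
        }

        // Compare squared distances to avoid a square root every frame.
        float sqrDistance = (viewer.position - transform.position).sqrMagnitude;
        bool isClose = sqrDistance <= fingerDistance * fingerDistance;

        // A weight of zero lets Unity skip evaluating the finger layer.
        animator.SetLayerWeight(fingerLayer, isClose ? 1f : 0f);
    }
}
```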