Optimizing graphics rendering in Unity games

Revisado con versión: 5.4

-

Dificultad: Intermedio

Introduction

In this article we will learn what happens behind the scenes when Unity renders a frame, what kind of performance problems can occur when rendering and how to fix performance problems related to rendering.

Before we read this article, it is vital to understand that there is no one size fits all approach to improving rendering performance. Rendering performance is affected by many factors within our game and is also highly dependent on the hardware and operating system that our game runs on. The most important thing to remember is that we solve performance problems by investigating, experimenting and rigorously profiling the results of our experiments.

This article contains information on most common rendering performance problems with suggestions on how to fix them and links to further reading. It’s possible that our game could have a problem - or combination of problems - not covered here. This article, however, will still help us to understand our problem and give us the knowledge and vocabulary to effectively search for a solution.

A brief introduction to rendering

Before we begin, let’s take a quick and somewhat simplified look at what happens when Unity renders a frame. Understanding the flow of events and the correct terms for things will help us to understand, research and work towards fixing our performance problems.

NB: Throughout this article, we will use the term "object" to mean an object that may be rendered in our game. Any GameObject with a Renderer component will be referred to as an object.

At the most basic level, rendering can be described as follows:

  • The central processing unit, known as the CPU, works out what must be drawn and how it must be drawn.

  • The CPU sends instructions to the graphics processing unit, known as the GPU.

  • The GPU draws things according to the CPU’s instructions.

Now let’s take a closer look at what happens. We’ll cover each of these steps in greater detail later in the article, but for now let’s just familiarise ourselves with the words used and understand the different roles that the CPU and GPU play in rendering.

The phrase often used to describe rendering is the rendering pipeline, and this is a useful image to bear in mind; efficient rendering is all about keeping information flowing.

For every frame that is rendered, the CPU does the following work:

  • The CPU checks every object in the scene to determine whether it should be rendered. An object is only rendered if it meets certain criteria; for example, some part of its bounding box must be within a camera’s view frustum. Objects that will not be rendered are said to be culled. For more information on the frustum and frustum culling please see this page.

  • The CPU gathers information about every object that will be rendered and sorts this data into commands known as draw calls. A draw call contains data about a single mesh and how that mesh should be rendered; for example, which textures should be used. Under certain circumstances, objects that share settings may be combined into the same draw call. Combining data for different objects into the same draw call is known as batching.

  • The CPU creates a packet of data called a batch for each draw call. Batches may sometimes contain data other than draw calls, but these situations are unlikely to contribute to common performance issues and we therefore won’t consider these in this article.

For every batch that contains a draw call, the CPU now must do the following:

  • The CPU may send a command to the GPU to change a number of variables known collectively as the render state. This command is known as a SetPass call. A SetPass call tells the GPU which settings to use to render the next mesh. A SetPass call is sent only if the next mesh to be rendered requires a change in render state from the previous mesh.

  • The CPU sends the draw call to the GPU. The draw call instructs the GPU to render the specified mesh using the settings defined in the most recent SetPass call.

  • Under certain circumstances, more than one pass may be required for the batch. A pass is a section of shader code and a new pass requires a change to the render state. For each pass in the batch, the CPU must send a new SetPass call and then must send the draw call again.

Meanwhile, the GPU does the following work:

  • The GPU handles tasks from the CPU in the order that they were sent.

  • If the current task is a SetPass call, the GPU updates the render state.

  • If the current task is a draw call, the GPU renders the mesh. This happens in stages, defined by separate sections of shader code. This part of rendering is complex and we won’t cover it in great detail, but it’s useful for us to understand that a section of code called the vertex shader tells the GPU how to process the mesh’s vertices and then a section of code called the fragment shader tells the GPU how to draw the individual pixels.

  • This process repeats until all tasks sent from the CPU have been processed by the GPU.

Now that we understand what’s happening when Unity renders a frame, let’s consider the sort of problems that can occur when rendering.

Types of rendering problems

The most important thing to understand about rendering is this: both the CPU and the GPU must finish all of their tasks in order to render the frame. If any one of these tasks takes too long to complete, it will cause a delay to the rendering of the frame.

Rendering problems have two fundamental causes. The first type of problem is caused by an inefficient pipeline. An inefficient pipeline occurs when one or more of the steps in the rendering pipeline takes too long to complete, interrupting the smooth flow of data. Inefficiencies within the pipeline are known as bottlenecks. The second type of problem is caused by simply trying to push too much data through the pipeline. Even the most efficient pipeline has a limit to how much data it can handle in a single frame.

When our game takes too long to render a frame because the CPU takes too long to perform its rendering tasks, our game is what is known as CPU bound. When our game takes too long to render a frame because the GPU takes too long to perform its rendering tasks, our game is what is known as GPU bound.

Understanding rendering problems

It is vital that we use profiling tools to understand the cause of performance problems before we make any changes. Different problems require different solutions. It is also very important that we measure the effects of every change we make; fixing performance problems is a balancing act, and improving one aspect of performance can negatively impact another.

We will use two tools to help us understand and fix our rendering performance problems: the Profiler window and the Frame Debugger. Both of these tools are built into Unity.

The Profiler window

The Profiler window allows us to see real-time data about how our game is performing. We can use the Profiler window to see data about many aspects of our game, including memory usage, the rendering pipeline and the performance of user scripts.

If you are not yet familiar with using the Profiler window, this page of the Unity Manual is a good introduction and this tutorial shows how to use it in detail.

The Frame Debugger

The Frame Debugger allows us to see how a frame is rendered, step by step. Using the Frame Debugger, we can see detailed information such as what is drawn during each draw call, shader properties for each draw call and the order of events sent to the GPU. This information helps us to understand how our game is rendered and where we can improve performance.

If you are not yet familiar with using the Frame Debugger, this page of the Unity Manual is a very useful guide to what it does and this tutorial video shows it in use.

Finding the cause of performance problems

Before we try to improve the rendering performance of our game, we must be certain that our game is running slowly due to rendering problems. There is no point trying to optimize our rendering performance if the real cause of our problem is overly complex user scripts! If you’re not sure whether your performance problems relate to rendering, you should follow this tutorial.

Once we have established that our problems relate to rendering, we must also understand whether our game is CPU bound or GPU bound. These different problems require different solutions, so it’s vital that we understand the cause of the problem before trying to fix it. If you’re not yet sure whether your game is CPU bound or GPU bound, you should follow this tutorial.

If we are certain that our problems relate to rendering and we know whether our game is CPU bound or GPU bound, we are ready to read on.

If our game is CPU bound

Broadly speaking, the work that must be carried out by the CPU in order to render a frame is divided into three categories:

  • Determining what must be drawn

  • Preparing commands for the GPU

  • Sending commands to the GPU

These broad categories contain many individual tasks, and these tasks may be carried out across multiple threads. Threads allow separate tasks to happen simultaneously; while one thread performs one task, another thread can perform a completely separate task. This means that the work can be done more quickly. When rendering tasks are split across separate threads, this is known as multithreaded rendering.

There are three types of thread involved in Unity’s rendering process: the main thread, the render thread and worker threads. The main thread is where the majority of CPU tasks for our game take place, including some rendering tasks. The render thread is a specialised thread that sends commands to the GPU. Worker threads each perform a single task, such as culling or mesh skinning. Which tasks are performed by which thread depends on our game’s settings and the hardware on which our game runs. For example, the more CPU cores our target hardware has, the more worker threads can be spawned. For this reason, it is very important to profile our game on target hardware; our game may perform very differently on different devices.

Because multithreaded rendering is complex and hardware-dependent, we must understand which tasks are causing our game to be CPU bound before we try to improve performance. If our game is running slowly because culling operations are taking too long on one thread, then it won’t help us to reduce the amount of time it takes to send commands to the GPU on a different thread.

NB: Not all platforms support multithreaded rendering; at the time of writing, WebGL does not support this feature. On platforms that do not support multithreaded rendering, all CPU tasks are carried out on the same thread. If we are CPU bound on such a platform, optimizing any CPU work will improve CPU performance. If this is the case for our game, we should read all of the following sections and consider which optimizations may be most suitable for our game.

Graphics jobs

The Graphics jobs option in Player Settings determines whether Unity uses worker threads to carry out rendering tasks that would otherwise be done on the main thread and, in some cases, the render thread. On platforms where this feature is available, it can deliver a considerable performance boost. If we wish to use this feature, we should profile our game with and without Graphics jobs enabled and observe the effect that it has on performance.

Finding out which tasks are contributing to problems

We can determine which tasks are causing our game to be CPU bound by using the Profiler window. This tutorial shows how to determine where the problems lie.

Now that we understand which tasks are causing our game to be CPU bound, let’s look at a few common problems and their solutions.

Sending commands to the GPU

The time taken to send commands to the GPU is the most common reason for a game to be CPU bound. This task is performed on the render thread on most platforms, although on certain platforms (for example, PlayStation 4) this may be performed by worker threads.

The most costly operation that occurs when sending commands to the GPU is the SetPass call. If our game is CPU bound due to sending commands to the GPU, reducing the number of SetPass calls is likely to be the best way to improve performance.

We can see how many SetPass calls and batches are being sent in Rendering profiler of Unity’s Profiler window. The number of SetPass calls that can be sent before performance suffers depends very much on the target hardware; a high-end PC can send many more SetPass calls before performance suffers than a mobile device.

The number of SetPass calls and its relationship to the number of batches depends on several factors, and we’ll cover these topics in more detail later in the article. However, it’s usually the case that:

  • Reducing the number of batches and/or making more objects share the same render state will, in most cases, reduce the number of SetPass calls.

  • Reducing the number of SetPass calls will, in most cases, improve CPU performance.

If reducing the number of batches doesn’t reduce the number of SetPass calls, it may still lead to performance improvements in its own right. This is because the CPU can more efficiently process a single batch than several batches, even if they contain the same amount of mesh data.

There are, broadly, three ways of reducing the number of batches and SetPass calls. We will look more in-depth at each one of these:

  • Reducing the number of objects to be rendered will likely reduce both batches and SetPass calls.

  • Reducing the number of times each object must be rendered will usually reduce the number of SetPass calls.

  • Combining the data from objects that must be rendered into fewer batches will reduce the number of batches.

Different techniques will be suitable for different games, so we should consider all of these options, decide which ones could work in our game and experiment.

Reducing the number of objects being rendered

Reducing the number of objects that must be rendered is the simplest way to reduce the number of batches and SetPass calls. There are a several techniques we can use to reduce the number of objects being rendered.

  • Simply reducing the number of visible objects in our scene can be an effective solution. If, for example, we are rendering a large number of different characters in a crowd, we can experiment with simply having fewer of these characters in the scene. If the scene still looks good and performance improves, this will likely be a much quicker solution than more sophisticated techniques.

  • We can reduce our camera’s draw distance using the camera’s Far Clip Plane property. This property is the distance beyond which objects are no longer rendered by the camera. If we wish to disguise the fact that distant objects are no longer visible, we can trying using fog to hide the lack of distant objects.

  • For a more fine-grained approach to hiding objects based on distance, we can use our camera’s Layer Cull Distances property to provide custom culling distances for objects that are on separate layers. This approach can be useful if we have lots of small foreground decorative details; we could hide these details at a much shorter distance than large terrain features.

  • We can use a technique called occlusion culling to disable the rendering of objects that are hidden by other objects. For example, if there is a large building in our scene we can use occlusion culling to disable the rendering of objects behind it. Unity’s occlusion culling is not suitable for all scenes, can lead to additional CPU overhead and can be complex to set up, but it can greatly improve performance in some scenes. This Unity blog post on occlusion culling best practices is a great guide to to the subject. In addition to using Unity’s occlusion culling, we can also implement our own form of occlusion culling by manually deactivating objects that we know cannot be seen by the player. For example, if our scene contains objects that are used for a cutscene but aren't visible before or afterwards, we should deactivate them. Using our knowledge of our own game is always more efficient than asking Unity to work things out dynamically.

Reducing the number of times each object must be rendered

Realtime lighting, shadows and reflections add a great deal of realism to games but can be very expensive. Using these features can lead to objects to be rendered multiple times, which can greatly impact performance.

The exact impact of these features depends on the rendering path that we choose for our game. Rendering path is the term for the order in which calculations are performed when drawing the scene, and the major difference between rendering paths is how they handle realtime lights, shadows and reflections. As a general rule, Deferred Rendering is likely to be a better choice if our game runs on higher-end hardware and uses a lot of realtime lights, shadows and reflections. Forward Rendering is likely to be more suitable if our game runs on lower-end hardware and does not use these features. However, this is a very complex issue and if we wish to make use of realtime lights, shadows and reflections it is best to research the subject and experiment. This page of the Unity Manual gives more information on the different rendering paths available in Unity and is a useful jumping-off point. This tutorial contains useful information on the subject of lighting in Unity.

Regardless of the rendering path chosen, the use of realtime lights, shadows and reflections can impact our game’s performance and it’s important to understand how to optimize them.

  • Dynamic lighting in Unity is a very complex subject and discussing it in depth is beyond the scope of this article, but this tutorial is an excellent introduction to the subject and this page of the Unity Manual has details on common lighting optimizations.

  • Dynamic lighting is expensive. When our scene contains objects that don’t move, such as scenery, we can use a technique called baking to precompute the lighting for the scene so that runtime lighting calculations are not required. This tutorial gives an introduction to the technique, and this section of the Unity Manual covers baked lighting in detail.

  • If we wish to use realtime shadows in our game, this is likely an area where we can improve performance. This page of the Unity Manual is a good guide to the shadow properties that can be tweaked in Quality Settings and how these will affect appearance and performance. For example, we can use the Shadow Distance property to ensure that only nearby objects cast shadows.

  • Reflection probes create realistic reflections but can be very costly in terms of batches. It’s best to keep our use of reflections to a minimum where performance is a concern, and to optimize them as much as possible where they are used. This page of the Unity Manual is a useful guide to optimizing reflection probes.

Combining objects into fewer batches

A batch can contain the data for multiple objects when certain conditions are met. To be eligible for batching, objects must:

  • Share the same instance of the same material

  • Have identical material settings (i.e., texture, shader and shader parameters)

Batching eligible objects can improve performance, although as with all optimization techniques we must profile carefully to ensure that the cost of batching does not exceed the performance gains.

There are a few different techniques for batching eligible objects:

  • Static batching is a technique that allows Unity to batch nearby eligible objects that do not move. A good example of something that could benefit from static batching is a pile of similar objects, such as boulders. This page of the Unity Manual contains instructions on setting up static batching in our game. Static batching can lead to higher memory usage so we should bear this cost in mind when profiling our game.

  • Dynamic batching is another technique that allows Unity to batch eligible objects, whether they move or not. There are a few restrictions on the objects that can be batched using this technique. These restrictions are listed, along with instructions, on this page of the Unity Manual. Dynamic batching has an impact on CPU usage that can cause it to cost more in CPU time than it saves. We should bear this cost in mind when experimenting with this technique and be cautious with its use.

  • Batching Unity’s UI elements is a little more complex, as it can be affected by the layout of our UI. This video from Unite Bangkok 2015 gives a good overview of the subject and this guide to optimizing Unity UI provides in-depth information on how to ensure that UI batching works as we intend it to.

  • GPU instancing is a technique that allows large numbers of identical objects to be very efficiently batched. There are limitations to its use and it is not supported by all hardware, but if our game has many identical objects onscreen at once we may be able to benefit from this technique. This page of the Unity Manual contains an introduction to GPU instancing in Unitywith details of how to use it, which platforms support it and the circumstances under which it may benefit our game.

  • Texture atlasing is a technique where multiple textures are combined into one larger texture. It is commonly used in 2D games and UI systems, but can also be used in 3D games. If we use this technique when creating art for our game, we can ensure that objects share textures and are therefore eligible for batching. Unity has a built-in texture atlasing tool called Sprite Packer for use with 2D games.

  • It is possible to manually combine meshes that share the same material and texture, either in the Unity Editor or via code at runtime. When combining meshes in this way, we must be aware that shadows, lighting and culling will still operate on a per-object level; this means that a performance increase from combining meshes could be counteracted by no longer being able to cull those objects when they would otherwise not have been rendered. If we wish to investigate this approach, we should examine the the Mesh.CombineMeshes function. The CombineChildren script in Unity’s Standard Assets package is an example of this technique.

  • We must be very careful when accessing Renderer.material in scripts. This duplicates the material and returns a reference to the new copy. Doing so will break batching if the renderer was part of a batch because the renderer no longer has a reference to the same instance of the material. If we wish to access a batched object’s material in a script, we should use Renderer.sharedMaterial.

Culling, sorting and batching

Culling, gathering data on objects that will be drawn, sorting this data into batches and generating GPU commands can all contribute to being CPU bound. These tasks will either be performed on the main thread or on individual worker threads, depending on our game’s settings and target hardware.

  • Culling is unlikely to be very costly on its own, but reducing unnecessary culling may help performance. There is a per-object-per-camera overhead for all active scene objects, even those which are on layers that are not being rendered. To reduce this, we should disable cameras and deactivate or disable renderers that are not currently in use.

  • Batching can greatly improve the speed of sending commands to the GPU, but it can sometimes add unwanted overhead elsewhere. If batching operations are contributing to our game being CPU bound, we may wish to limit the number of manual or automatic batching operations in our game.

Skinned meshes

SkinnedMeshRenderers are used when we animate a mesh by deforming it using a technique called bone animation. It’s most commonly used in animated characters. Tasks related to rendering skinned meshes will usually be performed on the main thread or on individual worker threads, depending on our game’s settings and target hardware.

Rendering skinned meshes can be a costly operation. If we can see in Profiler window that rendering skinned meshes is contributing to our game being CPU bound, there are a few things we can try to improve performance:

  • We should consider whether we need to use SkinnedMeshRenderer components for every object that currently uses one. It may be that we have imported a model that uses a SkinnedMeshRenderer component but we are not actually animating it, for example. In a case like this, replacing the SkinnedMeshRenderer component with a MeshRenderer component will aid performance. When importing models into Unity, if we choose not to import animations in the model’s Import Settings, the model will have a MeshRenderer instead of a SkinnedMeshRenderer.

  • If we are animating our object only some of the time (for example, only on start up or only when it is within a certain distance of the camera), we could switch its mesh for a less detailed version or its SkinnedMeshRenderer component for a MeshRenderer component. The SkinnedMeshRenderer component has a BakeMesh function that can create a mesh in a matching pose, which is useful for swapping between different meshes or renderers without any visible change to the object.

  • This page of the Unity Manual contains advice on optimizing animated characters that use skinned meshes, and the Unity Manual page on the SkinnedMeshRenderer component includes tweaks that can improve performance. In addition to the suggestions on these pages, it is worth bearing in mind that the cost of mesh skinning increases per vertex; therefore using fewer vertices in our models with reduce the amount of work that must be done.

  • On certain platforms, skinning can be handled by the GPU rather than the CPU. This option may be worth experimenting with if we have a lot of capacity on the GPU. We can enable GPU skinning for the current platform and quality target in Player Settings.

Main thread operations unrelated to rendering

It’s important to understand that many CPU tasks unrelated to rendering take place on the main thread. This means that if we are CPU bound on the main thread, we may be able to improve performance by reducing the CPU time spent on tasks not related to rendering.

As an example, our game may be carrying out expensive rendering operations and expensive user script operations on the main thread at a certain point in our game, making us CPU bound. If we have optimized the rendering operations as much as we can without losing visual fidelity, it is possible that we may be able to reduce the CPU cost of our own scripts to improve performance.

If our game is GPU bound

The first thing to do if our game is GPU bound is to find out what is causing the GPU bottleneck. GPU performance is most often limited by fill rate, especially on mobile devices, but memory bandwidth and vertex processing can also be concerns. Let’s examine each of these problems and learn what causes it, how to diagnose it and how to fix it.

Fill rate

Fill rate refers to the number of pixels the GPU can render to the screen each second. If our game is limited by fill rate, this means that our game is trying to draw more pixels per frame than the GPU can handle.

It’s simple to check if fill rate is causing our game to be GPU bound:

  • Profile the game and note the GPU time.

  • Decrease the display resolution in Player Settings.

  • Profile the game again. If performance has improved, it is likely that fill rate is the problem.

If fill rate is the cause of our problem, there are a few approaches that may help us to fix the problem.

  • Fragment shaders are the sections of shader code that tell the GPU how to draw a single pixel. This code is executed by the GPU for every pixel it must draw, so if the code is inefficient then performance problems can easily stack up. Complex fragment shaders are a very common cause of fill rate problems.

    • If our game is using built-in shaders, we should aim to use the simplest and most optimized shaders possible for the visual effect we want. As an example, the mobile shaders that ship with Unity are highly optimized; we should experiment with using them and see if this improves performance without affecting the look of our game. These shaders were designed for use on mobile platforms, but they are suitable for any project. It is perfectly fine to use "mobile" shaders on non-mobile platforms to increase performance if they give the visual fidelity required for the project.

    • If objects in our game use Unity’s Standard Shader, it is important to understand that Unity compiles this shader based on the current material settings. Only features that are currently being used are compiled. This means that removing features such as detail maps can result in much less complex fragment shader code which can greatly benefit performance. Again, if this is the case in our game, we should experiment with the settings and see if we are able to improve performance without affecting visual quality.

    • If our project uses bespoke shaders, we should aim to optimize them as much as possible. Optimizing shaders is a complex subject, but this page of the Unity Manual and the Shader optimization section of this page of the Unity Manual contain useful starting points for optimizing our shader code.

  • Overdraw is the term for when the same pixel is drawn multiple times. This happens when objects are drawn on top of other objects and contributes greatly to fill rate issues. To understand overdraw, we must understand the order in which Unity draws objects in the scene. An object’s shader determines its draw order, usually by specifying which render queue the object is in. Unity uses this information to draw objects in a strict order, as detailed on this page of the Unity Manual. Additionally, the objects in different render queues are sorted differently before they are drawn. For example, Unity sorts items front-to-back in the Geometry queue to minimize overdraw, but sorts objects back-to-front in the Transparent queue to achieve the required visual effect. This back-to-front sorting actually has the effect of maximizing overdraw for objects in the Transparent queue. Overdraw is a complex subject and there is no one size fits all approach to solving overdraw problems, but reducing the number of overlapping objects that Unity cannot automatically sort is key. The best place to start investigating this issue is in Unity’s Scene view; there is a Draw Mode that allows us to see overdraw in our scene and, from there, identify where we can work to reduce it. The most common culprits for excessive overdraw are transparent materials, unoptimized particles and overlapping UI elements, so we should experiment with optimizing or reducing these. This article on the Unity Learn site focuses primarily on Unity UI, but also contains good general guidance on overdraw.

  • The use of image effects can greatly contribute to fill rate issues, especially if we are using more than one image effect. If our game makes use of image effects and is struggling with fill rate issues, we may wish to experiment with different settings or more optimized versions of the image effects (such as Bloom (Optimized) in place of Bloom). If our game uses more than one image effect on the same camera, this will result in multiple shader passes. In this case, it may be beneficial to combine the shader code for our image effects into a single pass, such as in Unity’s PostProcessing Stack. If we have optimized our image effects and are still having fill rate issues, we may need to consider disabling image effects, particularly on lower-end devices.

Memory bandwidth

Memory bandwidth refers to the rate at which the GPU can read from and write to its dedicated memory. If our game is limited by memory bandwidth, this usually means that we are using textures that are too large for the GPU to handle quickly.

To check if memory bandwidth is a problem, we can do the following:

  • Profile the game and note the GPU time.

  • Reduce the Texture Quality for the current platform and quality target in Quality Settings.

  • Profile the game again and note the GPU time. If performance has improved, it is likely that memory bandwidth is the problem.

If memory bandwidth is our problem, we need to reduce the texture memory usage in our game. Again, the technique that works best for each game will be different, but there are a few ways in which we can optimize our textures.

  • Texture compression is a technique that can greatly reduce the size of textures both on disk and in memory. If memory bandwidth is a concern in our game, using texture compression to reduce the size of textures in memory can aid performance. There are lots of different texture compression formats and settings available within Unity, and each texture can have separate settings. As a general rule, some form of texture compression should be used whenever possible; however, a trial and error approach to find the best setting for each texture works best. This page in the Unity Manual contains useful information on different compression formats and settings.

  • Mipmaps are lower resolution versions of textures that Unity can use on distant objects. If our scene contains objects that are far from the camera, we may be able to use mipmaps to ease problems with memory bandwidth. The Mipmaps Draw Mode in Scene view allows us to see which objects in our scene could benefit from mipmaps, and this page of the Unity Manual contains more information on enabling mipmaps for textures.

Vertex processing

Vertex processing refers to the work that the GPU must do to render each vertex in a mesh. The cost of vertex processing is impacted by two things: the number of vertices that must be rendered, and the number of operations that must be performed on each vertex.

If our game is GPU bound and we have established that it isn’t limited by fill rate or memory bandwidth, then it is likely that vertex processing is the cause of the problem. If this is the case, experimenting with reducing the amount of vertex processing that the GPU must do is likely to result in performance gains.

There are a few approaches we could consider to help us reduce the number of vertices or the number of operations that we are performing on each vertex.

  • Firstly, we should aim to reduce any unnecessary mesh complexity. If we are using meshes that have a level of detail that cannot be seen in game, or inefficient meshes that have too many vertices due to errors in creating them, this is wasted work for the GPU. The simplest way to reduce the cost of vertex processing is to create meshes with a lower vertex count in our 3D art program.

  • We can experiment with a technique called normal mapping, which is where textures are used to create the illusion of greater geometric complexity on a mesh. Although there is some GPU overhead to this technique, it will in many cases result in a performance gain. This page of the Unity Manual has a useful guide to using normal mapping to simulate complex geometry in our meshes.

  • If a mesh in our game does not make use of normal mapping, we can often disable the use of vertex tangents for that mesh in the mesh’s import settings. This reduces the amount of data that is sent to the GPU for each vertex.

  • Level of detail, also known as LOD, is an optimisation technique where meshes that are far from the camera are reduced in complexity. This reduces the the number of vertices that the GPU has to render without affecting the visual quality of the game. The LOD Group page of the Unity Manual contains more information on how to set up LOD in our game.

  • Vertex shaders are blocks of shader code that tell the GPU how to draw each vertex. If our game is limited by vertex processing, then reducing the complexity of our vertex shaders may help.

    • If our game uses built-in shaders, we should aim to use the simplest and most optimized shaders possible for the visual effect we want. As an example, the mobile shaders that ship with Unity are highly optimized; we should experiment with using them and see if this improves performance without affecting the look of our game.

    • If our project uses bespoke shaders, we should aim to optimize them as much as possible. Optimizing shaders is a complex subject, but this page of the Unity Manual and the Shader optimization section of this page of the Unity Manual contain useful starting points for optimizing our shader code.

Conclusion

We’ve learned how rendering works in Unity, what sort of problems can occur when rendering and how to improve rendering performance in our game. Using this knowledge and our profiling tools, we can fix performance problems related to rendering and structure our games so that they have a smooth and efficient rendering pipeline.

The links below provide further information on the topics covered in this article.

Resources

Unity Learn: A guide to optimizing Unity UI

Unity Knowledge Base: Why is my static batching breaking or otherwise not working as expected?

Fabian Giesen: A trip through the graphics pipeline

Simon Schreibt: Render hell

Gamasutra: How to choose between Forward or Deferred rendering paths in Unity

Gamasutra: Batching independently moving GameObjects into a single mesh to reduce draw calls

FlameBait Games: Optimizing SkinnedMeshRenderers for Unity 5

Pencil Square Games: Reducing draw calls (also named SetPass calls) in Unity 5