Introduction
When our game runs, the central processing unit (CPU) of our device carries out instructions. Every single frame of our game requires many millions of these CPU instructions to be carried out. To maintain a smooth frame rate, the CPU must carry out all of its instructions within a set amount of time. When the CPU cannot carry out all of its instructions in time, our game may slow down, stutter or freeze. Many things can cause the CPU to have too much work to do. Examples could include demanding rendering code, overly complex physics simulations or too many animation callbacks. This article focuses on only one of these reasons: CPU performance problems caused by the code that we write in our scripts.
In this article we will learn how our scripts are turned into CPU instructions, what can cause our scripts to generate an excessive amount of work for the CPU, and how to fix performance problems that are caused by the code in our scripts.
Diagnosing problems with our code
Performance problems caused by excessive demands on the CPU can manifest as low frame rates, jerky performance or intermittent freezes. However, other problems can cause similar symptoms. If our game has performance problems like this, the first thing we must do is to use Unity’s Profiler window to establish whether our performance problems are due to the CPU being unable to complete its tasks in time. Once we have established this, we must determine whether user scripts are the cause of the problem, or whether the problem is caused by some other part of our game: complex physics or animations, for example.
A brief introduction to how Unity builds and runs our game
To understand why our code may not be performing well, we first need to understand what happens when Unity builds our game. Knowing what's going on behind the scenes will help us to make informed decisions about how we can improve our game's performance.
The build process
When we build our game, Unity packages everything needed to run our game into a program that can be executed by our target device. CPUs can only run code written in very simple languages known as machine code or native code; they cannot run code written in more complex languages like C#. This means that Unity must translate our code into other languages. This translation process is called compiling.
Unity first compiles our scripts into a language called Common Intermediate Language (CIL). CIL is a language that is easy to compile into a wide range of different native code languages. The CIL is then compiled to native code for our specific target device. This second step happens either when we build our game (known as ahead of time compilation or AOT compilation), or on the target device itself, just before the code is run (known as just in time compilation or JIT compilation). Whether our game uses AOT or JIT compilation usually depends on the target hardware.
The relationship between the code we write and compiled code
Code that has not been compiled yet is known as source code. The source code that we write determines the structure and content of the compiled code. For the most part, source code that is well structured and efficient will result in compiled code that is well structured and efficient. However, it's useful for us to know a little about native code so that we can better understand why some source code is compiled into more efficient native code.
Firstly, some CPU instructions take more time to execute than others. An example of this is calculating a square root. This calculation takes a CPU more time to execute than, for example, multiplying two numbers. The difference between a single fast CPU instruction and a single slow CPU instruction is very small indeed, but it's useful for us to understand that, fundamentally, some instructions are simply faster than others.
The next thing we need to understand is that some operations that seem very simple in source code can be surprisingly complex when they are compiled to native code. An example of this is inserting an element into a list. Many more instructions are needed to perform this operation than, for example, accessing an element from an array by index. Again, when we consider an individual example we are talking about a tiny amount of time, but it is important to understand that some operations result in more instructions than others.
Understanding these ideas will help us to understand why some code performs better than other code, even when both examples do quite similar things. Even a limited background understanding of how things work at a low level can help us to write games that perform well.
Run time communication between Unity Engine code and our script code
It's useful for us to understand that our scripts written in C# run in a slightly different way to the code that makes up much of the Unity Engine. Most of the core functionality of the Unity Engine is written in C++ and has already been compiled to native code. This compiled engine code is part of what we install when we install Unity.
Code compiled to CIL, such as our source code, is known as managed code. When managed code is compiled to native code, it is integrated with something called the managed runtime. The managed runtime takes care of things like automatic memory management and safety checks to ensure that a bug in our code will result in an exception rather than the device crashing.
When the CPU transitions between running engine code and managed code, work must be done to set up these safety checks. When passing data from managed code back to the engine code, the CPU may need to do work to convert the data from the format used by the managed runtime to the format needed by the engine code. This conversion is known as marshalling. Again, the overhead from any single call between managed and engine code is not particularly expensive, but it is important that we understand that this cost exists.
The causes of poorly-performing code
Now that we understand what happens to our code when Unity builds and runs our game, we can see that when our code performs poorly, it is because it creates too much work for the CPU at run time. Let's consider the different reasons for this.
The first possibility is that our code is simply wasteful or poorly structured. An example of this might be code that makes the same function call repeatedly when it could make the call only once. This article will cover several common examples of poor structure and show example solutions.
The second possibility is that our code appears to be well structured, but makes unnecessarily expensive calls to other code. An example of this might be code that results in unnecessary calls between managed and engine code. This article will give examples of Unity API calls that may be unexpectedly costly, with suggested alternatives that are more efficient.
The next possibility is that our code is efficient but it is being called when it does not need to be. An example of this might be code that simulates an enemy's line of sight. The code itself may perform well, but it is wasteful to run this code when the player is very far from the enemy. This article contains examples of techniques that can help us to write code that runs only when it needs to.
The final possibility is that our code is simply too demanding. An example of this might be a very detailed simulation where a large number of agents are using complex AI. If we have exhausted other possibilities and optimized this code as much as we can, then we may simply need to redesign our game to make it less demanding: for example, faking elements of our simulation rather than calculating them. Implementing this kind of optimization is beyond the scope of this article as it is extremely dependent on the game itself, but it will still benefit us to read the article and consider how to make our game as performant as possible.
Improving the performance of our code
Once we have established that performance problems in our game are due to our code, we must think carefully about how to resolve these problems. Optimizing a demanding function may seem like a good place to start, but it may be that the function in question is already as optimal as it can be and is simply expensive by nature. Instead of changing that function, there may be a small efficiency saving we can make in a script that is used by hundreds of GameObjects that gives us a much more useful performance increase. Furthermore, improving the CPU performance of our code may come at a cost: changes may increase memory usage or offload work to the GPU.
For these reasons, this article isn’t a set of simple steps to follow. This article is instead a series of suggestions for improving our code's performance, with examples of situations where these suggestions can be applied. As with all performance optimization, there are no hard and fast rules. The most important thing to do is to profile our game, understand the nature of the problem, experiment with different solutions and measure the results of our changes.
Writing efficient code
Writing efficient code and structuring it wisely can lead to improvements in our game's performance. While the examples shown are in the context of a Unity game, these general best practice suggestions are not specific to Unity projects or Unity API calls.
Move code out of loops when possible
Loops are a common place for inefficiencies to occur, especially when they are nested. Inefficiencies can really add up if they are in a loop that runs very frequently, especially if this code is found on many GameObjects in our game.
In the following simple example, our code iterates through the loop every time Update() is called, regardless of whether the condition is met.
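The listing this paragraph refers to is not reproduced here, so the following is only a minimal sketch of the kind of code it describes. The class, field and method names (LoopExample, exampleBool, myArray, ExampleFunction) are hypothetical. The first method tests an unchanging condition inside the loop; the second tests it once and skips the loop entirely when it is not met.

```csharp
using UnityEngine;

// Hypothetical script: all names here are illustrative only.
public class LoopExample : MonoBehaviour
{
    [SerializeField] private bool exampleBool;
    [SerializeField] private int[] myArray = new int[0];

    // Wasteful version: the loop always runs, and the condition
    // (which does not change during the loop) is re-tested per element.
    private void ProcessWithCheckInsideLoop()
    {
        for (int i = 0; i < myArray.Length; i++)
        {
            if (exampleBool)
            {
                ExampleFunction(myArray[i]);
            }
        }
    }

    // Improved version: the condition is tested once per frame and
    // the loop is skipped entirely when the condition is not met.
    private void Update()
    {
        if (exampleBool)
        {
            for (int i = 0; i < myArray.Length; i++)
            {
                ExampleFunction(myArray[i]);
            }
        }
    }

    private void ExampleFunction(int value)
    {
        // Placeholder for per-element work.
    }
}
```

Both versions call ExampleFunction() the same number of times; the saving comes from the loop overhead and repeated checks that are avoided when the condition is false.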
We should examine our code for cases where we make frequent calls to functions that return a result. It is possible that we could reduce the cost of these calls by using caching.
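As a sketch of what caching can look like, the hypothetical script below calls GetComponent() once in Start() and reuses the stored reference in Update(), rather than repeating the lookup every frame. The class name and ExampleFunction() are assumptions made for illustration.

```csharp
using UnityEngine;

// Hypothetical script: names are illustrative only.
public class CachedRendererExample : MonoBehaviour
{
    // The result of the lookup is stored here and reused.
    private Renderer cachedRenderer;

    private void Start()
    {
        // One lookup when the script starts, instead of one per frame.
        cachedRenderer = GetComponent<Renderer>();
    }

    private void Update()
    {
        // Guard against the component being absent from this GameObject.
        if (cachedRenderer != null)
        {
            ExampleFunction(cachedRenderer);
        }
    }

    private void ExampleFunction(Renderer target)
    {
        // Placeholder for work that needs the Renderer.
    }
}
```

Caching trades a small amount of memory for less repeated work, and it is only safe when the cached result stays valid for as long as we use it.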
Use the right data structure
How we structure our data can have a big impact on how our code performs. There is no single data structure that is ideal for all situations, so to get the best performance in our game we need to use the right data structure for each job.
To make the right decision about which data structure to use, we need to understand the strengths and weaknesses of different data structures and think carefully about what we want our code to do. We may have thousands of elements that we need to iterate over once per frame, or we may have a small number of elements that we need to frequently add to and remove from. These different problems will be best solved by different data structures.
Making the right decisions here depends on our knowledge of the subject. The best place to start, if this is a new area of knowledge, is to learn about Big O Notation. Big O Notation is how algorithmic complexity is discussed, and understanding this will help us to compare different data structures. This article is a clear and beginner-friendly guide to the subject.
We can then learn more about the data structures available to us, and compare them to find the right data solutions for different problems. This MSDN guide to collections and data structures in C# gives general guidance on choosing appropriate data structures and provides links to more in-depth documentation.
A single choice about data structures is unlikely to have a large impact on our game. However, in a data-driven game that involves a great many such collections, the results of these choices can really add up. An understanding of algorithmic complexity and the strengths and weaknesses of different data structures will help us to create code that performs well.
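To make the idea of algorithmic complexity concrete, here is a small sketch comparing a membership test on two standard .NET collections. The class and method names are hypothetical. List<T>.Contains() examines elements one by one, so its cost grows with the size of the list, while HashSet<T>.Contains() uses hashing and takes roughly constant time on average.

```csharp
using System.Collections.Generic;

// Hypothetical helper: names are illustrative only.
public static class LookupExample
{
    // O(n): the list is searched element by element.
    public static bool IsKnownInList(List<int> ids, int id)
    {
        return ids.Contains(id);
    }

    // O(1) on average: the set locates the element by its hash.
    public static bool IsKnownInSet(HashSet<int> ids, int id)
    {
        return ids.Contains(id);
    }
}
```

Neither collection is better in general. A HashSet<T> has no index access and no guaranteed order, so a List<T> or array may still be the right choice when we mostly iterate over every element in sequence.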
Minimize the impact of garbage collection
Garbage collection is an operation that occurs as part of how Unity manages memory. The way that our code uses memory determines the frequency and CPU cost of garbage collection, so it's important that we understand how garbage collection works.
In the next step, we'll cover the topic of garbage collection in depth, and provide several different strategies for minimizing its impact.
Use object pooling
It's usually more costly to instantiate and destroy an object than it is to deactivate and reactivate it. This is especially true if the object contains start up code, such as calls to GetComponent() in an Awake() or Start() function. If we need to spawn and dispose of many copies of the same object, such as bullets in a shooting game, then we may benefit from object pooling. Object pooling is a technique where, instead of creating and destroying instances of an object, objects are temporarily deactivated and then recycled and reactivated as needed. Although well known as a technique for managing memory usage, object pooling can also be useful as a technique for reducing excessive CPU usage.
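A minimal object pool might look like the sketch below. It is an illustration under stated assumptions rather than a complete implementation: SimplePool, Spawn() and Despawn() are hypothetical names, and the prefab is assumed to be assigned in the Inspector.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical pool: names and sizes are illustrative only.
public class SimplePool : MonoBehaviour
{
    [SerializeField] private GameObject prefab;    // assumed to be set in the Inspector
    [SerializeField] private int initialSize = 20;

    // Deactivated instances waiting to be reused.
    private readonly Queue<GameObject> inactive = new Queue<GameObject>();

    private void Awake()
    {
        // Pay the instantiation cost up front.
        for (int i = 0; i < initialSize; i++)
        {
            GameObject instance = Instantiate(prefab, transform);
            instance.SetActive(false);
            inactive.Enqueue(instance);
        }
    }

    public GameObject Spawn(Vector3 position, Quaternion rotation)
    {
        // Reuse a pooled instance if one is available; otherwise grow the pool.
        GameObject instance = inactive.Count > 0
            ? inactive.Dequeue()
            : Instantiate(prefab, transform);

        instance.transform.SetPositionAndRotation(position, rotation);
        instance.SetActive(true);
        return instance;
    }

    public void Despawn(GameObject instance)
    {
        // Deactivate and keep the instance instead of destroying it.
        instance.SetActive(false);
        inactive.Enqueue(instance);
    }
}
```

Pooled objects keep their state between uses, so any per-use setup (health, velocity, timers) belongs in OnEnable() or in an explicit reset method rather than in Awake() or Start().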
Avoiding expensive calls to the Unity API
Sometimes the calls our code makes to other functions or APIs can be unexpectedly costly. There could be many reasons for this. What looks like a variable could in fact be an accessor that contains additional code, triggers an event or makes a call from managed code to engine code. In this section we will look at a few examples of Unity API calls that are more costly than they may appear. We will consider how we might reduce or avoid these costs. These examples demonstrate different underlying causes for the cost, and the suggested solutions can be applied to other similar situations.
It's important to understand that there is no list of Unity API calls that we should avoid. Every API call can be useful in some situations and less useful in others. In all cases, we must profile our game carefully, identify the cause of costly code and think carefully about how to resolve the problem in a way that's best for our game.
SendMessage() and BroadcastMessage() are very flexible functions that require little knowledge of how a project is structured and are very quick to implement. As such, these functions are very useful for prototyping or for beginner-level scripting. However, they are extremely expensive to use. This is because these functions make use of reflection. Reflection is the term for when code examines and makes decisions about itself at run time rather than at compile time. Code that uses reflection results in far more work for the CPU than code that does not use reflection. It is recommended that SendMessage() and BroadcastMessage() are used only for prototyping and that other functions are used wherever possible. For example, if we know which component we want to call a function on, we should reference the component directly and call the function that way. If we do not know which component we wish to call a function on, we could consider using Events or Delegates.
Find() and related functions are powerful but expensive. These functions require Unity to iterate over every GameObject and Component in memory. This means that they are not particularly demanding in small, simple projects but become more expensive to use as the complexity of a project grows.
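Returning to the SendMessage() advice above, the sketch below shows the two suggested alternatives: calling a function through a direct component reference, and notifying unknown listeners through a C# event. Health, DamageDealer, TakeDamage() and Damaged are hypothetical names; in a real project each MonoBehaviour would live in its own file.

```csharp
using System;
using UnityEngine;

// Hypothetical component that other scripts want to call.
public class Health : MonoBehaviour
{
    // Listeners subscribe to this event; no reflection is involved.
    public event Action<int> Damaged;

    public void TakeDamage(int amount)
    {
        // Notify subscribers, if there are any.
        Damaged?.Invoke(amount);
    }
}

// Hypothetical caller that knows which component it wants to reach.
public class DamageDealer : MonoBehaviour
{
    [SerializeField] private Health target; // assumed to be set in the Inspector

    private void Hit()
    {
        // Direct call, instead of target.SendMessage("TakeDamage", 10).
        if (target != null)
        {
            target.TakeDamage(10);
        }
    }
}
```

A script that wants to react to damage would subscribe with `health.Damaged += OnDamaged;` and should unsubscribe with `-=` when it is disabled or destroyed.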
It's best to use Find() and similar functions infrequently and to cache the results where possible. Some simple techniques that may help us to reduce the use of Find() in our code include setting references to objects using the Inspector panel where possible, or creating scripts that manage references to things that are commonly searched for.
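The sketch below illustrates both suggestions: the reference is normally assigned in the Inspector, and a single lookup in Start() is used only as a fallback. The class name is hypothetical, and the example assumes the target GameObject carries Unity's built-in "Player" tag.

```csharp
using UnityEngine;

// Hypothetical script: names are illustrative only.
public class FindCachingExample : MonoBehaviour
{
    // Preferably assigned in the Inspector, so that no search is needed.
    [SerializeField] private GameObject player;

    private void Start()
    {
        // Fallback: search once and cache the result.
        if (player == null)
        {
            player = GameObject.FindWithTag("Player");
        }
    }

    private void Update()
    {
        if (player == null)
        {
            return; // nothing was found, so there is nothing to look at
        }

        // Uses the cached reference; no search happens per frame.
        transform.LookAt(player.transform);
    }
}
```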
Setting the position or rotation of a transform causes an internal OnTransformChanged event to propagate to all of that transform's children. This means that it's relatively expensive to set a transform's position and rotation values, especially in transforms that have many children.
To limit the number of these internal events, we should avoid setting the value of these properties more often than necessary. For example, we might perform one calculation to set a transform's x position and then another to set its z position in Update(). In this example, we should consider copying the transform's position to a Vector3, performing the required calculations on that Vector3 and then setting the transform's position to the value of that Vector3. This would result in only one OnTransformChanged event.
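The approach described above can be sketched as follows. The class name and the movement itself are hypothetical; the point is that the position is read once, modified as a local Vector3, and written back once.

```csharp
using UnityEngine;

// Hypothetical script: names and movement are illustrative only.
public class MoveExample : MonoBehaviour
{
    [SerializeField] private float speed = 2f;

    private void Update()
    {
        // Copy the position into a local Vector3.
        Vector3 position = transform.position;

        // Perform every calculation on the local copy.
        position.x += speed * Time.deltaTime;
        position.z += speed * Time.deltaTime;

        // Assign once, so the transform is changed a single time this frame.
        transform.position = position;
    }
}
```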
Transform.position is an example of an accessor that results in a calculation behind the scenes. This can be contrasted with Transform.localPosition. The value of localPosition is stored in the transform and calling Transform.localPosition simply returns this value. However, the transform's world position is calculated every time we call Transform.position.
If our code makes frequent use of Transform.position and we can use Transform.localPosition in its place, this will result in fewer CPU instructions and may ultimately benefit performance. If we make frequent use of Transform.position, we should cache it where possible.
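As a small sketch of caching the value, the hypothetical script below reads transform.position once per frame and reuses the local copy inside the loop, instead of reading the property on every iteration. The names and the follower layout are assumptions for illustration.

```csharp
using UnityEngine;

// Hypothetical script: names are illustrative only.
public class FollowersExample : MonoBehaviour
{
    [SerializeField] private Transform[] followers = new Transform[0];
    [SerializeField] private Vector3 offset = new Vector3(0f, 0f, -1f);

    private void LateUpdate()
    {
        // The world position is read once and reused below.
        Vector3 leaderPosition = transform.position;

        for (int i = 0; i < followers.Length; i++)
        {
            if (followers[i] == null)
            {
                continue; // skip empty slots in the array
            }

            // Each follower is placed one further offset step behind the leader.
            followers[i].position = leaderPosition + offset * (i + 1);
        }
    }
}
```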
Update(), LateUpdate() and other event functions look like simple functions, but they have a hidden overhead. These functions require communication between engine code and managed code every time they are called. In addition to this, Unity carries out a number of safety checks before calling these functions. The safety checks ensure that the GameObject is in a valid state, hasn't been destroyed, and so on. This overhead is not particularly large for any single call, but it can add up in a game that has thousands of MonoBehaviours. For this reason, empty Update() calls can be particularly wasteful. We may assume that because the function is empty and our code contains no direct calls to it, the empty function will not run. This is not the case: behind the scenes, these safety checks and native calls still happen even when the body of the Update() function is blank. To avoid wasted CPU time, we should ensure that our game does not contain empty Update() calls.
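One way to reduce this per-call overhead, sketched here under stated assumptions, is to have a single MonoBehaviour receive Update() and forward it to many plain C# objects. ITickable, Tick() and TickManager are hypothetical names, not part of the Unity API, and whether this helps in a given game should be confirmed with the Profiler.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical interface for objects that want per-frame updates.
public interface ITickable
{
    void Tick(float deltaTime);
}

// Hypothetical manager: one engine-to-managed Update() call drives many objects.
public class TickManager : MonoBehaviour
{
    private readonly List<ITickable> tickables = new List<ITickable>();

    public void Register(ITickable tickable)
    {
        if (!tickables.Contains(tickable))
        {
            tickables.Add(tickable);
        }
    }

    public void Unregister(ITickable tickable)
    {
        tickables.Remove(tickable);
    }

    private void Update()
    {
        float deltaTime = Time.deltaTime;

        // Iterate backwards so that an item unregistering itself
        // during Tick() does not cause another item to be skipped.
        for (int i = tickables.Count - 1; i >= 0; i--)
        {
            tickables[i].Tick(deltaTime);
        }
    }
}
```

The calls to Tick() are ordinary managed method calls, so they avoid the engine-to-managed transition that each separate Update() would incur.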
If our game has a great many active MonoBehaviours with Update() calls, we may benefit from structuring our code differently to reduce this overhead. This Unity blog post goes into much more detail on this topic.
We know that some operations simply result in more CPU instructions than other operations. Vector math operations are an example of this: they are simply more complex than float or int math operations. Although the actual difference in the time taken for two such calculations is tiny, at sufficient scale such operations can impact performance.
It's common and convenient to use Unity's Vector2 and Vector3 structs for mathematical operations, especially when dealing with transforms. If we perform many frequent Vector2 and Vector3 math operations in our code, for example in nested loops in Update() on a great many GameObjects, we may well be creating unnecessary work for the CPU. In these cases we may be able to make a performance saving by performing int or float calculations instead.
Earlier in this article, we learned that the CPU instructions required to perform a square root calculation are slower than those used for, say, simple multiplication. Both Vector2.magnitude and Vector3.magnitude are examples of this, as they both involve square root calculations. Additionally, Vector2.Distance and Vector3.Distance use magnitude behind the scenes. If our game makes extensive and very frequent use of magnitude or Distance, it may be possible for us to avoid the relatively expensive square root calculation by using Vector2.sqrMagnitude and Vector3.sqrMagnitude instead. Again, replacing a single call will result in only a tiny difference, but at a sufficiently large scale it may be possible to make a useful performance saving.
Camera.main is a convenient Unity API call that returns a reference to the first enabled Camera component that is tagged "MainCamera". This is another example of something that looks like a variable but is in fact an accessor. In this case, the accessor calls an internal function similar to Find() behind the scenes. Camera.main therefore suffers from the same problem as Find(): it searches through all GameObjects and Components in memory and can be very expensive to use. To avoid this potentially expensive call, we should either cache the result of Camera.main or avoid its use altogether and manually manage references to our cameras.
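The sketch below combines two of the suggestions above: Camera.main is looked up once and cached, and a range check compares squared distances so that no square root is needed. The class name, the range field and IsWithinRangeOfCamera() are hypothetical.

```csharp
using UnityEngine;

// Hypothetical script: names and values are illustrative only.
public class CameraRangeExample : MonoBehaviour
{
    [SerializeField] private float range = 10f;

    private Transform cameraTransform; // cached instead of calling Camera.main repeatedly
    private float rangeSquared;        // precomputed so distances can be compared squared

    private void Start()
    {
        // Camera.main is called once here and the result is kept.
        Camera mainCamera = Camera.main;
        if (mainCamera != null)
        {
            cameraTransform = mainCamera.transform;
        }

        rangeSquared = range * range;
    }

    private bool IsWithinRangeOfCamera()
    {
        if (cameraTransform == null)
        {
            return false;
        }

        // Comparing squared lengths gives the same answer as comparing
        // distances, without the square root inside magnitude or Distance.
        Vector3 offset = cameraTransform.position - transform.position;
        return offset.sqrMagnitude <= rangeSquared;
    }
}
```

The squared comparison works because both sides are non-negative. It is not a substitute when the actual distance value is needed, for example to scale a falloff.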
Other Unity API calls and further optimizations
We have considered a few common examples of Unity API calls that may be unexpectedly costly, and learned about the different reasons behind this cost. However, this is by no means an exhaustive list of ways to improve the efficiency of our Unity API calls.
This article on performance in Unity is a wide-ranging guide to optimization in Unity that contains a number of other Unity API optimizations that we may find useful. Additionally, that article goes into considerable depth about further optimizations that are beyond the scope of this relatively high-level and beginner-friendly article.
Running code only when it needs to run
There’s a saying in programming: "the fastest code is the code that doesn’t run". Often, the most efficient way to solve a performance problem is not to use an advanced technique: it is simply to remove code that doesn’t need to be there in the first place. Let’s look at a couple of examples to see where we could make this sort of saving.
Culling
Unity contains code that checks whether objects are within the frustum of a camera. If they are not within the frustum of a camera, code related to rendering these objects does not run. The term for this is frustum culling.
We can take a similar approach to the code in our scripts. If we have code that relates to the visual state of an object, we may not need to execute this code when the object cannot be seen by the player. In a complex Scene with many objects, this can result in considerable performance savings.
In the following simplified example code, we have an example of a patrolling enemy. Every time Update() is called, the script controlling this enemy calls two example functions: one related to moving the enemy, one related to its visual state.
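The listing is not reproduced here, so what follows is a minimal sketch of the pattern described, with hypothetical names (PatrollingEnemy, UpdateTransformPosition, UpdateAnimations). The movement function runs every frame, while the visual function runs only when the enemy's Renderer reports that it is visible.

```csharp
using UnityEngine;

// Hypothetical script: names are illustrative only.
public class PatrollingEnemy : MonoBehaviour
{
    private Renderer enemyRenderer;

    private void Start()
    {
        // Cache the Renderer so visibility can be checked cheaply each frame.
        enemyRenderer = GetComponent<Renderer>();
    }

    private void Update()
    {
        // Movement affects gameplay, so it always runs.
        UpdateTransformPosition();

        // Visual-only work is skipped while no camera can see the enemy.
        if (enemyRenderer != null && enemyRenderer.isVisible)
        {
            UpdateAnimations();
        }
    }

    private void UpdateTransformPosition()
    {
        // Placeholder for patrol movement.
    }

    private void UpdateAnimations()
    {
        // Placeholder for work that only matters when the enemy is seen.
    }
}
```

Renderer.isVisible is true when any camera renders the object, which in the Editor includes the Scene view, so results should be checked in a build or with the Scene view hidden.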
Disabling code when things are not seen by the player can be achieved in a few ways. If we know that certain objects in our scene are not visible at a particular point in the game, we can manually disable them. When we are less certain and need to calculate visibility, we could use a coarse calculation (for example, checking if the object is behind the player), functions such as OnBecameInvisible() and OnBecameVisible(), or a more detailed raycast. The best implementation depends very much on our game, and experimentation and profiling are essential.
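As a sketch of the OnBecameVisible() and OnBecameInvisible() option, the hypothetical script below enables and disables another component that does visual-only work. It assumes it is attached to the same GameObject as a Renderer, since those messages are sent to scripts on the renderer's GameObject; the field name is illustrative.

```csharp
using UnityEngine;

// Hypothetical script: assumed to sit on a GameObject that has a Renderer.
public class VisualEffectsToggle : MonoBehaviour
{
    // The component doing visual-only work, assumed to be set in the Inspector.
    [SerializeField] private Behaviour visualOnlyBehaviour;

    // Called by Unity when the renderer becomes visible to any camera.
    private void OnBecameVisible()
    {
        if (visualOnlyBehaviour != null)
        {
            visualOnlyBehaviour.enabled = true;
        }
    }

    // Called by Unity when the renderer is no longer visible to any camera.
    private void OnBecameInvisible()
    {
        if (visualOnlyBehaviour != null)
        {
            visualOnlyBehaviour.enabled = false;
        }
    }
}
```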
Level of detail
Level of detail, also known as LOD, is another common rendering optimization technique. Objects nearest to the player are rendered at full fidelity using detailed meshes and textures. Distant objects use less detailed meshes and textures. A similar approach can be used with our code. For example, we may have an enemy with an AI script that determines its behavior. Part of this behavior may involve costly operations for determining what it can see and hear, and how it should react to this input. We could use a level of detail system to enable and disable these expensive operations based on the enemy's distance from the player. In a Scene with many of these enemies, we could make a considerable performance saving if only the nearest enemies are performing the most expensive operations.
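A very simple code-side level of detail check might look like the sketch below, which is an assumption-laden illustration rather than a recommended design. EnemyAiLod, fullDetailRange, RunDetailedSenses() and RunCheapBehaviour() are hypothetical names, and the player reference is assumed to be assigned in the Inspector.

```csharp
using UnityEngine;

// Hypothetical script: names and thresholds are illustrative only.
public class EnemyAiLod : MonoBehaviour
{
    [SerializeField] private Transform player;          // assumed to be set in the Inspector
    [SerializeField] private float fullDetailRange = 30f;

    private void Update()
    {
        if (player == null)
        {
            return;
        }

        // Squared distance avoids a square root in the per-frame check.
        float sqrDistance = (player.position - transform.position).sqrMagnitude;

        if (sqrDistance <= fullDetailRange * fullDetailRange)
        {
            RunDetailedSenses(); // expensive sight and hearing checks
        }
        else
        {
            RunCheapBehaviour(); // simplified behavior for distant enemies
        }
    }

    private void RunDetailedSenses()
    {
        // Placeholder for costly perception and decision code.
    }

    private void RunCheapBehaviour()
    {
        // Placeholder for an inexpensive approximation.
    }
}
```

With many enemies, the distance check itself becomes a cost, which is one reason to consider an engine-side mechanism that reports distance bands instead of polling per object.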
Unity's CullingGroup API allows us to hook into Unity's LOD system to optimize our code. The Manual page for the CullingGroup API contains several examples of how this might be used in our game. As ever, we should test, profile and find the right solution for our game.
Conclusion
We’ve learned what happens to the code we write when our Unity game is built and run, why our code can cause performance problems and how to minimize the impact of expensive code on our game. We've learned about a number of common causes of performance problems in our code, and considered a few different solutions. Using this knowledge and our profiling tools, we should now be able to diagnose, understand and fix performance problems related to the code in our game.