Assets, Objects and serialization
Checked with version: 2017.3
This is the second chapter in a series of articles covering Assets, Resources and resource management in Unity 5.
This chapter covers the deep internals of Unity's serialization system and how Unity maintains robust references between different Objects, both in the Unity Editor and at runtime. It also discusses the technical distinctions between Objects and Assets. The topics covered here are fundamental to understanding how to efficiently load and unload Assets in Unity. Proper Asset management is crucial to keeping loading times short and memory usage low.
To understand how to properly manage data in Unity, it is important to understand how Unity identifies and serializes data. The first key point is the distinction between Assets and UnityEngine.Objects.
An Asset is a file on disk, stored in the Assets folder of a Unity project. Textures, 3D models, or audio clips are common types of Assets. Some Assets contain data in formats native to Unity, such as materials. Other Assets need to be processed into native formats, such as FBX files.
A UnityEngine.Object, or Object with a capitalized 'O', is a set of serialized data collectively describing a specific instance of a resource. This can be any type of resource which the Unity Engine uses, such as a mesh, sprite, AudioClip or AnimationClip. All Objects are subclasses of the UnityEngine.Object base class.
While most Object types are built-in, there are two special types.
A ScriptableObject provides a convenient system for developers to define their own data types. These types can be natively serialized and deserialized by Unity, and manipulated in the Unity Editor's Inspector window.
A MonoBehaviour provides a wrapper that links to a MonoScript. A MonoScript is an internal data type that Unity uses to hold a reference to a specific scripting class within a specific assembly and namespace. The MonoScript does not contain any actual executable code.
There is a one-to-many relationship between Assets and Objects; that is, any given Asset file contains one or more Objects.
All UnityEngine.Objects can have references to other UnityEngine.Objects. These other Objects may reside within the same Asset file, or may be imported from other Asset files. For example, a material Object usually has one or more references to texture Objects. These texture Objects are generally imported from one or more texture Asset files (such as PNGs or JPGs).
When serialized, these references consist of two separate pieces of data: a File GUID and a Local ID. The File GUID identifies the Asset file where the target resource is stored. A locally unique(1) Local ID identifies each Object within an Asset file because an Asset file may contain multiple Objects.
File GUIDs are stored in .meta files. These .meta files are generated when Unity first imports an Asset, and are stored in the same directory as the Asset.
The above identification and referencing system can be seen in a text editor: create a fresh Unity project and change its Editor Settings to expose Visible Meta Files and to serialize Assets as text. Create a material and import a texture into the project. Assign the material to a cube in the scene and save the scene.
Using a text editor, open the .meta file associated with the material. A line labeled "guid" will appear near the top of the file. This line defines the material Asset's File GUID. To find the Local ID, open the material file in a text editor. The material Object's definition will look like this:
--- !u!21 &2100000 Material: serializedVersion: 3 ... more data …
In the above example, the number preceded by an ampersand is the material's Local ID. If this material Object were located inside an Asset identified by the File GUID "abcdefg", then the material Object could be uniquely identified as the combination of the File GUID "abcdefg" and the Local ID "2100000".
Why is Unity's File GUID and Local ID system necessary? The answer is robustness and to provide a flexible, platform-independent workflow.
The File GUID provides an abstraction of a file's specific location. As long as a specific File GUID can be associated with a specific file, that file's location on disk becomes irrelevant. The file can be freely moved without having to update all Objects referring to the file.
As any given Asset file may contain (or produce via import) multiple UnityEngine.Object resources, a Local ID is required to unambiguously distinguish each distinct Object.
If the File GUID associated with an Asset file is lost, then references to all Objects in that Asset file will also be lost. This is why it is important that the .meta files must remain stored with the same file names and in the same folders as their associated Asset files. Note that Unity will regenerate deleted or misplaced .meta files.
The Unity Editor has a map of specific file paths to known File GUIDs. A map entry is recorded whenever an Asset is loaded or imported. The map entry links the Asset's specific path to the Asset's File GUID. If the Unity Editor is open when a .meta file goes missing and the Asset's path does not change, the Editor can ensure that the Asset retains the same File GUID.
If the .meta file is lost while the Unity Editor is closed, or the Asset's path changes without the .meta file moving along with the Asset, then all references to Objects within that Asset will be broken.
As mentioned in the Inside Assets and Objects section, non-native Asset types must be imported into Unity. This is done via an asset importer. While these importers are usually invoked automatically, they are also exposed to scripts via the AssetImporter API. For example, the TextureImporter API provides access to the settings used when importing individual texture Assets, such as PNG files.
The result of the import process is one or more UnityEngine.Objects. These are visible in the Unity Editor as multiple sub-assets within the parent Asset, such as multiple sprites nested beneath a texture Asset that has been imported as a sprite atlas. Each of these Objects will share a File GUID as their source data is stored within the same Asset file. They will be distinguished within the imported texture Asset by a Local ID.
The import process converts source Assets into formats suitable for the target platform selected in the Unity Editor. The import process can include a number of heavyweight operations, such as texture compression. As this is often a time-consuming process, imported Asset are cached in the Library folder, eliminating the need to re-import Assets again on the next Editor launch.
Specifically, the results of the import process are stored in a folder named for the first two digits of the Asset's File GUID. This folder is stored inside the Library/metadata/ folder. The individual Objects from the Asset are serialized into a single binary file that has a name identical to the Asset's File GUID.
This process applies to all Assets, not just non-native Assets. Native assets do not require lengthy conversion processes or re-serialization.
While File GUIDs and Local IDs are robust, GUID comparisons are slow and a more performant system is needed at runtime. Unity internally maintains a cache(2) that translates File GUIDs and Local IDs into simple, session-unique integers. These are called Instance IDs, and are assigned in a simple, monotonically-increasing order when new Objects are registered with the cache.
The cache maintains mappings between a given Instance ID, File GUID and Local ID defining the location of the Object's source data, and the instance of the Object in memory (if any). This allows UnityEngine.Objects to robustly maintain references to each other. Resolving an Instance ID reference can quickly return the loaded Object represented by the Instance ID. If the target Object is not yet loaded, the File GUID and Local ID can be resolved to the Object's source data, allowing Unity to load the object just-in-time.
At startup, the Instance ID cache is initialized with data for all Objects immediately required by the project (i.e., referenced in built Scenes), as well as all Objects contained in the Resources folder. Additional entries are added to the cache when new assets are imported at runtime(3) and when Objects are loaded from AssetBundles. Instance ID entries are only removed from the cache when an AssetBundle providing access to a specific File GUID and Local ID is unloaded. When this occurs, the mapping between the Instance ID, its File GUID and Local ID are deleted to conserve memory. If the AssetBundle is re-loaded, a new Instance ID will be created for each Object loaded from the re-loaded AssetBundle.
On specific platforms, certain events can force Objects out of memory. For example, graphical Assets can be unloaded from graphics memory on iOS when an app is suspended. If these Objects originated in an AssetBundle that has been unloaded, Unity will be unable to reload the source data for the Objects. Any extant references to these Objects will also be invalid. In the preceding example, the scene may appear to have invisible meshes or magenta textures.
Implementation note: At runtime, the above control flow is not literally accurate. Comparing File GUIDs and Local IDs at runtime would not be sufficiently performant during heavy loading operations. When building a Unity project, the File GUIDs and Local IDs are deterministically mapped into a simpler format. However, the concept remains identical, and thinking in terms of File GUIDs and Local IDs remains a useful analogy during runtime. This is also the reason why Asset File GUIDs cannot be queried at runtime.
It is important to understand that a MonoBehaviour has a reference to a MonoScript, and MonoScripts simply contain the information needed to locate a specific script class. Neither type of Object contains the executable code of script class.
A MonoScript contains three strings: assembly name, class name, and namespace.
While building a project, Unity compiles all the loose script files in the Assets folder into Mono assemblies. C# scripts outside of the Plugins subfolder are placed into Assembly-CSharp.dll. Scripts within the Plugins subfolder are placed into Assembly-CSharp-firstpass.dll, and so on. In addition, Unity 2017.3 also introduces the ability to define custom managed assemblies.
These assemblies, as well as pre-built assembly DLL files, are included in the final build of a Unity application. They are also the assemblies to which a MonoScript refers. Unlike other resources, all assemblies included in a Unity application are loaded on application start-up.
This MonoScript Object is the reason why an AssetBundle (or a Scene or a prefab) does not actually contain executable code in any of the MonoBehaviour Components in the AssetBundle, Scene or prefab. This allows different MonoBehaviours to refer to specific shared classes, even if the MonoBehaviours are in different AssetBundles.
To reduce loading times and manage an application's memory footprint, it's important to understand the resource lifecycle of UnityEngine.Objects. Objects are loaded into/unloaded from memory at specific and defined times.
An Object is loaded automatically when:
The Instance ID mapped to that Object is dereferenced
The Object is currently not loaded into memory
The Object's source data can be located.
Objects can also be explicitly loaded in scripts, either by creating them or by calling a resource-loading API (e.g., AssetBundle.LoadAsset). When an Object is loaded, Unity tries to resolve any references by translating each reference's File GUID and Local ID into an Instance ID. An Object will be loaded on-demand the first time its Instance ID is dereferenced if two criteria are true:
The Instance ID references an Object that is not currently loaded
The Instance ID has a valid File GUID and Local ID registered in the cache
This generally occurs very shortly after the reference itself is loaded and resolved.
If a File GUID and Local ID do not have an Instance ID, or if an Instance ID with an unloaded Object references an invalid File GUID and Local ID, then the reference is preserved but the actual Object will not be loaded. This appears as a "(Missing)" reference in the Unity Editor. In a running application, or in the Scene View, "(Missing)" Objects will be visible in different ways, depending on their types. For example, meshes will appear to be invisible, while textures may appear to be magenta.
Objects are unloaded in three specific scenarios:
Objects are automatically unloaded when unused Asset cleanup occurs. This process is triggered automatically when scenes are changed destructively (i.e. when SceneManager.LoadScene is invoked non-additively), or when a script invokes the Resources.UnloadUnusedAssets API. This process only unloads unreferenced Objects; an Object will only be unloaded if no Mono variable holds a reference to the Object, and there are no other live Objects holding references to the Object. Furthermore, note that anything marked with HideFlags.DontUnloadUnusedAsset and HideFlags.HideAndDontSave will not be unloaded.
Objects sourced from the Resources folder can be explicitly unloaded by invoking the Resources.UnloadAsset API. The Instance ID for these Objects remains valid and will still contain a valid File GUID and LocalID entry. If any Mono variable or other Object holds a reference to an Object that is unloaded with Resources.UnloadAsset, then that Object will be reloaded as soon as any of the live references are dereferenced.
Objects sourced from AssetBundles are automatically and immediately unloaded when invoking the AssetBundle.Unload(true) API. This invalidates the File GUID and Local ID of the Object's Instance ID, and any live references to the unloaded Objects will become "(Missing)" references. From C# scripts, attempting to access methods or properties on an unloaded object will produce a NullReferenceException.
If AssetBundle.Unload(false) is called, live Objects sourced from the unloaded AssetBundle will not be destroyed, but Unity will invalidate the File GUID and Local ID references of their Instance IDs. It will be impossible for Unity to reload these Objects if they are later unloaded from memory and live references to the unloaded Objects remain. (4)
When serializing hierarchies of Unity GameObjects, such as during prefabs serialization, it is important to remember that the entire hierarchy will be fully serialized. That is, every GameObject and Component in the hierarchy will be individually represented in the serialized data. This has interesting impacts on the time required to load and instantiate hierarchies of GameObjects.
When creating any GameObject hierarchy, CPU time is spent in several different ways:
Reading the source data (from storage, from an AssetBundle, from another GameObject, etc.)
Setting up the parent-child relationships between the new Transforms
Instantiating the new GameObjects and Components
Awakening the new GameObjects and Components on the main thread
The latter three time costs are generally invariant regardless of whether the hierarchy is being cloned from an existing hierarchy or is being loaded from storage. However, the time to read the source data increases linearly with the number of Components and GameObjects serialized into the hierarchy, and is also multiplied by the speed of the data source.
On all current platforms, it is considerably faster to read data from elsewhere in memory rather than loading it from a storage device. Further, the performance characteristics of the available storage media vary widely between different platforms. Therefore, when loading prefabs on platforms with slow storage, the time spent reading the prefab's serialized data from storage can rapidly exceed the time spent instantiating the prefab. That is, the cost of the loading operation is bound to storage I/O time.
As mentioned before, when serializing a monolithic prefab, every GameObject and component's data is serialized separately, which may duplicate data. For example, a UI screen with 30 identical elements will have the identical element serialized 30 times, producing a large blob of binary data. At load time, the data for all of the GameObjects and Components on each one of those 30 duplicate elements must be read from disk before being transferred to the newly-instantiated Object. This file reading time is a significant contributor to the overall cost of instantiating large prefabs. Large hierarchies should be instantiated in modular chunks, and then be stitched together at runtime.
Unity 5.4 note: Unity 5.4 altered the representation of transforms in memory. Each root transform's entire child hierarchy is stored in compact, contiguous regions of memory. When instantiating new GameObjects that will be instantly reparented into another hierarchy, consider using the new GameObject.Instantiate overloaded variants which accept a parent argument. Using this overload avoids the allocation of a root transform hierarchy for the new GameObject. In tests, this speeds up the time required for an instantiate operation by about 5-10%.
AA Local ID is unique from all the other Local IDs for the same Asset file. ↩
Internally, this cache is called the PersistentManager. ↩
An example of an Asset created at runtime would be a Texture2D Object created in script, like so:
var myTexture = new Texture2D(1024, 768);↩
The most common case where Objects are removed from memory at runtime without being unloaded occurs when Unity loses control of its graphics context. This may occur when a mobile app is suspended and the app is forced into the background. In this case, the mobile OS usually evicts all graphical resources from GPU memory. When the app returns to the foreground, Unity must reload all needed Textures, Shaders and Meshes to the GPU before scene rendering can resume. ↩