On this page
- About the game and how it works (in brief)
- The big space-saver: a custom prefab "stream"
- Mesh flipbooks to animate characters
- A tightly packed navigation system
- Architect update loops like a particle system
- Be kind to the cache
- Don’t jump straight into threading
- How to get cheap terrains that look good on mobile
- Re-using the terrain mesh for water
- A faster lighting solution
- Multi-resolution rendering
- A few tips for a custom build for Asset Bundles
When it pays to be cheap: Tips for big games on low-end mobile
Last updated: January 2019
What you will get from this page: great tips on how to get big mobile games to work on low-end hardware. The tips are from Jason Booth, who has been making games for over 25 years. Currently, he’s a graphics and client architect at Disruptor Beam (and an Asset Store publisher).
These optimizations were used to develop The Walking Dead: March to War. This ambitious game needed to run smoothly on OpenGLES 2.0 devices with 1GB of RAM, which is approximately 40% of all Android devices. Filled with detailed art, full day/night and weather cycles, and thousands of objects, it couldn’t be bigger than 100MB. Find out how Jason and his team did it.
About the game
The Walking Dead: March to War is an intense multiplayer mobile strategy game set in the world of Robert Kirkman's long-running comics series, The Walking Dead. The game can be played by up to 50,000 concurrent players. Situated in the Virginia and Washington, D.C. area, it’s a big, detailed and freely scrollable world, filled with thousands of "walkers" or zombies, ripe for blasting.
The world is made up of regions 32x64 in size for loading and rendering, while gameplay tiles are 2048x1024. At any given time, 4-6 regions are being rendered. The maps in the game are created with a mix of hand-placed and procedural content.
How it all works in a (very brief) nutshell: The hand-placed content and procedural systems get compiled in a Map Compiler. For each 32x64 block they generate procedural content and then Boolean that with the hand-placed data. The map compiler also compiles terrain data, as well as the navigation information for their AI systems. The complete result is then saved in a prefab stream format, which is the studio’s own version of a prefab system. Finally, everything is streamed to players via Asset Bundles.
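The "Boolean" step above can be pictured as a per-tile merge of two occupancy grids, with hand-placed content overriding procedural content. This is a minimal sketch; the merge rule and tile encoding are assumptions for illustration, only the idea of combining the two data sets per block comes from the article.

```python
# Hypothetical sketch of the Boolean merge step: hand-placed content
# overrides procedural content wherever both occupy the same tile.
# The override rule and tile values are assumptions for illustration.

EMPTY = 0

def merge_block(procedural, hand_placed):
    """Combine two occupancy grids for one block; hand-placed tiles win."""
    return [
        [h if h != EMPTY else p for p, h in zip(prow, hrow)]
        for prow, hrow in zip(procedural, hand_placed)
    ]

proc = [[1, 1, 0], [0, 2, 0]]   # procedural scatter (trees, debris, ...)
hand = [[0, 9, 0], [0, 0, 9]]   # hand-placed landmarks
print(merge_block(proc, hand))  # hand-placed 9s replace procedural tiles
```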
The big space-saver: A custom prefab "stream"
Disruptor Beam’s custom prefab solution, or “stream,” is used at both design time and runtime, and supports nesting. It stores arrays of InstanceEntry and is used for both construction and delivery.
At edit time, the extra code in the prefab stream allows for actions such as randomization. For a particular block, the prefab stream can randomize a number of elements and create subtle variations on everything. Items such as houses and vehicles are built up from smaller blocks that can be mixed and matched to produce variations on the same thing.
At compilation time, the prefab stream is broken into three levels of detail: objects can be assigned to high, medium or low layers so that some objects can be removed on low-end devices. And when it’s all compiled down, it’s flattened, so that there’s no hierarchy when it’s delivered to the user.
At runtime, the prefab stream functions similarly to a low-level draw list in a graphics engine (indicating which meshes to draw, with which materials, and where), along with a list of Transform locations, that is, the positions where a particular prefab should be placed.
The prefab stream packs the Transform down; since the block size is already known (a 160 meter area), it can pack the Transform into seven “shorts”: three for position, three for rotation, and one for scale.
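The seven-short packing can be sketched as a quantization round trip. The 7-short layout and 160-meter block size come from the article; the exact quantization ranges (degrees for rotation, a [0, 8] scale range) are assumptions for illustration.

```python
import struct

BLOCK_SIZE = 160.0  # meters, from the article
U16_MAX = 65535

def pack_transform(pos, rot_euler, scale):
    """Quantize a transform into seven uint16s: 3 position, 3 rotation, 1 scale.
    The ranges used here are assumptions; only the 7-short layout is given."""
    q = [round(p / BLOCK_SIZE * U16_MAX) for p in pos]               # [0, 160) m
    q += [round((r % 360.0) / 360.0 * U16_MAX) for r in rot_euler]   # degrees
    q.append(round(min(scale, 8.0) / 8.0 * U16_MAX))                 # [0, 8] scale
    return struct.pack("<7H", *q)

def unpack_transform(blob):
    q = struct.unpack("<7H", blob)
    pos = [v / U16_MAX * BLOCK_SIZE for v in q[:3]]
    rot = [v / U16_MAX * 360.0 for v in q[3:6]]
    return pos, rot, q[6] / U16_MAX * 8.0

blob = pack_transform((80.0, 2.5, 40.0), (0.0, 90.0, 0.0), 1.0)
print(len(blob))  # 14 bytes per instance, versus 40 bytes for ten floats
```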
This results in a huge reduction of the size of scene data that is streamed to users. If a block is saved as a scene, it’s about 3.6 MB in size; as a prefab, it’s around 2.1 MB in size, and, as a prefab stream, only 41KB.
Mesh flipbooks to animate characters
Characters in The Walking Dead: March to War are not seen up close; they are only 60 pixels tall with a limited animation set. The team had to batch the characters because draw calls are expensive on low-end devices. The usual approach would be to bake the animation into a texture, sample it in the vertex shader, and do all the animation in the shader. But OpenGLES 2.0 does not reliably support texture sampling in the vertex shader, so they needed another solution.
Instead, they converted the animations into mesh flipbooks. At load time, they take each animation and bake out all the frames as unique meshes, where one frame = one mesh. Then they swap the mesh on each character every frame to animate it. The trade-off is that this method requires a lot of memory.
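The playback side of a mesh flipbook is just indexing into the baked list by time. This toy sketch stands in meshes with tuples; the bake rate and function names are assumptions for illustration.

```python
# Minimal sketch of mesh-flipbook playback: each animation is baked to
# one "mesh" per frame at load time, and playback just indexes the list.
# FLIPBOOK_FPS and the names here are assumptions for illustration.

FLIPBOOK_FPS = 15  # assumed bake rate

def bake_flipbook(animation_frames):
    """Pretend-bake: each frame becomes an immutable mesh snapshot."""
    return tuple(tuple(frame) for frame in animation_frames)

def current_mesh(flipbook, time_seconds):
    """Pick the mesh for this moment in time; loops the animation."""
    frame = int(time_seconds * FLIPBOOK_FPS) % len(flipbook)
    return flipbook[frame]

walk = bake_flipbook([[0.0], [0.1], [0.2], [0.1]])  # toy 4-frame cycle
print(current_mesh(walk, 0.5))  # 0.5s * 15fps = frame 7 -> 7 % 4 = 3
```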
Architect update loops like a particle system
To write really fast and tight update loops, avoid Update(), virtual functions, and other object-oriented overhead. In fact, you want to inline a lot of functions and really strip things down. Architect your code like a particle system: in a particle system you have an array of particles and you zip through the array, updating everything at once. Generally, that’s how you should shape your code if you’re going to have thousands of objects, such as zombies.
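The particle-system shape described above can be sketched with flat parallel arrays and one update pass. This is a conceptual sketch, not the team's code; `array('f')` is used here only because it keeps the data contiguous in memory.

```python
# Sketch of the "update like a particle system" idea: state lives in flat
# parallel arrays and one loop advances everything, with no per-object
# Update() call or virtual dispatch. Names are illustrative assumptions.
from array import array

count = 4
pos_x = array("f", [0.0, 1.0, 2.0, 3.0])
vel_x = array("f", [1.0, 1.0, -1.0, 0.5])

def update_all(dt):
    # One tight pass over contiguous memory instead of N scattered objects.
    for i in range(count):
        pos_x[i] += vel_x[i] * dt

update_all(0.5)
print(list(pos_x))  # [0.5, 1.5, 1.5, 3.25]
```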
Be kind to the cache
What CPUs do really well is zip through memory in a linear order and process things. So, set up a big block of things that are all nicely arranged and run the same routine on them all, similar to what shaders do: they take a block of pixels and process them. If you design your data to keep its size as small as possible, it will all cache efficiently, keeping CPU processing times as fast as you need them. In the case of The Walking Dead, raycasting on a 64x64 bit grid of data is practically free because it all fits in the cache. And the entire region of navigation data is less than 1KB.
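A 64x64 bit grid like the one described is 64 rows of 64 bits, 512 bytes in total, so a walkability test is one shift and mask. The raycast below is a simple stepped march along a row; the march itself and all names are assumptions for illustration.

```python
# Toy version of the packed navigation grid: one 64-bit int per row
# (512 bytes total), so the whole region stays in cache. The simple
# row march below is an illustrative assumption, not the team's code.

GRID = [0] * 64  # one 64-bit int per row; a set bit means "blocked"

def set_blocked(x, y):
    GRID[y] |= 1 << x

def blocked(x, y):
    return (GRID[y] >> x) & 1

def raycast_x(x0, x1, y):
    """March along a row; return the first blocked cell, or None."""
    for x in range(x0, x1 + 1):
        if blocked(x, y):
            return x
    return None

set_blocked(10, 5)
print(raycast_x(0, 63, 5))  # 10
print(raycast_x(0, 63, 6))  # None
```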
Don’t jump straight into threading
Slow code is slow code, whether it’s running through multiple processors or just one. If you don’t ensure that your data structures are as compact as possible, you’re simply copying inefficient structures to multiple processors. Consider amortization, as this is often easier than threading. However, if you do need to thread your code, it will be much easier to do so once your code is efficient and possible to amortize.
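Amortization here means spreading work across frames instead of across cores. The slice size and names below are assumptions for illustration of the general pattern.

```python
# Sketch of amortization: instead of updating every agent each frame,
# walk the array in fixed-size slices so the full set is refreshed over
# several frames. SLICE and the agent model are illustrative assumptions.

agents = [0] * 10          # e.g. per-zombie "think" counters
SLICE = 4                  # agents updated per frame
cursor = 0

def tick():
    """Update the next SLICE agents, wrapping around the array."""
    global cursor
    for _ in range(SLICE):
        agents[cursor] += 1
        cursor = (cursor + 1) % len(agents)

for _ in range(5):         # 5 frames * 4 updates = 2 full passes
    tick()
print(agents)              # every agent updated exactly twice
```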
How to get cheap terrains that look good on mobile
To get nice-looking terrains for their game, the team borrowed the YCbCr color space used by JPEG. In JPEG compression, this color space provides a high-resolution luminance value and low-resolution chroma (CbCr) values.
They packed four luminance textures into one texture, giving them four general terrain types that could be reused for a variety of terrain based on the color applied to them. For example, brown makes them look like dirt, green like grass, and so on.
Then they added a splat weight mask via the vertex color channels (RGBA). They used the luminance channels for height mapping and height-based blending, which resulted in nice-looking transitions. Finally, they applied the low-res chroma layer on top of the luminance height map to get good-looking terrains.
The overall result was a 1024x1024 texture for the luminance data, and 3.1 MB of data for the entire world, for all of the splat mapping.
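Height-based blending of the kind described treats each luminance channel as a per-layer height map, biased by the vertex splat weights, with the tallest layer winning over a soft transition. The exact blend math below is an assumption for illustration; only the inputs (four layers, vertex weights, height-based blending) come from the article.

```python
# Sketch of height-based splat blending: four luminance "heights" are
# biased by vertex splat weights, layers near the peak are kept, and
# the result is renormalized. The formula is an illustrative assumption.

def height_blend(heights, weights, sharpness=4.0):
    """Blend 4 terrain layers by weighted height; returns blend weights."""
    biased = [h * w for h, w in zip(heights, weights)]
    peak = max(biased)
    # Keep only layers within 1/sharpness of the peak, then renormalize.
    kept = [max(b - peak + 1.0 / sharpness, 0.0) for b in biased]
    total = sum(kept)
    return [k / total for k in kept]

# Layer 2's luminance height dominates where its splat weight is high.
w = height_blend(heights=[0.2, 0.9, 0.5, 0.1], weights=[1.0, 1.0, 0.2, 0.0])
print([round(x, 3) for x in w])  # [0.0, 1.0, 0.0, 0.0]
```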
Re-using the terrain mesh for water
Their water mesh is a clone of the terrain mesh. They didn’t have to use a depth map; instead they got a “free” depth buffer by moving each vertex up to the water height (vertexHeight = waterHeight). The distance each vertex moves becomes the depth. This resulted in huge savings: one texture sample per terrain draw call, and one texture sample per water draw call.
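The "free" depth trick amounts to one subtraction per vertex: lift the cloned terrain vertex to the water plane and keep the lift as depth. The names and water height below are assumptions for illustration.

```python
# Sketch of the free water depth: the water mesh is a terrain clone
# whose vertices are raised to the water plane, so the per-vertex lift
# IS the water depth. WATER_HEIGHT and names are illustrative assumptions.

WATER_HEIGHT = 12.0  # assumed water plane, in meters

def water_vertex(terrain_height):
    """Raise the cloned vertex to the water plane; the difference is depth
    (zero where the terrain pokes above the water)."""
    depth = max(WATER_HEIGHT - terrain_height, 0.0)
    return WATER_HEIGHT, depth

print(water_vertex(9.5))   # (12.0, 2.5) -> 2.5m of water here
print(water_vertex(14.0))  # (12.0, 0.0) -> terrain above the water
```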
A faster lighting solution
Rendering full PBR was too expensive for mobile, so they used a method called Spherical Lighting Approximation, or SLA. With this method, they rendered the full PBR lighting environment to spherically mapped textures for the diffuse and specular lighting results. Each subsequent mip level stores half the smoothness value of the previous level, and log-space encoding enables HDR up to 4x intensity.
Then at runtime, instead of doing lighting calculations, they can choose a mip level and look it up in the texture. Lighting textures are sampled at the appropriate smoothness level using tex2Dlod.
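The lookup math follows from the mip scheme: if each mip halves the smoothness of the one before it, the mip level for a given smoothness is a log2. The HDR encoding constants below (mapping [0, 4] intensity into [0, 1] in log space) are assumptions for illustration; only "half the smoothness per mip" and "up to 4x intensity" come from the article.

```python
import math

# Sketch of the SLA lookup math. Each mip stores half the smoothness of
# the previous one, so mip = -log2(smoothness). The exact log-space HDR
# encoding here is an illustrative assumption.

def mip_for_smoothness(smoothness):
    """Smoothness 1.0 -> mip 0, 0.5 -> mip 1, 0.25 -> mip 2, ..."""
    return -math.log2(max(smoothness, 1e-4))

def encode_hdr(intensity):
    """Map intensity [0, 4] into a [0, 1] texture value in log space."""
    return math.log2(1.0 + min(intensity, 4.0)) / math.log2(5.0)

def decode_hdr(stored):
    """Invert the log encoding back to linear intensity."""
    return 5.0 ** stored - 1.0

print(mip_for_smoothness(0.25))  # 2.0: half-smoothness per mip
print(decode_hdr(encode_hdr(4.0)))  # round-trips the 4x HDR ceiling
```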
The advantage of this method is that they can employ any number of lights, because they only have to light the sphere and render it. This makes it possible to have an arbitrarily complex lighting environment, with an infinite number of lights, sky boxes, and so on. It’s a customized full PBR workflow that’s 20% faster than the standard PBR workflow.
For performant shadows, the team came up with a solution wherein they render the shadows top-down, with height above the shadow plane. This effectively creates a distance field of height, and supports blurring based on height for soft shadows, as well as self-shadowing.
Additionally, since they store the height of objects, they can clip the values close to the shadow plane and blur them, which creates a reasonable approximation of ambient occlusion.
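The shadow and occlusion ideas above both key off the stored height: taller casters get blurrier shadows, and only values near the plane count as occlusion. The thresholds and falloffs below are assumptions for illustration.

```python
# Sketch of the top-down height-field shadow idea: blur grows with
# height above the plane (soft shadows), and values clipped close to
# the plane approximate ambient occlusion. Constants are assumptions.

def shadow_softness(height, max_height=10.0):
    """Taller casters get blurrier shadows (0 = sharp, 1 = fully soft)."""
    return min(height / max_height, 1.0)

def ambient_occlusion(height, ao_range=1.0):
    """Only geometry close to the shadow plane contributes occlusion."""
    return max(ao_range - height, 0.0) / ao_range

print(shadow_softness(2.5))     # 0.25
print(ambient_occlusion(0.25))  # 0.75
```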
Multi-resolution rendering
By using multi-resolution rendering, they keep the UI at a high resolution while allowing the 3D world to benefit from a reduced fill rate. They use DPI for consistent results across a wide range of device sizes, with resolution set based on the target DPI (from 200 DPI to 400 DPI).
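DPI-driven resolution scaling boils down to computing a render scale from the ratio of target DPI to device DPI. The clamp to the article's 200-400 DPI range is from the source; the default target and the scale formula are assumptions for illustration.

```python
# Sketch of DPI-based multi-resolution rendering: the 3D world renders
# at a resolution derived from a target DPI (clamped to 200-400, per the
# article) while the UI stays native. The formula is an assumption.

TARGET_DPI_MIN, TARGET_DPI_MAX = 200, 400

def world_render_size(native_w, native_h, device_dpi, target_dpi=300):
    target = max(TARGET_DPI_MIN, min(target_dpi, TARGET_DPI_MAX))
    scale = min(target / device_dpi, 1.0)  # never upscale past native
    return int(native_w * scale), int(native_h * scale)

# A 1440p phone at 560 DPI renders the 3D world at ~54% resolution.
print(world_render_size(2560, 1440, device_dpi=560))  # (1371, 771)
```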
A few tips for a custom build for Asset Bundles
- Only mark data that will be loaded from code into a bundle.
- Parse dependencies:
  - Large assets (textures/sounds) go into their own bundles
  - Shared assets go into their own bundles
  - Unique assets are included in their parent bundles
  - Bundles are named by path
- Do not use variants
  - Instead, build a unique manifest for each variant level
- P4 checkout, pre-process all textures and sounds, build bundles, P4 revert