Scene Hierarchies for VR

Checked with version: 2017.4


Difficulty: Beginner

Setting up your scene hierarchy correctly is an important step in XR development. A suboptimal configuration can produce incorrect transformations or add unnecessary computation, degrading performance and fidelity. This guide describes the optimal hierarchy setup for XR projects in Unity.

We’ll begin with a brief overview of the terms used in this guide.

Session Space

XR Devices report pose data in what we call "session" space. It gets its own term because, on some devices, a new origin is provided each time an XR session is started. The data is typically relative either to the device (known as a Device Tracking Origin) or to the floor (known as a Floor Tracking Origin).

Specifically, session space refers to a Cartesian space in which 1 unit of magnitude is equivalent to 1 meter in the real world.

Tracking Origin

The tracking origin is the point in session space located at position (0, 0, 0), with no rotation.

Device Tracking Origin

This denotes that the origin of session space is located at the device’s position at some point in the past. This may be at device power-on, at the start or resumption of a session, or at some other point in the device’s history.

This is typically used for seated, standing, or stationary scenarios, and is also commonly found in mobile tracking scenarios. For example, HoloLens uses its power-on device location as its Device Tracking Origin.

Floor Tracking Origin

This denotes that the origin of session space is located on the "floor" of the tracked area. The reported device position will include a vertical offset from the floor plane to the current location of the device.

This is found in some roomscale scenarios. OpenVR, for example, uses the center of the play area as its Floor Tracking Origin.

XR Rig Representation

As developers, we want to be able to represent a collection of related XR devices (e.g., controllers and/or an HMD) as a single unit that can be placed, rotated, and moved in the world in a way that makes sense for users. There are multiple ways to solve this problem within Unity, but we recommend the following GameObject hierarchy: an "XR Rig" GameObject at the root, a "Floor Offset" GameObject as its child, and the "Main Camera", "Left Hand" and "Right Hand" GameObjects as children of "Floor Offset".


The "XR Rig" node setup is the simplest way to configure your hierarchy so that all related devices will correctly transform from session space to Unity world space.

In this hierarchy, the "Main Camera", "Left Hand" and "Right Hand" nodes each use a Tracked Pose Driver to track their device's pose. The Tracked Pose Drivers attached to these GameObjects do not use a reference transform, because the object hierarchy already provides the parent transforms explicitly.
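To illustrate why the hierarchy is enough to carry a pose from session space into world space, here is a minimal, engine-agnostic sketch in Python (not Unity code). The `Node` class, `rotate_yaw` and `to_world` are hypothetical helpers invented for this example, the numbers are made up, and the model is deliberately reduced to position plus yaw about the up axis, ignoring scale and full 3D rotation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Simplified transform node: local position plus yaw about the Y (up) axis."""
    position: tuple = (0.0, 0.0, 0.0)
    yaw_degrees: float = 0.0
    parent: Optional["Node"] = None


def rotate_yaw(point, yaw_degrees):
    """Rotate a point about the Y axis (convention assumed here: 90 degrees turns +Z toward +X)."""
    x, y, z = point
    a = math.radians(yaw_degrees)
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def to_world(node, local_point=(0.0, 0.0, 0.0)):
    """Walk up the parent chain, applying each node's rotation and then its position."""
    point = local_point
    while node is not None:
        rx, ry, rz = rotate_yaw(point, node.yaw_degrees)
        px, py, pz = node.position
        point = (rx + px, ry + py, rz + pz)
        node = node.parent
    return point


# XR Rig: where the whole rig is placed and turned in the world (hypothetical values).
xr_rig = Node(position=(10.0, 0.0, 5.0), yaw_degrees=90.0)
# Floor Offset: vertical offset only, parented to the rig.
floor_offset = Node(position=(0.0, 1.5, 0.0), parent=xr_rig)
# Main Camera: its local position is the pose reported in session space.
main_camera = Node(position=(0.0, 0.25, 1.0), parent=floor_offset)

camera_world = to_world(main_camera)
print(camera_world)  # approximately (11.0, 1.75, 5.0)
```

Moving or rotating only the `xr_rig` node repositions the camera and, in the same way, any hand nodes parented under `floor_offset`, which is the property the recommended hierarchy relies on.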

Floor Offset

The "Floor Offset" GameObject in the hierarchy is used to correctly handle the vertical difference between the two different tracking frames that devices operate in.

XR Devices which operate in "Floor Tracking Origin" mode will deliver data that already contains the correct height of the XR Device from the floor plane, so the transform for this GameObject will effectively be an identity matrix.

However, when using an XR Device that operates in "Device Tracking Origin" mode, the transform of this GameObject must contain a vertical offset that acts as the "height" of the device above the ground plane.
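The rule described above can be sketched as a small selection function. This is an illustrative Python sketch rather than Unity code: `TrackingOrigin`, `floor_offset_position` and the default height of 1.6 meters are assumptions made for the example, not names or values taken from Unity.

```python
from enum import Enum


class TrackingOrigin(Enum):
    """The two tracking origin modes described in this guide (hypothetical enum)."""
    DEVICE = "device"
    FLOOR = "floor"


# Assumed standing eye height in meters; pick a value that suits your application.
ASSUMED_USER_HEIGHT_M = 1.6


def floor_offset_position(origin, user_height_m=ASSUMED_USER_HEIGHT_M):
    """Return the local position to apply to the "Floor Offset" node."""
    if origin is TrackingOrigin.FLOOR:
        # Device data already includes the height above the floor: leave the node at identity.
        return (0.0, 0.0, 0.0)
    if origin is TrackingOrigin.DEVICE:
        # Device data has no floor information: raise the rig's children by a chosen height.
        return (0.0, user_height_m, 0.0)
    raise ValueError(f"Unknown tracking origin: {origin!r}")


print(floor_offset_position(TrackingOrigin.FLOOR))
print(floor_offset_position(TrackingOrigin.DEVICE, user_height_m=1.75))
```

Keeping the offset on its own node means the rest of the rig does not need to know which tracking origin mode the device is using.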

Setting the Tracking Mode

Most mobile VR headsets generate tracking data that is relative to the position of the device at the start, or resumption, of the VR session. We call this a "Device Tracking Origin" as the origin is relative to some historical device position. The easiest way to think about this is that the device generates tracking data relative to where it was at that earlier moment.

The following image shows that the location of the camera (or our user’s head) is at almost the same location as the origin of session space, indicated by the transform. This is because in Device Tracking Origin modes, the data being reported by the device contains only its position relative to the session space origin, and not any floor or user height information. Therefore, we must instead raise the location of the camera to some predetermined user height value using the Floor Offset GameObject transform.


Floor Tracking Origin

In OpenVR, when using a Roomscale Tracking mode, the data provided by XR Devices is relative to the center of the demarcated play area. This is known as a Floor Tracking Origin, as the origin of the session space used by the device in this case is effectively on the user's "floor".

In this mode, the tracking data provided by the devices automatically contains the effective height of the device above a previously indicated floor plane. This means, for example, that we can couple the camera directly to the HMD's position, translated into Unity world space, without adding an extra offset to account for the user's height. This is shown in the following image.
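As a worked comparison of the two modes, the following Python sketch (again illustrative, not Unity code) computes the camera's world-space height for a standing user. The function `camera_world_height` and all sample values are hypothetical; the rig is assumed to sit on the virtual floor at height 0.

```python
def camera_world_height(reported_device_y, floor_offset_y, rig_y=0.0):
    """World-space camera height: rig height + Floor Offset + reported device height."""
    return rig_y + floor_offset_y + reported_device_y


# Device Tracking Origin: the device reports a height near 0, so the Floor Offset
# supplies a predetermined user height (1.75 m is an assumed value).
device_mode = camera_world_height(reported_device_y=0.0, floor_offset_y=1.75)

# Floor Tracking Origin: the device reports its real height above the floor,
# so the Floor Offset stays at 0.
floor_mode = camera_world_height(reported_device_y=1.75, floor_offset_y=0.0)

print(device_mode, floor_mode)
```

Both configurations place the camera at the same world height, but only the Floor Tracking Origin value comes from the tracked height of the device; the Device Tracking Origin value depends on the height chosen for the offset.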