The main task of any AR device is to determine the user's position relative to surrounding objects and to locate those objects in space. This is what creates the feeling that everything happening is real. Position is calculated using tracking technology, and in this article we will look in detail at how it works.
What is position tracking?
Positional tracking is a technology that determines the position of objects in virtual and augmented reality. Its goal is to accurately determine the coordinates and pose of a real object (a device or a person) in the environment, using the three coordinates of its location and the three angles that specify its orientation in space. Trackers are fundamental to every part of creating augmented reality, whether the target is a face, hands, surfaces, and so on.
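Those three coordinates plus three angles are often called six degrees of freedom. A minimal sketch of the pose a tracker must report might look like this; the field names are illustrative, not taken from any particular SDK:

```python
# Hypothetical pose structure: three coordinates for location plus three
# angles for orientation, i.e. six degrees of freedom in total.

from dataclasses import dataclass

@dataclass
class Pose:
    x: float      # position in meters
    y: float
    z: float
    pitch: float  # orientation in degrees
    yaw: float
    roll: float

# A headset worn at eye height, facing 90 degrees to the left of its origin.
headset = Pose(x=0.0, y=1.6, z=0.0, pitch=0.0, yaw=90.0, roll=0.0)
print(headset.yaw)  # prints 90.0
```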
The tracking system in AR/VR essentially copies the positioning and orientation methods that already exist in nature, much like the human senses: vision tells us where we are in relation to other objects and people. In the same way, an AR/VR system must continuously determine the user's position and orientation through tracking and pass this information to the operating system.
Tracking with markers relies on a predefined model of the object, which can then be tracked even with a single camera. Markers are typically infrared sources or visible patterns such as QR codes. Tracking works only while the user stays within view of the marker. This method is considered obsolete and is no longer widely used.
Inside-out is a method implemented with optical sensors on the moving object, in our case the AR headset. They track the movement of fixed reference points in the surrounding space. Inside-out tracking has been used by Microsoft HoloLens, Project Tango, and SteamVR Lighthouse.
Outside-in is a method in which an external fixed observer, typically a camera, determines a person's position from characteristic points. This approach is used in the Oculus Rift and in many motion-capture systems.
SLAM (Simultaneous Localization and Mapping)
What is it?
SLAM is an algorithm used in VR/AR to bring real-world data into the virtual environment and vice versa. Simultaneous Localization and Mapping builds a map of an unknown space while simultaneously keeping track of the current location and the distance traveled.
The algorithm consists of two parts: mapping the unknown environment from measurements, and determining the device's own location within the map it has built. SLAM is convenient for mobile virtual and augmented reality solutions.
SLAM algorithms are used in VR/AR technologies, in robotic vacuum cleaners and other robotics, as well as in cars with autonomous driving features, for example from Tesla and Volvo.
World maps are divided into sparse and dense. Sparse algorithms use only a few points as model parameters, marking only key objects. Dense algorithms try to use all the information in the image and ultimately give a more detailed result. In a dense map, all points depend on one another; without this dependency, a correct dense reconstruction is impossible.
Both kinds are useful, but for different purposes. A dense structure is better suited for lighting, rendering, and occlusion. A sparse map is used for tracking and multiplayer because it contains fewer points.
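The difference between the two map structures can be sketched in code. This is a toy illustration, not a real SLAM implementation, and the class names are hypothetical: a sparse map stores a handful of labeled landmarks, while a dense map stores an estimate for every cell of a 3D grid.

```python
# Toy comparison of map structures (illustrative only, not real SLAM).

class SparseMap:
    """Stores only key landmarks: few points, cheap to match for tracking."""
    def __init__(self):
        self.landmarks = {}  # landmark id -> (x, y, z)

    def add(self, lid, point):
        self.landmarks[lid] = point

class DenseMap:
    """Stores a value for every cell of a voxel grid: detailed but heavy."""
    def __init__(self, size):
        self.size = size
        self.occupancy = [[[0.0] * size for _ in range(size)]
                          for _ in range(size)]

    def mark_occupied(self, x, y, z, p=1.0):
        self.occupancy[x][y][z] = p

sparse = SparseMap()
sparse.add(0, (1.0, 0.5, 2.0))  # e.g. the corner of a table
dense = DenseMap(16)
dense.mark_occupied(3, 4, 5)

# One landmark versus 16**3 = 4096 cells: the sparse map is far lighter.
print(len(sparse.landmarks), dense.size ** 3)
```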
SLAM needs to collect as much environmental information as possible for each frame. To do this, the technology uses three data sources:
- Camera. The several cameras in a modern phone provide excellent tracking, although occasional errors are still possible. Working together with the other sensors, they complement each other and reduce the error.
- Depth sensors. They determine the distance between objects for a more accurate map.
- Sensors. A compass, gyroscope, and accelerometer help determine where the AR elements and the person are in space, based on measured acceleration and rotation.
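One common way these sources "complement each other" is a complementary filter: the gyroscope is smooth but drifts over time, while the accelerometer's gravity reading is noisy but drift-free, so each update blends the two. A minimal sketch, assuming a single tilt angle and made-up readings:

```python
# Minimal complementary filter sketch (assumed readings, one tilt angle).
# Trust the integrated gyro short-term, the accelerometer long-term.

def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """One fusion step: blend gyro integration with the accelerometer angle."""
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * accel_angle

angle = 0.0
# Simulated readings: the device is actually held still, tilted 10 degrees.
for _ in range(500):
    gyro_rate = 0.0     # deg/s from a motionless gyroscope
    accel_angle = 10.0  # degrees inferred from the gravity direction
    angle = complementary_filter(angle, gyro_rate, accel_angle, dt=0.02)

print(round(angle, 1))  # prints 10.0 -- the estimate converges to the truth
```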
How does SLAM work?
To calculate the most realistic distances between a person and objects, SLAM performs many operations during every frame.
Since markers are an obsolete method, SLAM finds reference points on objects automatically. There are three ways to do this:
- Semantic. Recognizes real objects, such as roads, people, and their faces and hands.
- Feature-based. Focuses on distinctive environmental features, such as the corners of furniture in a room.
- Direct. Produces a dense map of the environment by treating every pixel as a feature, rather than only an object's characteristic parts.
Description of points
Each point is assigned a descriptor, a code that identifies its position. Viewed from similar angles, the same object produces the same descriptors. If you move closer to the object, the algorithm recognizes the codes of points it already knows and calculates its new position relative to them.
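A descriptor can be sketched as a short bit string, in the spirit of binary descriptors like BRIEF: compare pixel intensities at a fixed set of offset pairs around the point, and match points between frames by Hamming distance between their codes. Everything below is a hypothetical simplification, not a real descriptor.

```python
# Toy BRIEF-like binary descriptor (hypothetical simplification).

# Fixed pixel-pair offsets around the keypoint; each pair yields one bit.
PAIRS = [((-1, 0), (1, 0)), ((0, -1), (0, 1)),
         ((-1, -1), (1, 1)), ((1, -1), (-1, 1))]

def describe(img, x, y):
    """Encode the local pattern around (x, y) as a bit string."""
    bits = 0
    for (ax, ay), (bx, by) in PAIRS:
        bits <<= 1
        if img[y + ay][x + ax] < img[y + by][x + bx]:
            bits |= 1
    return bits

def hamming(a, b):
    """Number of differing bits: small distance means a likely match."""
    return bin(a ^ b).count("1")

img = [
    [0, 0,   0],
    [0, 9,   200],
    [0, 200, 200],
]
d1 = describe(img, 1, 1)
# The same local pattern seen again yields the same code: distance 0.
print(hamming(d1, describe(img, 1, 1)))  # prints 0
```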
The position is determined using data from the main sources: camera, accelerometer, gyroscope, and other sensors. Taking the error of each data source into account, the algorithm computes approximate distances.
More accurate figures are needed, so once SLAM has an initial location it starts refining the approximate distances to the reference points. These measurements are repeated 30-60 times per second.
Loop and plane detection
Planes and loop closures are detected in key frames stored in a buffer. When we return to a point where we have already been, the loop closes. SLAM detects and remembers these loops so that it does not have to rebuild that part of the map. This saves system memory and makes tracking more stable and accurate.
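The idea of loop closure can be sketched as follows. This is a hypothetical simplification that compares raw pose estimates against the keyframe buffer; real SLAM recognizes a revisited place by matching descriptors, not positions, but the buffering-and-recognition logic is the same in spirit:

```python
# Sketch of loop detection against a keyframe buffer (hypothetical:
# real systems match visual descriptors rather than raw poses).

import math

def detect_loop(keyframes, pose, radius=0.5, min_gap=10):
    """Return the index of an old keyframe near the current pose, if any."""
    for i, kf in enumerate(keyframes):
        if len(keyframes) - i <= min_gap:
            break  # too recent: that is just the last frame, not a loop
        if math.dist(kf, pose) < radius:
            return i
    return None

# Walk around a square room and come back near the starting corner.
path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2),
        (0, 2), (0, 1), (0.1, 0.1), (0.1, 0.05), (0.05, 0.0), (0.0, 0.1)]
keyframes = []
loop_at = None
for pose in path:
    hit = detect_loop(keyframes, pose)
    if hit is not None:
        loop_at = hit
    keyframes.append(pose)

print(loop_at)  # prints 0 -- we recognized the very first keyframe again
```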
A plane is a group of points that, according to the sensor readings, are perpendicular or parallel to the ground. Once a plane is defined, SLAM measures from it. Detecting planes is important for placing 3D objects in space.
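A toy version of this grouping: collect 3D points whose heights agree within a tolerance, and treat a large enough group as a horizontal plane (parallel to the ground) on which an AR app can anchor objects. Real systems fit planes robustly, for example with RANSAC; this is only a sketch of the idea.

```python
# Toy horizontal-plane detection: group points by similar height.
# (Illustrative; real plane fitting is robust, e.g. RANSAC-based.)

def find_horizontal_plane(points, tol=0.02, min_points=4):
    """Return the largest group of (x, y, z) points sharing a height y."""
    best = []
    for _, y0, _ in points:
        group = [p for p in points if abs(p[1] - y0) < tol]
        if len(group) > len(best):
            best = group
    return best if len(best) >= min_points else []

# Four points on a table top at roughly y = 0.75 m, plus scattered clutter.
points = [(0.0, 0.75, 1.0), (0.3, 0.76, 1.1), (0.1, 0.74, 0.9),
          (0.4, 0.75, 1.2), (0.2, 0.30, 1.0), (0.5, 1.80, 0.7)]
plane = find_horizontal_plane(points)
print(len(plane))  # prints 4 -- the table-top points form the plane
```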
Without tracking, AR applications, robots, and even self-driving cars could not work. We have examined in detail each step of how AR devices determine their location. SLAM is currently the best tracking method and the most widely used. Of course, this describes the ideal case, in which the device determines its location correctly without losing track of itself in space.