Unlike virtual reality, an augmented reality system lets users see 3D objects and the real world at the same time. AR's goal is to place virtual models in the real world to enhance it, and to do so as naturally as possible. Occlusion is one technique that makes the AR experience more realistic for the user. In this article, we'll look at what occlusion is and how it works in applications.
What is occlusion?
Since the system only sees the pixels captured by the camera, it doesn't interpret the world the way people do: the device cannot automatically infer the depth of a scene. This is one of AR's core limitations. The term "occlusion" in AR means that virtual objects can overlap real-world objects and vice versa. The purpose of occlusion is to preserve the rules of line-of-sight visibility when composing AR scenes. For example, if a person stands between the camera of an AR device and a virtual table, the person should block the table. If the person steps behind the table, the table should block the person, just as a real object would.
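The visibility rule above boils down to a per-pixel depth comparison. Here is a minimal sketch of that idea; the distances are hypothetical values in meters from the AR device's camera, not real sensor output.

```python
# Minimal sketch of the visibility rule behind occlusion: at any screen
# pixel, whichever surface is closer to the camera wins.

def visible_surface(real_depth: float, virtual_depth: float) -> str:
    """Return which surface should be drawn at this pixel."""
    return "real" if real_depth < virtual_depth else "virtual"

# A person 2 m away stands in front of a virtual table placed at 3 m:
# the person occludes the table at the pixels they share.
print(visible_surface(real_depth=2.0, virtual_depth=3.0))   # real

# The person walks behind the table, now 4 m away: the table occludes them.
print(visible_surface(real_depth=4.0, virtual_depth=3.0))   # virtual
```

A full renderer applies this comparison at every pixel; the single-value version just makes the rule explicit.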
Previously this was impossible; only with the arrival of occlusion in the ARKit 3 framework did augmented reality move a little closer to the physical world. In terms of immersion and interactivity, occlusion opens up ample opportunities for innovative AR applications. Although the technology is not yet perfect, impressive results can be expected in the next couple of years.
Ways to implement occlusion in apps
There are several methods for implementing occlusion in applications, and they are usually used in tandem. Creating realistic occlusion is not easy, because it is still a fairly new capability.
Depth maps

In computer graphics and 3D rendering, a depth map records how far each element of the image is from the viewer's point of view in the 3D scene. It is one solution to the "visibility problem." A depth map can be captured directly by a dedicated depth camera or computed from the views of several cameras. At least two cameras are needed, since a single camera produces large errors and low-resolution depth.
After obtaining the map, you need to compute all the distances and then apply them to the 3D scene; this tells you which objects will be blocked. Depth maps also underpin the more complex methods described below.
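Applying a depth map to the scene can be sketched as a compositing step: the virtual render is kept only where the virtual surface is closer than the real one. The arrays below are synthetic stand-ins for camera and renderer output, not real sensor data.

```python
import numpy as np

def composite(camera_rgb, virtual_rgb, real_depth, virtual_depth):
    """Blend camera and virtual images using per-pixel depth comparison."""
    # True where the real scene is closer, i.e. the virtual pixel is occluded.
    occluded = real_depth < virtual_depth
    return np.where(occluded[..., None], camera_rgb, virtual_rgb)

# Toy 2x2 scene: the left column of the real scene is 1 m away,
# the right column 5 m; the virtual object sits at a uniform 3 m.
real_depth = np.array([[1.0, 5.0],
                       [1.0, 5.0]])
virtual_depth = np.full((2, 2), 3.0)
camera_rgb = np.zeros((2, 2, 3))    # black camera image
virtual_rgb = np.ones((2, 2, 3))    # white virtual object
out = composite(camera_rgb, virtual_rgb, real_depth, virtual_depth)
# Left column shows the real scene (it occludes the object);
# right column shows the virtual object.
```

Real engines do this test in the depth buffer, but the logic is the same comparison per pixel.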
Dense 3D Reconstruction
In the previous article, we talked about building a dense map of the world for tracking and lighting prediction. The same map is useful for occlusion, which requires a detailed reconstruction of the space.
To reconstruct a detailed 3D object, you need to combine data from several depth maps, ideally captured from different viewpoints. It is enough to offload keyframes from SLAM into a separate thread a couple of times per second: SLAM gives us the camera's position, and the depth maps give the depth of each pixel.
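Fusing the two data sources works by back-projecting each depth pixel into a 3D world point using the camera intrinsics and the pose estimated by SLAM. The intrinsics and pose below are made-up illustrative values under a standard pinhole camera model.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy, R, t):
    """Lift a depth pixel (u, v) into a 3D point in world coordinates."""
    # Pixel -> 3D point in camera coordinates (pinhole model).
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    p_cam = np.array([x, y, depth])
    # Camera -> world, using the rotation R and translation t from SLAM.
    return R @ p_cam + t

R = np.eye(3)                    # camera aligned with the world axes
t = np.array([0.0, 0.0, 0.0])    # camera at the origin
# The principal-point pixel at 2 m depth lands on the camera's z-axis.
p = backproject(u=320, v=240, depth=2.0,
                fx=500.0, fy=500.0, cx=320.0, cy=240.0, R=R, t=t)
# p is [0, 0, 2]: straight ahead of the camera, 2 m away.
```

Repeating this for every pixel of every keyframe yields the point cloud that the dense reconstruction is built from.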
A multi-view 3D reconstruction method is also used. Its purpose is to infer the geometric structure of the scene from a set of diverse images of the object. Using multiple images, 3D information can be partially recovered by solving the pixel-correspondence problem. Simply put, with this method the vertices of a simple proxy object are "attracted" to the corresponding places on the depth maps.
This approach is popular because SLAM already provides points for building the 3D object and can reuse them for tracking. The resulting model is covered with triangles, smoothed with filters, and is then ready for use in the scene.
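The smoothing step mentioned above can be sketched as one pass of Laplacian smoothing, which pulls each vertex toward the average of its neighbors and damps depth-map noise. The tiny mesh here is synthetic; real meshes have thousands of vertices and a full adjacency structure.

```python
import numpy as np

def laplacian_smooth(vertices, neighbors, alpha=0.5):
    """Move each listed vertex a fraction alpha toward its neighbors' mean."""
    smoothed = vertices.copy()
    for i, nbrs in neighbors.items():
        avg = vertices[nbrs].mean(axis=0)
        smoothed[i] = (1 - alpha) * vertices[i] + alpha * avg
    return smoothed

# A noisy middle vertex between two clean endpoints on the x-axis.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.7, 0.0],   # spike caused by depth noise
                  [2.0, 0.0, 0.0]])
nbrs = {1: [0, 2]}                   # only the middle vertex is smoothed
out = laplacian_smooth(verts, nbrs)
# The spike's y-offset is halved, pulling it back toward the line
# between its neighbors; the endpoints are untouched.
```

Several such passes, or a more selective filter, trade detail for smoothness.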
Photogrammetry

Photogrammetry is a scientific and technical discipline concerned with determining the shape, size, position, and other characteristics of objects from their photographic images. In the film industry and video games, it is used to create three-dimensional spaces and objects. For occlusion, it is worth rendering only the main parts of the space, while the rest is simply scanned in 3D. Photogrammetry also works for textures.
With an accurate map of the area, the renderer determines which part of the object is blocked and crops it. The output may be a noisy, inaccurate model, but this can be corrected with smoothing.
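One simple way to correct a noisy occlusion mask before cropping the render is a local majority vote, which removes isolated misclassified pixels. The mask below is a synthetic example, not real segmentation output.

```python
import numpy as np

def denoise_mask(mask):
    """Keep a pixel occluded only if most of its 3x3 neighborhood agrees."""
    h, w = mask.shape
    padded = np.pad(mask.astype(int), 1)
    out = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            # Majority vote over the 3x3 patch centered on (y, x).
            out[y, x] = padded[y:y + 3, x:x + 3].sum() >= 5
    return out

# A solid occluded block plus one stray false-positive pixel.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True    # real occluder
mask[0, 4] = True        # isolated noise
clean = denoise_mask(mask)
# The stray pixel is voted away; the occluder's interior survives.
```

Production pipelines use faster morphological filters for the same effect, but the principle is identical.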
People occlusion

People occlusion allows people and virtual objects to interact in one AR scene, producing a realistic sense of immersion. As noted above, occlusion is not yet perfect, and this is especially pronounced with moving objects, people in particular. Naturally, we want to actively participate and interact with elements of augmented reality, so the inability to convincingly render people occluded by objects, and vice versa, becomes a real problem.
Even picking up a virtual item is problematic: when you hold a 3D object in your hand, your fingers should appear on top of it, not behind it. Otherwise it doesn't feel like a real AR experience. 3D reconstruction is powerless here, and depth maps alone are not enough to ensure proper occlusion. For this, you can use specialized software for recognizing human poses, such as PoseNet or Detectron2, though these tools still need refinement.
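Combining a pose detector with depth can be sketched as a per-keypoint decision: render the hand over the virtual object only when the keypoint is both confidently detected and closer to the camera than the object. The function and values below are hypothetical stand-ins, not the actual API of PoseNet or Detectron2.

```python
def hand_occludes(keypoint_depth: float, object_depth: float,
                  confidence: float, min_confidence: float = 0.5) -> bool:
    """Decide whether a hand keypoint should be drawn over the virtual object.

    keypoint_depth / object_depth are distances from the camera in meters;
    confidence is the detector's score for this keypoint (0..1).
    """
    return confidence >= min_confidence and keypoint_depth < object_depth

# A confidently detected fingertip 0.40 m away, gripping a virtual
# object placed at 0.45 m: the fingers belong on top of the object.
print(hand_occludes(0.40, 0.45, confidence=0.9))   # True

# A low-confidence keypoint is ignored to avoid flickering occlusion.
print(hand_occludes(0.40, 0.45, confidence=0.2))   # False
```

In a real pipeline this decision would feed a per-pixel hand mask around each keypoint rather than a single boolean.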
In practice, developers combine all of the above methods: they compute a 3D map, use depth maps, and refine the model with third-party detectors. What is really needed are depth cameras with genuinely high resolution, so that so many steps and workarounds are no longer necessary.
Occlusion is still an imperfect technology: it demands significant improvements in developer skill and hardware, and it leaves open problems such as convincing hands, adequate lighting, and shadows. Nevertheless, it is a big step toward a truly realistic and engaging user experience, and it holds great potential for further study and new implementation methods.