Will AR be the reality?
Augmented Reality (AR) is forecast to become a huge revenue stream for technology companies. But while we await a killer app (beyond Pokémon GO, if indeed we consider that to be AR!), do we have the technology today to make that killer app possible? AR draws on several areas of technology, from vision processing through to the graphical overlay of information. This article looks at some of the technology components behind AR, and at how today's solutions will need to evolve to meet the expectations and requirements of 2025.
AR has been around for many years. In this article we won’t go into the definition of ‘AR’. It could be mixed reality, extended reality, or any form of “reality” that has been augmented with overlaid graphics. In the late 1990s, consumers watching NFL games got used to an early form of AR: the artificial first-and-ten line overlaid on the field. It has been a slow evolution from there to what consumers typically think of as AR today, for example Google Glass and Pokémon GO.
The introduction of Google Glass in 2012 gave consumers a glimpse of what might be considered the start of the head-mounted AR revolution. Whilst it was never a runaway success, it certainly raised plenty of questions in the minds of consumers. What was the form factor like? What applications could it be used for? How was the battery life? Privacy concerns also provoked strong reactions, and many of these questions remain unanswered today.
What changed for AR?
In 2016, we saw the launch of Pokémon GO. This is probably best described as “dumb” AR: a very simple overlay of Pokémon creatures on the camera feed, with no understanding of the scene in front of it. The characters had no scale relative to the scene. The app had no model of the scene lighting, so it made no attempt to render shadows or reflections. Nor did it understand the scene content, so creatures could be standing on water, on top of a car or basically anywhere in the scene.
For AR to really succeed and drive the many real-world applications that can be conceived of today, such as assisting a surgeon in the operating theatre, architectural visualisation, in-car HUDs (head-up displays) and many more besides, we need a step change in the capability of the systems on chip (SoCs) designed to drive them.
| Application | Scene position understanding | Object recognition | Ray-traced graphics |
|---|---|---|---|
| In-car HUD (head-up display) | Yes | Yes | No |
| Medical operating aid | Yes | Yes | No |
| Gaming (“Pokémon GO” and beyond) | Yes | Yes | Yes |
We need the AR system to be able to analyse a scene and map out the relative positions of the objects within it. We need to understand what objects are in the scene. And finally, before placing an artificial (“augmented”) object in the scene, we need to understand the scene lighting.
Determining the what and where of objects in a scene
To determine the position of objects within a scene, one option is a traditional ranging method such as LIDAR (Light Detection and Ranging), which measures the distance to a target by illuminating it with laser light and analysing the reflection. Alternatively, we can turn to vision processing techniques that use the input from a single camera. On a graphics processing unit (GPU), such as Imagination’s PowerVR XT series of IP cores, which are likely to be found in many SoCs for AR, we can use techniques such as sparse feature point matching to build a 3D reconstruction of the scene: distinctive points are matched across successive frames as the camera moves, and their positions are then triangulated.
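The following minimal Python sketch makes the vision-based route more concrete. It covers the two steps the route relies on: matching sparse feature points between two frames, then recovering depth from where the matched points appear. Several assumptions apply that the article does not state. The descriptors are binary and compared by Hamming distance with Lowe's ratio test, which is an ORB-style choice. The two frames form a rectified pair, with the camera moved sideways by a known baseline, so depth follows from Z = f·B/d. With a genuinely single moving camera the baseline is usually unknown, so the reconstruction is only recovered up to scale. `Feature`, `match_features`, `depth_from_disparity` and `lidar_range_m` are hypothetical names. The LIDAR time-of-flight range is included only for comparison.

```python
from dataclasses import dataclass
from typing import Optional

SPEED_OF_LIGHT_M_S = 299_792_458.0


@dataclass
class Feature:
    x: float         # pixel column of the detected keypoint
    y: float         # pixel row of the detected keypoint
    descriptor: int  # hypothetical binary descriptor packed into an int


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_features(left: list[Feature], right: list[Feature],
                   ratio: float = 0.8) -> list[tuple[int, int]]:
    """Nearest-neighbour matching with Lowe's ratio test.

    A match is kept only if the best candidate is clearly better than the
    second best, which rejects ambiguous matches on repetitive texture.
    """
    matches: list[tuple[int, int]] = []
    if len(right) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, f in enumerate(left):
        ranked = sorted((hamming(f.descriptor, g.descriptor), j)
                        for j, g in enumerate(right))
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> Optional[float]:
    """Depth of a matched point in a rectified pair: Z = f * B / disparity."""
    disparity = x_left - x_right
    if disparity <= 0:
        return None  # point at infinity, or a bad match
    return focal_px * baseline_m / disparity


def lidar_range_m(round_trip_s: float) -> float:
    """Time-of-flight range: the laser pulse travels to the target and back."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


if __name__ == "__main__":
    left = [Feature(400, 240, 0b11110000), Feature(300, 200, 0b10101010)]
    right = [Feature(380, 240, 0b11110001), Feature(290, 200, 0b10101010),
             Feature(10, 10, 0b00001111)]
    for i, j in match_features(left, right):
        z = depth_from_disparity(left[i].x, right[j].x, focal_px=800, baseline_m=0.1)
        print(f"left {i} -> right {j}: depth {z:.2f} m")
```

A real pipeline would detect and describe keypoints with a library and estimate the camera motion between frames. This sketch only shows why matched points are enough to recover depth once that motion is known.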
Beyond the relative positions of objects within a scene, the immersive AR experience we are seeking needs to know what those objects actually are. As an example, if I’m designing an AR system for a medical application, I want the scene analysis to understand what I am looking at within an operating theatre. Perhaps the AR will highlight a suspicious-looking patch on an organ as potentially cancerous. Scene comprehension has advanced rapidly, much of it thanks to deep learning, or neural-network-based systems. As with scene localisation, neural networks such as convolutional neural networks (CNNs) can run on PowerVR GPUs or, as requirements increase, on dedicated hardware. Today a CNN can be trained to recognise almost any class of object the application requires, provided sufficient labelled training data is available.
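The sketch below gives a rough idea of what "running a CNN" means computationally. It implements the three operations that make up a typical convolutional layer:

- a 2D convolution (strictly a cross-correlation, which is what most deep-learning frameworks compute);
- a ReLU activation;
- 2×2 max pooling.

It uses plain Python lists for readability. The function names are illustrative, and the hand-picked vertical-edge kernel stands in for the weights that a trained network would learn.

```python
Matrix = list[list[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation: slide the kernel over the image, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map: Matrix) -> Matrix:
    """Keep the strongest response in each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


if __name__ == "__main__":
    # A dark-to-bright vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1]] * 4
    vertical_edge = [[-1, 1], [-1, 1]]  # stands in for learned weights
    features = relu(conv2d(image, vertical_edge))
    print(features)               # the largest values sit on the edge
    print(max_pool_2x2(features))
```

A real CNN stacks many such layers, each with many learned kernels and channels. A GPU speeds up exactly these inner multiply-accumulate loops by running them in parallel across output pixels and channels.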
The importance of lighting in AR
Lastly, if we are going to place graphical objects realistically in a real-world scene, we need to understand the lighting of that scene. Objects placed in a scene will not look realistic unless we can accurately model their shadows and reflections. The traditional approach to understanding scene lighting is to place a 3D marker of known shape in the scene and measure the shadow it casts. At Imagination, we are also actively researching ways to understand scene lighting without markers, since placing them is frequently impractical. We are using the compute capabilities of PowerVR GPUs to process the input from 360-degree cameras mounted on the headset, in order to understand the lighting of the scene.
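The traditional marker approach comes down to simple geometry. This minimal sketch makes three assumptions:

- the marker is a vertical pole of known height standing on flat ground;
- the light is distant enough (the sun, for example) to be treated as a single direction;
- coordinates put the ground in the x-y plane with z pointing up, and the pole base at the origin.

Under these assumptions, the direction towards the light is the vector from the tip of the shadow to the top of the pole. Its elevation follows from the pole height and the shadow length. The function name `light_from_shadow` is hypothetical. This is not a description of Imagination's markerless 360-degree camera method, which the article does not detail.

```python
import math


def light_from_shadow(pole_height: float, shadow_tip: tuple[float, float]):
    """Estimate a directional light from the shadow of a vertical pole.

    Ground is the x-y plane, z points up, and the pole's base is at the origin.
    Returns (unit vector pointing towards the light, elevation in degrees).
    """
    if pole_height <= 0:
        raise ValueError("pole height must be positive")
    sx, sy = shadow_tip
    shadow_len = math.hypot(sx, sy)
    # The light ray grazes the top of the pole (0, 0, h) and lands at the
    # shadow tip (sx, sy, 0), so the direction back to the light is top - tip.
    dx, dy, dz = -sx, -sy, pole_height
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    direction = (dx / norm, dy / norm, dz / norm)
    # A zero-length shadow gives atan2(h, 0) = 90 degrees: light straight overhead.
    elevation_deg = math.degrees(math.atan2(pole_height, shadow_len))
    return direction, elevation_deg


if __name__ == "__main__":
    d, elev = light_from_shadow(1.0, (1.0, 1.0))
    print(f"towards light: ({d[0]:.3f}, {d[1]:.3f}, {d[2]:.3f}), "
          f"elevation {elev:.1f} deg")
```

The estimated direction could then drive shadow casting for the virtual objects. A single-marker estimate like this cannot capture multiple lights or soft ambient lighting, which is part of why markerless methods are attractive.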
Once we have the scene understanding, mapping and lighting, we need to generate the graphics to overlay on the scene. Traditional rasterisation-based techniques can create basic overlaid images, and for some applications this may be enough. However, a truly realistic, life-like scene is likely to need ray tracing. Ray tracing models the physics of light by tracing rays through the scene, so that each pixel is ‘coloured’ appropriately with realistic reflections, refractions and subtle lighting effects, resulting in near photorealism. The technique has been used for decades in high-end movies and TV shows to deliver astonishing computer-generated imagery (CGI). At Imagination we are delivering this technology within a power budget previously unheard of for ray tracing, making it a realistic option for head-mounted displays.
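This minimal Python ray tracer shows what "modelling the physics of light" means at its simplest. For each ray it:

1. finds the nearest surface, which is a sphere resting on a ground plane;
2. shades the hit point with Lambert's cosine law;
3. fires a secondary shadow ray towards the light to decide whether the point is occluded.

The scene, the overhead directional light and the ambient/diffuse constants are all illustrative assumptions. Production ray tracers, including hardware ones, typically add acceleration structures, reflection and refraction rays, and many samples per pixel.

```python
import math
from typing import Optional

Vec = tuple[float, float, float]

# Illustrative scene: a unit sphere resting on the ground plane y = 0,
# lit by a directional light shining straight down.
SPHERE_CENTER: Vec = (0.0, 1.0, 0.0)
SPHERE_RADIUS = 1.0
TO_LIGHT: Vec = (0.0, 1.0, 0.0)  # unit vector pointing towards the light
AMBIENT, DIFFUSE = 0.1, 0.9
EPS = 1e-4  # offset that stops a shadow ray re-hitting its own surface


def add(a: Vec, b: Vec) -> Vec: return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def sub(a: Vec, b: Vec) -> Vec: return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def scale(a: Vec, s: float) -> Vec: return (a[0] * s, a[1] * s, a[2] * s)
def dot(a: Vec, b: Vec) -> float: return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def normalize(a: Vec) -> Vec: return scale(a, 1.0 / math.sqrt(dot(a, a)))


def hit_sphere(origin: Vec, direction: Vec) -> Optional[float]:
    """Nearest positive t where origin + t*direction meets the sphere (unit direction)."""
    oc = sub(origin, SPHERE_CENTER)
    b = dot(oc, direction)
    c = dot(oc, oc) - SPHERE_RADIUS ** 2
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:
            return t
    return None


def shade(point: Vec, normal: Vec) -> float:
    """Lambertian shading plus a shadow ray towards the light."""
    lambert = max(0.0, dot(normal, TO_LIGHT))
    if lambert > 0.0 and hit_sphere(add(point, scale(normal, EPS)), TO_LIGHT) is not None:
        lambert = 0.0  # the sphere blocks the light: the point is in shadow
    return AMBIENT + DIFFUSE * lambert


def trace(origin: Vec, direction: Vec) -> float:
    """Brightness seen along one primary ray (0.0 means it hit nothing)."""
    direction = normalize(direction)
    nearest = None  # (distance, hit point, surface normal)
    t = hit_sphere(origin, direction)
    if t is not None:
        p = add(origin, scale(direction, t))
        nearest = (t, p, normalize(sub(p, SPHERE_CENTER)))
    if direction[1] < 0:  # ground plane y = 0, seen from a camera above it
        t_ground = -origin[1] / direction[1]
        if t_ground > 1e-6 and (nearest is None or t_ground < nearest[0]):
            nearest = (t_ground, add(origin, scale(direction, t_ground)), (0.0, 1.0, 0.0))
    if nearest is None:
        return 0.0
    return shade(nearest[1], nearest[2])


if __name__ == "__main__":
    # Render a tiny ASCII image by firing one primary ray per character.
    camera: Vec = (0.0, 1.5, -5.0)
    ramp = " .:-=+*#%@"
    for row in range(12):
        line = ""
        for col in range(32):
            x = (col - 16) / 8.0
            y = (6 - row) / 6.0 - 0.3
            b = trace(camera, (x, y, 1.5))
            line += ramp[min(len(ramp) - 1, int(b * (len(ramp) - 1)))]
        print(line)
```

The shadow ray is the key difference from basic rasterisation. Whether a point is lit is answered by asking the scene geometry directly, rather than by approximating it with shadow maps.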
Imagine a shopping app that shows how new furniture would look in your home. The objective must be to model it so accurately that the AR view of the scene is as close as possible to what the final reality would be. Ray tracing allows shadows to be modelled accurately, lens distortions to be corrected easily and reflections to be shown precisely.
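One common way a renderer can handle lens distortion is to apply a radial distortion model per pixel. This lets rendered content line up with the distorted camera image, or the model can be inverted to straighten the image. A typical choice is the radial part of the Brown-Conrady model used by many camera-calibration tools. The sketch below is an assumption about how this could be done, not a description of Imagination's pipeline:

- coordinates are normalised, i.e. centred on the optical axis and divided by the focal length;
- `k1` and `k2` are radial coefficients that would come from calibrating the actual lens;
- the inverse is found by fixed-point iteration, because the polynomial model has no simple closed-form inverse.

In a ray tracer, the same mapping can be applied when generating each primary ray.

```python
def distort(x: float, y: float, k1: float, k2: float) -> tuple[float, float]:
    """Apply radial distortion to a normalised, undistorted image point."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd: float, yd: float, k1: float, k2: float,
              iterations: int = 20) -> tuple[float, float]:
    """Invert `distort` by fixed-point iteration (adequate for moderate distortion)."""
    x, y = xd, yd  # initial guess: assume no distortion
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


if __name__ == "__main__":
    k1, k2 = -0.2, 0.05  # hypothetical barrel-distortion coefficients
    point = (0.3, 0.2)
    d = distort(*point, k1, k2)
    print("distorted:", d)
    print("recovered:", undistort(*d, k1, k2))
```

Fixed-point iteration is chosen here for brevity. For strong distortion near the image edges, a Newton solver or a precomputed lookup table is a more robust choice.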
Will AR be the reality?
As we embark on the next generation of AR devices, it is clear that the processing demands placed on the SoCs that run these compelling applications will be enormous. The choice of semiconductor IP for these SoCs, such as CPUs and GPUs, plays a critical role in achieving the highest performance/mW and the highest performance/mm². Imagination’s IP cores have been designed from the outset with these two goals in mind. For the industry to deliver truly life-changing AR applications, we will need new SoCs specifically designed for AR. Only then can we expect the required step change in performance over today’s approach of reusing non-dedicated SoCs (for example, mobile application processors).
Chris Longstaff is Senior Director of Product & Technology Marketing PowerVR, Imagination Technologies. He has worked for semiconductor companies such as Leitch (Harris), C-Cube/LSI Logic, and ATI (AMD/Broadcom), in roles including hardware design, FAE & business development.