While positional tracking will allow AR and VR headsets to see the world, computer vision will enable them to understand it.

Computer vision is a term encompassing a broad field that deals with how computers can be made to gain a high-level understanding of digital images or videos. Essentially, it is programming computers to 'see' in the way that the human visual system can: to look at an object and, based on its appearance, location, and surrounding setting, understand exactly what that object is.

Allowing computers to see the world the way that humans do is not an easy task. True computer vision is only possible with AI, machine learning, and enormous sets of data - not to mention a detail-capturing camera and a vision processing unit (VPU) to help the machine sort through the image. The camera alone is not enough; it needs software to make sense of the visible world. Currently, four primary methods are being used to accomplish this task: blob detection, scale space, template matching, and edge detection. Each has its own strengths, so the methods can be combined to produce a more holistic understanding.

Blob Detection

Blob detection is used to identify distinct regions in an image. It finds these regions by looking for areas whose color and brightness contrast with their surroundings. For example, blob detection would have an easy time spotting a flourishing bush in the middle of a field, since the bush would be a different color and vibrancy than the world around it. Blob detection would note that all regions of the bush look similar and thus must be related. If there were a second bush in the field, blob detection would understand the two objects as separate but similar.
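
The bush-in-a-field idea can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular headset or library does it: it swaps in simple thresholding plus connected-component grouping for a full blob detector, and the function name `find_blobs`, the `background` and `min_contrast` parameters, and the tiny brightness grid are all assumptions made up for the example.

```python
from collections import deque

def find_blobs(image, background, min_contrast):
    """Group pixels that contrast with the background into connected blobs.

    image is a list of rows of brightness values (0-255).
    Returns a list of blobs, each a list of (row, col) coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or abs(image[r][c] - background) < min_contrast:
                continue
            # Breadth-first flood fill over 4-connected contrasting neighbours.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and abs(image[ny][nx] - background) >= min_contrast):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs

# A bright "field" (200) containing two dark "bushes" (50). Hypothetical data.
field = [
    [200, 200, 200, 200, 200, 200],
    [200,  50,  50, 200, 200, 200],
    [200,  50,  50, 200,  50, 200],
    [200, 200, 200, 200,  50, 200],
]
blobs = find_blobs(field, background=200, min_contrast=100)
print(len(blobs), [len(b) for b in blobs])
```

The two bushes come back as two separate blobs because no contrasting pixels connect them, which mirrors the "separate but similar" behavior described above.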

Scale Space

Scale-space theory is a framework for handling the multi-scale nature of image structures. Essentially, if no prior information is available about the appropriate scales for a given data set, then the logical approach for an uncommitted vision system is to represent the input data at multiple scales to best understand its properties. In practice, this means producing a family of progressively smoothed versions of the image, so that fine details fall away and coarser structures remain, which makes it easier to understand the relationship between points.
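
To make the multi-scale idea concrete, here is a rough sketch that smooths a single row of pixels with Gaussian filters of increasing width. It is a simplified, one-dimensional illustration under stated assumptions: the helper names (`gaussian_kernel`, `smooth`, `scale_space`), the chosen sigma values, and the sample row are all invented for the example, and real systems work on full 2-D images.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about 3 sigma per side."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(k * signal[min(max(i + j - radius, 0), last)]
            for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def scale_space(signal, sigmas):
    """Build the family of smoothed signals, keyed by scale (sigma)."""
    return {sigma: smooth(signal, sigma) for sigma in sigmas}

# One row of pixels: a narrow spike (fine detail) and a wide plateau
# (coarse structure). Hypothetical data.
row = [0, 0, 10, 0, 0, 0, 0, 10, 10, 10, 10, 10, 0, 0, 0, 0]
levels = scale_space(row, sigmas=[0.5, 1.0, 2.0])
for sigma, values in levels.items():
    print(sigma, [round(v, 1) for v in values])
```

As sigma grows, the narrow spike fades much faster than the wide plateau, which is why looking at several scales at once helps separate fine detail from larger structures.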

Template Matching

Template matching is a technique in digital image processing used to find small parts of an image that match a chosen template picture. Related matching techniques are already used in facial recognition software. Each face has its own unique features and, after training on a large set of images, the program can determine which face belongs to which person. This is already visible on social networks like Facebook.
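
The basic mechanism can be sketched as a sliding-window comparison. The example below is a minimal illustration that scores each position by the sum of squared differences (SSD), one common choice among several; the function name `match_template` and the tiny grids are assumptions for the example, and this is not a description of how any facial recognition product works.

```python
def match_template(image, template):
    """Slide the template over the image and return (row, col, score)
    for the best fit.

    The score is the sum of squared differences (SSD); 0 is a perfect match.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    for top in range(img_h - tpl_h + 1):
        for left in range(img_w - tpl_w + 1):
            ssd = sum(
                (image[top + r][left + c] - template[r][c]) ** 2
                for r in range(tpl_h)
                for c in range(tpl_w)
            )
            # Keep the first position with the lowest score seen so far.
            if best is None or ssd < best[2]:
                best = (top, left, ssd)
    return best

# Hypothetical 4x5 image with the 2x2 template pattern placed at row 1, col 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 1, 0, 0],
    [0, 1, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 1],
    [1, 9],
]
print(match_template(image, template))  # (1, 1, 0)
```

A plain SSD score like this is sensitive to lighting and scale changes, which is one reason practical systems lean on large data sets and learned features rather than a single fixed template.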

Edge Detection

Edge detection's goal is to identify key points in an image where the brightness changes sharply or is discontinuous. These points of change are organized into a set of curved line segments called edges. Edge detection is often used to help determine the specific features of an object. Edge properties can usually be characterized as viewpoint dependent or viewpoint independent, depending on whether the edges change with perspective.
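
One way to find such brightness changes is with a gradient filter. The sketch below uses the Sobel operator, a standard choice but only one of several edge detectors, and marks pixels whose gradient magnitude exceeds a threshold. The function name `sobel_edges`, the `threshold` value, and the sample image are assumptions chosen for illustration, and the step of linking edge points into line segments is left out.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # left-to-right brightness change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # top-to-bottom brightness change

def sobel_edges(image, threshold):
    """Return a grid of 0/1 flags marking pixels with a strong brightness gradient.

    Border pixels are left as 0 because the 3x3 window does not fit there.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * pixel
                    gy += SOBEL_Y[dr + 1][dc + 1] * pixel
            if math.hypot(gx, gy) >= threshold:
                edges[r][c] = 1
    return edges

# Hypothetical image: dark on the left (10), bright on the right (200).
image = [[10, 10, 10, 200, 200, 200] for _ in range(5)]
edges = sobel_edges(image, threshold=100)
for line in edges:
    print(line)
```

The flagged pixels line up along the boundary between the dark and bright halves, while the flat regions on either side produce no response.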

The Future of Computer Vision

All of these techniques come together to give computers solid programs with which to decipher the world. That said, by themselves they are still incomplete. Machine learning will be crucial, as computers will need to learn which information is most useful for determining what an image actually shows. An enormous amount of collected data - taken from a large installed base - will also greatly help refine the algorithms and increase their accuracy. This is happening today. Massive companies like Google are debuting computer vision programs that will improve as their user base grows. Google Lens will start narrow but will still likely process millions of images in its first year. Blippar is also rolling out a selective computer vision program, this one aimed at the automotive industry.

Computer vision is vital for AR growth. It will allow enterprises to train their workforces more efficiently and allow marketers to better target consumers, reducing the invasive nature of advertisements. While not as immediate, the potential is also there for VR. VR headsets will soon have positional tracking, allowing users to move freely in the real world. With computer vision, VR programs could understand which objects could help enhance the experience and which are to be avoided at all costs. This small but notable improvement would be especially useful in the location-based VR entertainment (LBVRE) sector.