Augmented reality demo featured in M. W. Krueger’s book ‘Artificial Reality II,’ 1991.

What you’re looking at above is the state of augmented reality nearly two decades ago. It’s also a clue as to why Magic Leap, which has already raised more than $2.6 billion from Google and other Silicon Valley giants, is reportedly looking for more venture funding with little to show for it beyond an expensive AR headset rumored to have unimpressive sales. And it’s a cautionary case study for Apple, which is reportedly planning to launch its own AR headset line in 2022.

This ’90s demo has a strikingly similar interaction model to Magic Leap’s user interface. But while visually compelling, this approach to interaction will always suffer from the problems associated with ambiguous input from hand gestures. It’s a key challenge that’s continually held AR back as a technology.

The author presenting at Innovem Fest 19 in Spain.

The unacknowledged complexity of gesture control

AR advocates often assume gesture control will be the next iteration in user interfaces because it seems so intuitive and natural to human expression. (And as discussed in my last Modus essay, it has the seeming inevitability of sci-fi.) But hand-gesture models and libraries are not uniform. The ambiguous input produced by humans forces the computer to process far more information than a controller with a comparatively limited set of inputs, like a touchscreen, and that makes the interaction prone to a wide range of errors. The user’s background could be too dark, they could be wearing gloves, or they could have hands smaller than those the device was tested with. Gesture interfaces also typically require training users on gestures they’re not yet familiar with, and not everyone will make those gestures the same way.

By contrast, physical buttons are incredibly practical. A computer can always interpret the push of a button as a one-to-one interaction. Button-based interfaces are usually colorful and in the right places for your hands. You can quickly pick up the muscle memory to use them regularly. With a button, it doesn’t matter if you’re wearing gloves or if your hands are a certain color or if you’re an adult or a child.

The promise of gesture control technology is that it will significantly improve over time, but in practice its perceived accuracy stays roughly the same. Like Zeno’s paradox of motion, the more our computing power and motion-sensor efficacy improve, the more our expectations for precise gesture recognition grow. But existing computers can never account for the full spectrum of edge cases they might encounter in the real world. Even if they could, gesture recognition is computationally expensive for machines and mostly unnecessary when a simple button would suffice.
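The contrast between the two input models can be sketched in a few lines of code. This is a hypothetical illustration, not any real device’s API: a button press arrives as a discrete, unambiguous event, while a gesture arrives as a set of noisy confidence scores that must clear some threshold before the system accepts it as input.

```python
def handle_button(pressed: bool) -> str:
    # One-to-one mapping: the press either happened or it didn't.
    # No interpretation, no edge cases.
    return "click" if pressed else "idle"

def handle_gesture(scores: dict[str, float], threshold: float = 0.9) -> str:
    # A recognizer emits per-gesture confidence scores. Anything below
    # the threshold is rejected, forcing the user to repeat the motion
    # (the "smashing butterflies" experience described below).
    gesture, confidence = max(scores.items(), key=lambda kv: kv[1])
    return gesture if confidence >= threshold else "unrecognized"

print(handle_button(True))                              # click
print(handle_gesture({"click": 0.95, "pinch": 0.03}))   # click
print(handle_gesture({"click": 0.62, "pinch": 0.30}))   # unrecognized
```

Poor lighting, gloves, or an unfamiliar hand shape all push confidence scores down, so the same physical motion can succeed one moment and fail the next; a button has no equivalent failure mode.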

HoloLens photo courtesy of Microsoft.

Microsoft’s augmented reality HoloLens was released in 2016 to great anticipation, but users and developers quickly realized how difficult it was to actually interact with objects using hand gestures. The specified gesture for interacting with the augmented reality displayed by the headset was a clicking motion made with the hand held out in front of the user. But the headset did not always recognize these gestures on the first try, thanks to visual noise such as lighting conditions and the natural variation in how different people perform the same gesture. Frequent HoloLens users even coined the term “smashing butterflies” to describe repeating the input over and over until the computer finally understood it.

This problem is not unlike how home automation systems like Alexa often struggle to understand commands the first time, especially from speakers with accents, mumbling, or background noise thrown into the mix. AR devices like Magic Leap and HoloLens struggle to detect the intersection between hand movements and objects. Awash in the effluvia of reality, the headset cannot always discern that the user is, say, trying to pick up a block, and it forces them to grab it multiple times. (Perhaps as a response to this frustration, Magic Leap belatedly added physical controllers to its product roadmap and made them part of its Creator Edition.)

Most augmented reality headsets were launched to early adopter enthusiasts and content creators, but even these users quickly found using these devices on a daily basis to be difficult. They’re often heavy and hot, and they obscure your vision. And in the end, there are only so many butterflies that even the most passionate of us can smash.

The videogame-like Vive controller for VR. Photo by the author and Sam Mendoza.

This is one core reason why VR has been a relative success compared with AR: Most virtual reality headsets have a one-to-one, button-based user interface in the form of hand controllers. There is no real world to overlay information onto, and physics engines from video games can be smoothly ported into the world of VR. This shouldn’t be a surprise: VR enables a one-to-one interaction with the virtual world, and we’ve had 30 years of video game development to perfect this interaction.

Here are some other hidden assumptions that hold augmented reality headsets back:

People assume all new tech will replace everything that came before it, but it rarely does

Some Apple executives reportedly think the company’s AR products will eventually supplant the iPhone. But this isn’t usually how technology adoption works. Just because you have the latest choice doesn’t mean it’s the best option for everything. Cash isn’t obsolete simply because credit cards exist, and real reality won’t be taken over entirely by virtual reality. We replace old models of the same fundamental technology with upgraded versions. But devices that are categorically different become just another option alongside our existing devices, adopted only if they greatly enhance our existing tech and lifestyle habits. For most of us, an augmented reality headset will be another thing to take care of, fight with, upgrade, and forget to charge.

Which takes me to a related point:

Any product, no matter how compelling, has to be evaluated in the full context that prior technology has already created

VR/AR enthusiasts will point out that smartphones have small displays and limited interaction options and are unable to offer anything like the data immersion of head-mounted displays (HMDs). While this is true, it assumes that this in itself makes HMDs superior to smartphones. What’s missing from that evaluation is not just the incredible convenience of smartphones, which can be used in just about any context, but their social nature. We enjoy our phones with each other, passing them back and forth to share funny videos and other interesting content. An HMD experience threatens to deprive a user of both convenience and that impromptu social interaction.

Social media platforms are already augmented reality

As I noted, the vision for augmented reality has remained more or less unchanged since the 1990s. Since then, the growth of smartphones and social media has unintentionally created an entirely different vision of AR, one where live photos and videos are shared across our networks through the device in our hands and then discussed in the posts’ comment threads — virtual chat rooms sitting on top of our experience of reality.

Facebook might provide merely a two-dimensional interface, but it is our imagination that adds the additional dimensions. For fully immersive interfaces, we must remember that a little technology goes a long way, and our brains can fill in the rest. Simply put: We already have quite a lot of augmented reality in our lives, just not the kind that was originally conceived.

Projected interface created by teamLab Borderless for MORI Building DIGITAL ART MUSEUM.

In fact, successful augmented reality interfaces (broadly speaking) have been around for years. They’re democratic, affordable to use, and can provide higher-resolution interaction. And best of all, you don’t need to pay thousands of dollars for them, recharge them, or wear them on your head. They’re simple projected interfaces, and while they’re less sexy than headsets, they are already used in airports, shopping malls, and museums. They can be calibrated to show media content, art, or directions, and they can be used by anyone.

Consider all this in relationship to Apple’s much-rumored move toward launching an AR headset line in 2022. If the company asked my advice, I’d recommend they forbid any further internal discussion about AR replacing their beloved iPhone. If they’re smart, Apple will instead learn from past AR mistakes and start with a minimal device that has a very narrow but powerful set of features. Much the way the simple iPod preceded the iPhone, Apple should start small, very slowly getting people to adopt a whole new way of life. And keep it tightly integrated with the iPhone.

We need to remember that technology is cyclical. We see the same solutions proposed again and again, often with the same results. Instead, if we work within the limitations of technology we’ve already embraced, we have a much better chance of doing things well.

This post was originally published on Modus on December 9th.