I’ve often wondered why it is that I’ve had the good fortune to spend the last 20 years doing such interesting and influential work. Part of it is skill, hard work, and passion, and a good part is luck – in other places or times, matters would have worked out very differently. (My optimization skills would certainly have been less valuable if I were working in the potato fields of Eastern Europe alongside my great-great-grandparents.) But I’ve recently come to understand that there’s been another, more subtle, factor at work.
I became aware of this additional influence when my father remarked that my iPad seemed like magic. I understand why he feels that way, but to me, it doesn’t seem like magic at all; it’s just a convergence of technologies that has seemed obvious for decades – it was only a matter of when.
When I stepped back, though, I realized that he was right. The iPad is wildly futuristic technology – when I was growing up, the idea of a personal computer, let alone one you could carry around with you and use to browse a worldwide database, would have ranked up there with personal helicopters on the improbability scale. In fact, it would have seemed more improbable. So why do I not only accept it but expect it?
I think it’s because I read science fiction endlessly when I was growing up. SF conditioned me for a future full of disruptive technology, which happens to be the future I grew up into. Even though the details of the future that actually happened differed considerably from what SF anticipated, the key was that SF gave me a world view that was ready for personal computers and 3D graphics and smartphones and the Internet.
Augmented reality (AR) is far more wildly futuristic than the iPad, and again, it doesn’t seem like magic or a pipe dream to me, just a set of technologies that are coming together right about now. I’m sure that one day we’ll all be walking around with AR glasses on (or AR contacts, or direct neural connections); it’s the timeframe I’m not sure about. What I’m spending my time on is figuring out the timeframe for those technologies, looking at how they might be encouraged and guided, and figuring out what to do with them once they do come together. And once again, I believe I’m able to think about AR in a pragmatic, matter-of-fact way because of SF. In this case, though, it’s both a blessing and a curse, because of the expectations SF has raised for AR – expectations that are unrealistic over the next few years.
Anyone who reads SF knows how AR should work. Vernor Vinge’s novel Rainbows End is a good example; AR is generated by lasers in contact lenses, which produce visual results that indistinguishably intermix with and replace elements of the real world, and people in the same belief circle see a shared virtual reality superimposed on the real world. Karl Schroeder’s short story “To Hie from Far Cilenia” is another example; people who belong to a certain group see Victorian gas lamps in place of normal lights, Victorian garb on other members, and so on. The wearable team at Valve calls this “hard AR,” as contrasted with “soft AR,” which covers AR in which the mixing of real and virtual is noticeably imperfect. Hard AR is tremendously compelling, and will someday be the end state and apex of AR.
But it’s not going to happen any time soon.
Leave aside the issues associated with tracking objects in the real world in order to know how to virtually modify and interact with them. Leave aside, too, the issues associated with tracking, processing, and rendering fast enough so that virtual objects stay glued in place relative to the real world. Forget about the fact that you can’t light and shadow virtual objects correctly unless you know the location and orientation of every real light source and object that affects the scene, which can’t be fully derived from head-mounted sensors. Pay no attention to the challenges of having a wide enough AR field of view so that it doesn’t seem like you’re looking through a porthole, of having a wide enough brightness range so that virtual images look right both at the beach and in a coal mine, of antialiasing virtual edges into the real world, and of doing all of the above with a hardware package that’s stylish enough to wear in public, ergonomic enough to wear all the time, and capable of running all day without a recharge. No, ignore all those problems, because it’s at least possible to imagine how they’d be solved, however challenging the engineering might be.
Fix all that, and the problem remains: how do you draw black?
Before I explain what that means, I need to discuss the likely nature of the first wave (and probably quite a few more waves) of AR glasses.
There are two possible types of AR glasses. One type, which I’ll call “video-passthrough,” uses virtual reality (VR) glasses that are opaque, with forward-facing cameras on the front of the glasses that provide video that is then displayed on the glasses. This has the advantage of simplifying the display hardware, which doesn’t have to be transparent to photons from the real world, and of making it easy to intermix virtual and real images, since both are digitized. Unfortunately, compared to reality, video-passthrough has low resolution, low dynamic range, and a narrow field of view, all of which result in a less satisfactory and often more tiring experience. Worse, because there is lag between head motion and the update of the image of the world on the screen (due to the time it takes for the image to be captured by the camera, transmitted for processing, processed, and displayed), it tends to induce simulator sickness. Worse still, the eye is no longer able to focus normally on different parts of a real-world scene, since focus is controlled by the camera, which leads to a variety of problems. Finally, it’s impossible to see the eyes of anyone wearing such glasses, which is a major impediment to social interaction. So, for many reasons, video-passthrough AR has not been successful in the consumer space, and seems unlikely to be so any time soon.
The other sort of AR is “see-through.” In this version, the glasses are optically transparent; they may reduce ambient light to some degree, but they don’t block it or warp it. When no virtual display is being drawn, it’s like wearing a pair of thicker, heavier normal glasses, or perhaps sunglasses, depending on the amount of darkening. When there is a virtual display, it’s overlaid on the real world, but the real world is still visible as you’d see it normally, just with the addition of the virtual pixels (which are translucent when lit) on top of the real view. This has the huge virtue of not compromising real-world vision, which is, after all, what you’ll use most of the time even once AR is successful. Crossing a street would be an iffy proposition using video-passthrough AR, but would be no problem with see-through AR, so it’s reasonable to imagine people could wear see-through AR glasses all day. Best of all, simulator sickness doesn’t seem to be a problem with see-through AR, presumably because your vision is anchored to the real world just as it normally is.
These advantages, along with recent advances in technologies such as waveguides and picoprojectors that are making it possible to build consumer-priced, ergonomic see-through AR glasses, make see-through by far the more promising of the two technologies for AR right now, and that’s where R&D efforts are being concentrated throughout the industry. Companies both large and small have come up with a surprisingly large number of different ways to do see-through AR, and there’s a race on to see who can come out with the first good-enough see-through AR glasses at a consumer price. So it’s a sure thing that the first wave of AR glasses will be see-through.
That’s not to say that there aren’t disadvantages to see-through AR, just to say that they’re outweighed by the advantages. For one thing, because there’s always a delay in generating virtual images, due to tracking, processing, and scan-out times, it’s very difficult to get virtual and real images to register closely enough so the eye doesn’t notice. For example, suppose you have a real Coke can that you want to turn into an AR Pepsi can by drawing a Pepsi logo over the Coke logo. If it takes dozens of milliseconds to redraw the Pepsi logo, every time you rotate your head the effect will be that the Pepsi logo will appear to shift a few degrees relative to the can, and part of the Coke logo will become visible; then the Pepsi logo will snap back to the right place when you stop moving. This is clearly not good enough for hard AR, because it will be obvious that the Pepsi logo isn’t real; it will seem as if you have a decal loosely plastered over the real world, and the illusion will break down.
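To get a feel for the numbers, here’s a back-of-the-envelope sketch of that registration error. The specific figures – a casual head turn of around 100 degrees per second and tens of milliseconds of motion-to-photon latency – are my own illustrative assumptions, not measurements of any particular system:

```python
# Back-of-the-envelope registration error from rendering latency.
# Assumed, illustrative figures: a casual head turn of ~100 deg/s
# and tens of milliseconds of total motion-to-photon latency.

def registration_error_deg(head_speed_deg_per_s, latency_ms):
    """Angular offset between a virtual overlay and the real object
    it should stay glued to, during a head turn at the given speed."""
    return head_speed_deg_per_s * (latency_ms / 1000.0)

for latency_ms in (10, 30, 50):
    err = registration_error_deg(100.0, latency_ms)
    print(f"{latency_ms} ms of latency -> {err:.1f} degrees of drift")
```

Even at an aggressive 10 ms, the overlay drifts a full degree during the turn – several times the angular size of the logo’s edge detail – which is why the decal effect is so hard to avoid.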
There’s a worse problem, though – with see-through AR, there’s actually no way to completely replace the Coke logo with the Pepsi logo.
The way see-through AR works is by additive blending; each virtual pixel is added to the real world “pixel” it overlays. For example, given a real pixel of 0x0000FF (blue) and a virtual pixel of 0x00FF00 (green), the color the viewer sees will be 0x0000FF + 0x00FF00 = 0x00FFFF (cyan). This means that while a virtual pixel can be bright enough to be the dominant color the viewer sees, it can’t completely replace the real world; the real-world photons always come through, regardless of the color of the virtual pixel. That means that the Coke logo would show through the Pepsi logo, as if the Pepsi logo were translucent.
The simplest way to understand this is to observe that when the virtual color black is drawn, it doesn’t show up as black to the viewer; it shows up as transparent, because the real world is unchanged when viewed through a black virtual pixel. For example, suppose the real-world “pixel” (that is, the area of the real world that is overlaid by the virtual pixel in the viewer’s perception) has a color equivalent to 0x008000 (a medium green). Then if the virtual pixel has value 0x000000 (black), the color seen by the viewer will be 0x008000 + 0x000000 = 0x008000 (remember, the virtual pixel gets added to the color of the real-world “pixel”); this is the real-world color, unmodified. So you can’t draw a black virtual background for something, unless you’re in a dark room.
The implications are much broader than simply not being able to draw black. Given additive blending, there’s no way to darken real pixels even the slightest bit. That means that there’s no way to put virtual shadows on real surfaces. Moreover, if a virtual blue pixel happens to be in front of a real green “pixel,” the resulting pixel will be cyan, but if it’s in front of a real red “pixel,” the resulting pixel will be magenta. This means that the range of colors it’s possible to make appear at a given pixel is at the mercy of what that pixel happens to be overlaying in the real world, and will vary as the glasses move.
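The arithmetic above can be sketched in a few lines. This is a simplified model, not a description of any real display pipeline: I’m treating each perceived color as a per-channel saturating add of the virtual pixel onto the real-world “pixel”:

```python
# A minimal model of see-through additive blending: the viewer sees
# real + virtual, clamped per channel. No virtual value can ever make
# the result darker than the real-world color underneath it.

def additive_blend(real_rgb, virtual_rgb):
    """Per-channel saturating add of a virtual pixel onto a real one."""
    return tuple(min(r + v, 255) for r, v in zip(real_rgb, virtual_rgb))

# Virtual black is transparent: the real medium green comes through.
print(additive_blend((0x00, 0x80, 0x00), (0, 0, 0)))    # (0, 128, 0)

# The same virtual blue pixel over different real backgrounds:
print(additive_blend((0, 255, 0), (0, 0, 255)))  # over green -> cyan
print(additive_blend((255, 0, 0), (0, 0, 255)))  # over red -> magenta
```

The clamp is also why bright virtual images still work: a virtual pixel near full intensity dominates whatever it overlays, even though the real photons are still in the mix.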
None of this means that useful virtual images can’t be displayed; what it means is that the ghosts in “Ghostbusters” will work just fine, while virtual objects that seamlessly mix with and replace real objects won’t. In other words, hard AR isn’t happening any time soon.
“But wait,” you say (as I did when I realized the problem), “you can just put an LCD screen with the same resolution on the outside of the glasses, and use it to block real-world pixels however you like.” That’s a clever idea, but it doesn’t work. You can’t focus on an LCD screen an inch away (and you wouldn’t want to, anyway, since everything interesting in the real world is more than an inch away), so a pixel at that distance would show up as a translucent blob several degrees across, just as a speck of dirt on your glasses shows up as a blurry circle, not a sharp point. It’s true that you can black out an area of the real world by occluding many pixels, but that black area will have a wide, fuzzy border trailing off around its edges. That could well be useful for improving contrast in specific regions of the screen (behind HUD elements, for example), but it’s of no use when trying to stencil a virtual object into the real world so it appears to fit seamlessly.
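A rough geometric-optics estimate shows why the blur is so bad. With the eye focused at a distance, a tiny occluder at distance d in front of a pupil of diameter p dims light arriving over an angular region of roughly p/d radians. The 4 mm pupil and 25 mm eye-to-lens distance below are assumed ballpark figures, not measurements:

```python
import math

# Rough angular extent of the blur from an occluding LCD pixel close
# to the eye, with the eye focused far away. Approximation: the blur
# spans about (pupil diameter / occluder distance) radians.
# Assumed figures: ~4 mm pupil, ~25 mm (about an inch) to the lens.

def blur_extent_deg(pupil_mm, occluder_distance_mm):
    return math.degrees(pupil_mm / occluder_distance_mm)

print(f"{blur_extent_deg(4.0, 25.0):.1f} degrees")  # roughly 9 degrees
```

Nine-ish degrees is an enormous smear by display standards – many times the angular size of the Pepsi logo on a can at arm’s length – so a single occluding pixel can’t come close to masking a single real-world feature.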
Of course, there could be a technological breakthrough that solves this problem and allows true per-pixel darkening (and, in the interest of completeness, I should note that there is in fact existing technology that does per-pixel opaquing, but the approach used is far too bulky to be interesting for consumer glasses). In fact, I actually expect that to happen at some point, because per-pixel darkening would be such a key differentiator as AR adoption ramps up that a lot of R&D will be applied to the problem. But so far nothing of the sort has surfaced in the AR industry or literature, and unless and until it does, hard AR, in the SF sense that we all know and love, can’t happen, except in near-darkness.
That doesn’t mean AR is off the table, just that for a while yet it’ll be soft AR, based on additive blending and area darkening with trailing edges. Again, think translucent like “Ghostbusters.” High-intensity virtual images with no dark areas will also work, especially with the help of regional or global darkening – they just won’t look like part of the real world.
Eventually we’ll get to SF-quality hard AR, but it’ll take a while. I’d be surprised if it were sooner than five years, and it could easily be more than ten before it makes it into consumer products. That’s fine; there are tons of interesting things to do and plenty of technical challenges to figure out just with soft AR. I wrote one of the first PC games with bitmapped graphics in 1982, and 30 years later we’re still refining the state of the art; a few years or even a decade is just part of the maturing process for a new technology. So sit back and enjoy the show as AR grows, piece by piece, into truly seamless augmented reality over the years. It won’t be a straight shot to Rainbows End, but we’ll get there – and I have no doubt that it’ll be a fun ride all along the way.