Over the years, I’ve had the good fortune to meet and work with a lot of remarkable people and do more than my share of interesting things. There are some things that still haven’t happened, though. I haven’t written a compiler. I haven’t written a 3D game from scratch on my own or figured out how to do anything interesting with cellular automata. I’ve worked with Gates and Newell and met Bentley and Akeley and Neal Stephenson, but I haven’t met Knuth or Page or Brin or Miyamoto, and now I’ll never get to meet Iain Banks.
And then there’s this: I’ve been waiting 20 years for someone to write a great book about a project I worked on. A book I’ll read and say, “Yes! That’s exactly how it was!” A book that I can pick up when I’m 80, whenever I want to remember what it was like to help build the future.
Hasn’t happened yet.
You’d think it would have by now, considering that I’ve worked on some significant stuff and appeared in no fewer than four books, but Tracy Kidder-class writers seem to be thin on the ground. Any of the four books could have been great – the material was certainly there – but each fell well short, for a couple of reasons.
First, there were too many significant inaccuracies and omissions for my taste. Maybe someday I’ll take the time to set the record straight, but as just one example, Laura Fryer, the indispensable, hyper-competent complement to Seamus Blackley and a person without whom the original Xbox would not have shipped successfully, simply vanished in Opening the Xbox. That’s not unusual – writers of tech history have limited space to work with and have to choose who to feature and what story they want to tell – but leaving out Laura meant leaving out a big chunk of the history of Xbox as I experienced it.
That touches on the other problem that all four books had to one degree or another: they failed to capture what it felt like to be part of an industry-changing project. That’s a real loss, because being part of a project like Windows NT or Quake is a remarkable experience, one I badly miss whenever I’m working on something more mundane or a project that doesn’t turn out as I had hoped.
Happily, I’m becoming steadily more confident that my current project, VR, is going to be one of the game-changers. That opinion was recently bolstered by the experience of wearing a relatively inexpensive prototype head-mounted display that is possibly the best VR hardware ever made, probably good enough to catapult VR into widespread usage, given the right software. Exciting times indeed, and I hope someday soon there’s a VR breakthrough into wide usage – along with a book about it that fully conveys that excitement.
Which isn’t to say that everything about VR has been figured out, not by a long shot; there’s certainly plenty left to work out with tracking, for example, not to mention input, optics, and software. And, of course, there’s always the most prominent issue, VR displays. In particular, the last two posts discussed the perceptual problems that can result from color-sequential and full-persistence displays, respectively; this post will describe how to fix the problems of full persistence, then look at the new problems that opens up.
Last time, we saw how eye movement relative to a head-mounted display can produce a perceptual effect called judder, a mix of smearing and strobing that can significantly reduce visual quality. The straightforward way to reduce judder is to make displays more like reality, and the obvious way to do that is to increase frame rate.
Here’s the space-time diagram from last time for an image that’s being tracked by the eye on a head-mounted display, producing judder:
And here’s the diagram for a real object being tracked by the eye:
(In both cases, remember that both the eye and object/image are moving relative to the display, but they are not moving relative to each other.)
If we double the frame rate, we get this:
which is significantly closer to the diagram for the real object. Taking that to the limit, if we could make the frame rate infinite, we would get exactly the same diagram as for the real object. Unfortunately, an infinite frame rate is not an option, but somewhere between 60 Hz and infinity, there must be a frame rate that’s good enough so that the eye can’t tell the difference. The question is, what is that frame rate?
There’s no one answer to that question; it depends on the scene content, resolution, FOV, pixel fill, display type, speed of eye motion, and characteristics of the eye. I can tell you, though, that 100 Hz is nowhere near enough. 200 Hz would be a significant improvement but still not enough; the sweet spot for 1080p at 90 degrees FOV is probably somewhere between 300 and 1000 Hz, although higher frame rates would be required to hit the sweet spot at higher resolutions. A 1000 Hz display would very likely look great, and would also almost certainly reduce or eliminate a number of other HMD problems, possibly including motion sickness, because it would interact with the visual system in a way that mimics reality much more closely than existing displays. I have no way of knowing any of that for sure, though, since I’ve never seen a 1000 Hz head-mounted display myself, and don’t ever expect to.
And there’s the rub – there are no existing consumer displays capable of anywhere near the required refresh rates, and no existing consumer data links that can transfer the amount of video data that would be required. There’s no current reason to build such a display or link, and even if there were, rendering at that rate would require such a huge reduction in scene complexity that net visual quality would not be impressive – the lack of judder would be great, but 2005-level graphics would undo a lot of that advantage. (That’s not to say 2005-level graphics couldn’t be adequate for VR – after all, they were good enough for Half-Life 2 – but they would be clearly inferior to PC and console graphics; also, really good VR is going to require a lot more resolution than 1080p, and it’s a moot point anyway because there’s no prospect of consumer displays that can handle anything like 1000 Hz.)
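To put a rough number on the data-link problem, here's a back-of-envelope sketch (my arithmetic, not figures from any spec) of the raw, uncompressed bandwidth a 1000 Hz display would demand:

```python
def raw_bandwidth_gbps(width, height, bits_per_pixel, refresh_hz):
    """Uncompressed video data rate in gigabits per second."""
    return width * height * bits_per_pixel * refresh_hz / 1e9

# 1080p at 24-bit color:
print(raw_bandwidth_gbps(1920, 1080, 24, 60))    # ~3 Gbps - comfortably within consumer links
print(raw_bandwidth_gbps(1920, 1080, 24, 1000))  # ~50 Gbps - far beyond any consumer link
```

Even allowing for compression, a roughly 17x jump in required bandwidth makes it clear why no consumer link is up to the job.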
So increased refresh rate is a perfect solution to judder and other problems – except that it’s completely impractical, at least in the near future. So it’s on to Plan B, which is not a perfect solution, but is at least feasible. Before we can discuss that, though, we need to touch on persistence.
Judder is an artifact of persistence – that is, of the fact that during each frame pixels remain illuminated for considerable periods of time.
Full persistence is when pixels are lit for the entire frame. This is the case with many OLED and LCD displays, although it is by no means required for either technology. Here’s the space-time diagram for a full-persistence display, for the case where the eye is fixated straight ahead while a virtual image is moving relative to the eye:
Here’s half-persistence, where pixels remain lit for half a frame:
And here’s zero-persistence, where pixels are lit for only a tiny fraction of each frame, but at very high intensity to compensate for the short duration. Scanning laser displays are effectively zero-persistence.
The diagrams above are for the case where the eye is fixated while the virtual image moves. That’s not the key judder case, though; the key case is when the eye is moving relative to the display. Here’s the diagram for that on a full-persistence display again:
As the diagram illustrates, the smear part of judder results from each pixel moving across the retina during the time it’s lit, due to eye motion relative to the display. It’s actually not the fraction of a frame for which pixels remain lit that determines the extent of the smearing; it’s the absolute time for which pixels are illuminated, because that (times eye speed) is what determines how long the smears on the retina are. At 1000 Hz, full persistence is only 1 ms, short enough to eliminate judder in most cases – and while 1000 Hz isn’t practical, that observation leads us in the direction of the second, more practical solution to judder: low persistence.
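That persistence-times-eye-speed relationship is simple enough to sketch in a few lines of code, with illustrative numbers of my own choosing rather than measurements from any prototype:

```python
def smear_arcmin(persistence_ms, eye_speed_deg_per_s):
    """Angular length of the smear on the retina, in arc minutes:
    how far the eye moves relative to the display while a pixel stays lit."""
    return persistence_ms / 1000.0 * eye_speed_deg_per_s * 60.0

# Full persistence at 60 Hz (~16.7 ms) during a 100 deg/s head turn:
print(smear_arcmin(16.7, 100.0))  # ~100 arc minutes of smear - highly visible
# Full persistence at 1000 Hz (1 ms), same eye motion:
print(smear_arcmin(1.0, 100.0))   # ~6 arc minutes - smearing largely gone
```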
Here’s the same scenario as the last diagram – the eye moving relative to the display – but with a zero-persistence display:
In this case, there’s no significant movement of the display relative to the eye while the pixel is illuminated, because the pixel is only on for a very short time. Consequently, there’s no movement of the pixel across the retina, which means that zero persistence (or, in practice, sufficiently low persistence, below roughly 2 ms, maybe less at 1080p with a 90 degree FOV) should almost completely eliminate the smear component of judder. Experimental prototypes confirm that this is the case; images on low-persistence HMDs remain sharp regardless of head and eye motion.
Fixing one VR problem generally just reveals another one, though, and low persistence is no exception.
In the last post, I noted that strobing – the perception of multiple copies of a virtual image – can occur when frame-to-frame locations for an image are more than very roughly 5 to 10 arc minutes apart, although whether and at what separation strobing actually occurs is heavily content-dependent. At 60 Hz, successive frames of an image will be 5 arc minutes apart if the eyes are moving at just 5 degrees/second relative to the image; 10 arc minutes corresponds to 10 degrees/second. For context, a leisurely head turn is likely to be in the ballpark of 100 degrees/second, so it is very easy for the eyes to move fast enough relative to an image to produce strobing.
(As an aside, this is the other reason that very high refresh rates work so well. Not only does increasing refresh rate decrease persistence time, it also decreases inter-frame time, which in turn decreases strobing by reducing the distance images move between frames.)
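The separation arithmetic above can be checked with a one-liner: divide eye velocity relative to the image by the refresh rate. This is just a sketch using the numbers from the last two paragraphs; as noted, the 5 to 10 arc minute threshold is only a rough, content-dependent figure.

```python
def frame_separation_arcmin(eye_speed_deg_per_s, refresh_hz):
    """Angular distance between successive copies of an image on the
    retina when the eye moves relative to that image, in arc minutes."""
    return eye_speed_deg_per_s / refresh_hz * 60.0

print(frame_separation_arcmin(5.0, 60))      # ~5 arcmin - right at the rough strobing threshold
print(frame_separation_arcmin(100.0, 60))    # ~100 arcmin - leisurely head turn; strobing very likely
print(frame_separation_arcmin(100.0, 1000))  # ~6 arcmin - higher refresh shrinks the gap
```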
Smear hides a lot of strobing in the case of judder. Without smear, previously-invisible strobing becomes an issue on low-persistence displays. However, low-persistence strobing isn’t quite as serious a problem as it may at first seem, because whatever image your eye is following won’t strobe, for the simple reason that the eye is tracking it; the pixels from that image will land on the same place on the retina each frame, so there’s no frame-to-frame separation to produce strobing. (This assumes perfect tracking and consistent frame rate; tracking error or variable frame rate can result in sufficient misregistration to induce strobing.) And because that image is the center of attention, and because it lands on the high-resolution area of the eye, most of the perceptual system will be focused there, with relatively little processing power devoted to the rest of the scene, so low-persistence strobing may not be as noticeable as you might think.
For example, if you track a car moving from left to right across a scene on a low-persistence display, the car will appear very sharp and clear, with no strobing. The rest of the scene can strobe, since the eye is moving relative to those pixels. However, that may not be very noticeable, depending on refresh rate, speed of eye motion, contents of the background, and the particular eye’s characteristics. (There’s considerable person-to-person variation; personally, I’m much more sensitive to strobing than most people.) It also probably matters how absorbing the image being tracked is. If you’re following a rocket that requires a split-second response, you may not notice peripheral strobing; if you’re scanning your surroundings for threats, you’re more likely to pick up some strobing. However, I should caution that this is just hypothesis at this point; we haven’t done the tests to know for sure.
If low-persistence strobing does turn out to be a problem, the obvious solution is, once again, a higher frame rate. It’s possible that low persistence combined with a moderately higher frame rate could get away with a lower rate than increasing frame rate alone would require. Even so, the frame rate needed is higher than is currently available in consumer parts, so it’s probably not a viable option in the near future. An alternative would be to render all the objects in the scene with motion blur, thereby keeping images in successive frames from being far enough apart to strobe and lowering image frequency (which increases the non-strobing separation). However, even if that works perfectly, it has several significant downsides: first, it requires extra rendering; second, it requires calculating the movement of each virtual object relative to the eye; and third, it requires eyetracking. It’s not clear whether the benefits would outweigh the costs.
Strobing was a fairly predictable consequence of low persistence; we knew we’d encounter it before we ever built any prototypes, because we came across this paper. (I should note, however, that strobing is not nearly as well-researched as persistence smear.) Similarly, we expected to run into issues with low-persistence motion perception, because a series of short, bright photon bursts from low-persistence virtual images won’t necessarily produce the same effects in the eye’s motion detectors as a continuous stream of photons from a real object. We expected those issues to be in areas such as motion sickness, accurate motion estimation, and reaction time. However, we’ve come across one motion artifact that is far weirder than we would have anticipated, and that seems to be based on much deeper, less well-understood mechanisms than strobing.
By way of introduction, I’ll point out that if you look at a row of thin green vertical bars on a low-persistence display and saccade to the left or right, strobing is very apparent; multiple copies of each line appear. As I mentioned above, strobing is not that well understood, but there are a couple of factors that seem likely to contribute to this phenomenon.
The first factor is the interaction of low persistence with saccadic masking. It’s a widespread belief that the eye is blind while saccading, and while the eye actually does gather a variety of information during saccades, it is true that normally no sharp images can be collected because the image of the real world smears across the retina, and that saccadic masking raises detection thresholds, keeping those smeared images from reaching our conscious awareness. However, low-persistence images can defeat saccadic masking, perhaps because saccadic masking fails when mid-saccadic images are as clear as pre- and post-saccadic images in the absence of retinal smear. At saccadic eye velocities (several hundred degrees/second), strobing is exactly what would be expected if saccadic masking fails to suppress perception of the lines flashed during the saccade.
One other factor to consider is that the eye and brain need to have a frame of reference at all times in order to interpret incoming retinal data and fit it into a model of the world. It appears that when the eye prepares to saccade, it snapshots the frame of reference it’s saccading from, and prepares a new frame of reference for the location it’s saccading to. Then, while it’s moving, it normally suppresses the perception of retinal input, so no intermediate frames of reference are needed. However, as noted above, saccadic masking can fail when a low-persistence image is perceived during a saccade. In that case, neither of the frames of reference is correct, since the eye is between the two positions. There’s evidence that the brain uses a combination of an approximated eye position signal and either the pre- or post-saccadic frame of reference, but the result is less accurate than usual, so the image is mislocalized; that is, it’s perceived to be in the wrong location.
It’s possible that both of these factors are occurring and interacting in the saccadic strobing case described above. The strobing of the vertical bars is certainly an interesting matter (at least to HMD developers!), but it seems relatively straightforward. However, the way the visual system interprets data below the conscious level has many layers, and the mechanisms described above are at a fairly low level; higher levels contain phenomena that are far stranger and harder to explain, as we learned by way of the kind of accident that would make a good story in the book of how VR gaming came to be.
Not long ago, I wrote a simple prototype two-player VR game that was set in a virtual box room. For the walls, ceiling, and floor of the room, I used factory wall textures, which were okay, but didn’t add much to the experience. Then Aaron Nicholls suggested that it would be better if the room was more Tron-like, so I changed the texture to a grid of bright, thin green lines on black, as if the players were in a cage made of a glowing green coarse mesh.
When I tried it out on the Rift, it did look better when my head wasn’t moving quickly. However, both smear and strobing were quite noticeable; strobing isn’t usually very apparent on the Rift, due to smearing, but the thin green lines were perfect for triggering strobing. I wanted to see what it looked like with no judder, so next I ran it on a low-persistence prototype. The results were unexpected.
For the most part, it looked fantastic. Both the other player and the grid on the walls were stable and clear under all conditions. Then Atman Binstock tried standing near a wall, looking down the wall into the corner it made with the adjacent wall and the floor, and shifting his gaze rapidly to look at the middle of the wall. What happened was that the whole room seemed to shift or turn by a very noticeable amount. When we mentally marked a location in the HMD and repeated the triggering action, it was clear that the room hadn’t actually moved, but everyone who tried it agreed that there was an unmistakable sense of movement, which caused a feeling that the world was unstable for a brief moment. Initially, we thought we had optics issues, but Aaron suspected persistence was the culprit, and when we went to full persistence, the instability vanished completely. In further testing, we were able to induce a similar effect in the real world via a strobe light.
This type of phenomenon has a name – visual instability – but there are multiple mechanisms involved, and the phenomenon isn’t fully understood. It’s not hard to come up with possible explanations, though. For example, it could be that mislocalization, as described above, causes a sense that the world has shifted; and if the world has shifted, there must have been motion in order to get it there, hence the perception of motion. Once the saccade stops, everything goes back to being in the right place, leaving only a disorienting sense of movement. Or perhaps the motion detectors are being stimulated directly by the images that get past saccadic masking, producing a sense of motion without any actual motion being involved.
All that sounds plausible, but it’s hard to explain why the same thing doesn’t happen with vertical lines. Apparently the visual instability effect that we identified requires enough visual data to form a 3D model of the world before it can kick in. That, in turn, implies that this effect is much higher-level than anything we’ve seen so far, and reflects sophisticated 3D processing below the conscious level, a mechanism that we have very little insight into at this point.
How could this effect be eliminated? Yet again, 1000 Hz would probably do the trick. The previously-mentioned approach of motion-blurring might work too; it all depends on whether the motion-blurred images would make it through saccadic masking, and that’s a function of what triggers saccadic masking, which is not fully understood. A final approach would be to author content to avoid high-frequency components; it’s not clear exactly what would be needed to make this work well, but it is certainly true that the visual instability effect is not very visible playing, say, Half-Life 2 on a low-persistence HMD.
It’s unclear whether the visual instability effect is a significant problem, since in our experiments it’s less pronounced or undetectable with normal game content. The same is true for any of the motion detection problems we think might be caused by low persistence; even if they exist, the eye-brain combination may be able to adapt, as it has for many aspects of displays. But such adaptation may not be complete, especially below the conscious level, and that sort of partial adaptation may cause fatigue and motion sickness. And even when adaptation is complete, the process of adapting can be unpleasant, as for example is often the case when people get new eyeglasses. It’s going to take a lot of R&D before all this is sorted out, which is one reason I say that VR is going to continue improving for decades.
In any case, the visual instability effect is an excellent example of how complicated and poorly-understood HMD visual perception currently is, and how solving one problem can uncover another. Initially, we saw color fringing resulting from temporally separated red, green, and blue subpixels. We fixed that by displaying the components simultaneously, and then found that visual quality was degraded by judder. We fixed judder by going to low persistence, and ran into the visual instability effect. And the proposed solutions to the visual instability effect that are actually feasible (as opposed to 1000 Hz or higher update rate), as well as whatever solutions are devised for any other low-persistence motion detection problems, will likely cause or uncover new problems. Fortunately, it does seem like the scale of the problems is decreasing as we get farther down the rabbit hole – although diagnosing the causes of the problems and fixing them seems to be becoming more challenging at the same time.
And with that, we come to the limits of our present knowledge in this area. I wish I could lay out chapter and verse on the issues and their solutions, but at least I’ve been able to give you a sense of just how different HMDs are from anything that’s come before. And besides, while it would be great if someday soon someone like Tracy Kidder writes the definitive book about how mass-market VR happened, past history isn’t encouraging in that respect, so I hope these last three posts have conveyed to at least some extent what it’s like to be in the middle of figuring out a whole new technology that has the potential to affect all of us for decades to come.
The short version: hard but fun, and exciting as hell.