Author Archives: MAbrash

My Steam Dev Days Talk

It was a lot of fun talking at Steam Dev Days; the whole event was a blast, the virtual reality talks drew a large, enthusiastic crowd, and everyone I talked to had good questions and observations. Here are the slides from my talk, in PDF form. They include the text of the talk as the per-slide notes; I don’t ad-lib when I give talks, so the notes match what I said almost exactly.

Here are some of my previous posts that discuss points from the talk in more detail.

You may also find the slides from my Game Developers Conference talk in 2013 to be useful.

Joe Ludwig’s slides from his talk about Steam VR are here, and related links can be found here, here, and here.

As I said at the end of my talk, I look forward to continuing the conversation with you in the comments!

Update: the talks are online

Videos of the Steam Dev Days talks are now posted here. There are four talks about VR: mine, Joe Ludwig’s, Palmer Luckey’s, and one by the Owlchemy guys.

Down the VR rabbit hole: Fixing judder

Over the years, I’ve had the good fortune to meet and work with a lot of remarkable people and do more than my share of interesting things. There are some things that still haven’t happened, though. I haven’t written a compiler. I haven’t written a 3D game from scratch on my own or figured out how to do anything interesting with cellular automata. I’ve worked with Gates and Newell and met Bentley and Akeley and Neal Stephenson, but I haven’t met Knuth or Page or Brin or Miyamoto, and now I’ll never get to meet Iain Banks.

And then there’s this: I’ve been waiting 20 years for someone to write a great book about a project I worked on. A book I’ll read and say, “Yes! That’s exactly how it was!” A book that I can pick up when I’m 80, whenever I want to remember what it was like to help build the future.

Hasn’t happened yet.

You’d think it would have by now, considering that I’ve worked on some significant stuff and appeared in no less than four books, but Tracy Kidder-class writers seem to be thin on the ground. Any of the four books could have been great – the material was certainly there – but each fell well short, for a couple of reasons.

First, there were too many significant inaccuracies and omissions for my taste. Maybe someday I’ll take the time to set the record straight, but as just one example, Laura Fryer, the indispensable, hyper-competent complement to Seamus Blackley and a person without whom the original Xbox would not have shipped successfully, simply vanished in Opening the Xbox. That’s not unusual – writers of tech history have limited space to work with and have to choose who to feature and what story they want to tell – but leaving out Laura meant leaving out a big chunk of the history of Xbox as I experienced it.

That touches on the other problem that all four books had to one degree or another: they failed to capture what it felt like to be part of an industry-changing project. That’s a real loss, because being part of a project like Windows NT or Quake is a remarkable experience, one I badly miss whenever I’m working on something more mundane or a project that doesn’t turn out as I had hoped.

Happily, I’m becoming steadily more confident that my current project, VR, is going to be one of the game-changers. That opinion was recently bolstered by the experience of wearing a relatively inexpensive prototype head-mounted display that is possibly the best VR hardware ever made, probably good enough to catapult VR into widespread usage, given the right software. Exciting times indeed, and I hope someday soon there’s a VR breakthrough into wide usage – along with a book about it that fully conveys that excitement.

Which isn’t to say that everything about VR has been figured out, not by a long shot; there’s certainly plenty left to work out with tracking, for example, not to mention input, optics, and software. And, of course, there’s always the most prominent issue, VR displays. In particular, the last two posts discussed the perceptual problems that can result from color-sequential and full-persistence displays, respectively; this post will describe how to fix the problems of full persistence, then look at the new problems that opens up.

If you haven’t done so already, I strongly recommend that you read both of the previous posts (here and here) before continuing on.

The obvious solution to judder

Last time, we saw how eye movement relative to a head-mounted display can produce a perceptual effect called judder, a mix of smearing and strobing that can significantly reduce visual quality. The straightforward way to reduce judder is to make displays more like reality, and the obvious way to do that is to increase frame rate.

Here’s the space-time diagram from last time for an image that’s being tracked by the eye on a head-mounted display, producing judder:

And here’s the diagram for a real object being tracked by the eye:

(In both cases, remember that both the eye and object/image are moving relative to the display, but they are not moving relative to each other.)

If we double the frame rate, we get this:

which is significantly closer to the diagram for the real object. Taking that to the limit, if we could make the frame rate infinite, we would get exactly the same diagram as for the real object. Unfortunately, an infinite frame rate is not an option, but somewhere between 60 Hz and infinity, there must be a frame rate that’s good enough so that the eye can’t tell the difference. The question is, what is that frame rate?

There’s no one answer to that question; it depends on the scene content, resolution, FOV, pixel fill, display type, speed of eye motion, and characteristics of the eye. I can tell you, though, that 100 Hz is nowhere near enough. 200 Hz would be a significant improvement but still not enough; the sweet spot for 1080p at 90 degrees FOV is probably somewhere between 300 and 1000 Hz, although higher frame rates would be required to hit the sweet spot at higher resolutions. A 1000 Hz display would very likely look great, and would also almost certainly reduce or eliminate a number of other HMD problems, possibly including motion sickness, because it would interact with the visual system in a way that mimics reality much more closely than existing displays. I have no way of knowing any of that for sure, though, since I’ve never seen a 1000 Hz head-mounted display myself, and don’t ever expect to.

And there’s the rub – there are no existing consumer displays capable of anywhere near the required refresh rates, and no existing consumer data links that can transfer the amount of video data that would be required. There’s no current reason to build such a display or link, and even if there were, rendering at that rate would require such a huge reduction in scene complexity that net visual quality would not be impressive – the lack of judder would be great, but 2005-level graphics would undo a lot of that advantage. (That’s not to say 2005-level graphics couldn’t be adequate for VR – after all, they were good enough for Half-Life 2 – but they would be clearly inferior to PC and console graphics; also, really good VR is going to require a lot more resolution than 1080p, and it’s a moot point anyway because there’s no prospect of consumer displays that can handle anything like 1000 Hz.)
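
To put a rough number on the data-link problem, here’s a quick back-of-the-envelope sketch in Python (it assumes a single uncompressed 1080p stream with 24-bit color and ignores blanking and other overhead, so treat it as a lower bound):

# Approximate video bandwidth for a single 1080p stream at 1000 Hz,
# assuming uncompressed 24-bit color and no blanking overhead.
width, height = 1920, 1080
bits_per_pixel = 24
refresh_hz = 1000

gbits_per_second = width * height * bits_per_pixel * refresh_hz / 1e9
print(round(gbits_per_second, 1), "Gbit/s")   # ~49.8, several times what consumer video links carry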

Increased refresh rate, then, is a perfect solution to judder and other problems – except that it’s completely impractical, at least in the near future. So it’s on to Plan B, which is not a perfect solution, but is at least feasible. Before we can discuss that, though, we need to touch on persistence.

Persistence

Judder is an artifact of persistence – that is, of the fact that during each frame pixels remain illuminated for considerable periods of time.

Full persistence is when pixels are lit for the entire frame. This is the case with many OLED and LCD displays, although it is by no means required for either technology. Here’s the space-time diagram for a full-persistence display, for the case where the eye is fixated straight ahead while a virtual image is moving relative to the eye:

Here’s half-persistence, where pixels remain lit for half a frame:

And here’s zero-persistence, where pixels are lit for only a tiny fraction of each frame, but at very high intensity to compensate for the short duration. Scanning laser displays are effectively zero-persistence.

The diagrams above are for the case where the eye is fixated while the virtual image moves. That’s not the key judder case, though; the key case is when the eye is moving relative to the display. Here’s the diagram for that on a full-persistence display again:

As the diagram illustrates, the smear part of judder results from each pixel moving across the retina during the time it’s lit, due to eye motion relative to the display. It’s actually not the fraction of a frame for which pixels remain lit that determines the extent of the smearing, it’s the absolute time for which pixels are illuminated, because that (times eye speed) is what determines how long the smears on the retina are. At 1000 Hz, full persistence is only 1 ms, short enough to eliminate judder in most cases – and while 1000 Hz isn’t practical, that observation leads us in the direction of the second, more practical solution to judder: low persistence.
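
To put approximate numbers on that relationship, here’s a small Python sketch (illustrative only; as noted, how visible the smear is also depends on content, resolution, and the eye itself):

# Smear length on the retina is roughly eye velocity times the time the
# pixels stay lit (persistence), independent of frame rate per se.
def smear_degrees(eye_speed_deg_per_s, persistence_ms):
    return eye_speed_deg_per_s * persistence_ms / 1000.0

eye_speed = 100.0   # deg/s, a leisurely head turn with VOR counter-rotation
for persistence_ms in (16.7, 2.0, 1.0):   # 60 Hz full persistence, low persistence, 1000 Hz full persistence
    print(persistence_ms, "ms ->", round(smear_degrees(eye_speed, persistence_ms), 2), "degrees of smear")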

Here’s the same scenario as the last diagram – the eye moving relative to the display – but with a zero-persistence display:

In this case, there’s no significant movement of the display relative to the eye while the pixel is illuminated, because the pixel is only on for a very short time. Consequently, there’s no movement of the pixel across the retina, which means that zero persistence (or, in practice, sufficiently low persistence, below roughly 2 ms, maybe less at 1080p with a 90 degree FOV) should almost completely eliminate the smear component of judder. Experimental prototypes confirm that this is the case; images on low-persistence HMDs remain sharp regardless of head and eye motion.

Fixing one VR problem generally just reveals another one, though, and low persistence is no exception.

Side effects of low persistence

In the last post, I noted that strobing – the perception of multiple copies of a virtual image – can occur when frame-to-frame locations for an image are more than very roughly 5 to 10 arc minutes apart, although whether and at what separation strobing actually occurs is heavily content-dependent. At 60 Hz, successive frames of an image will be 5 arc minutes apart if the eyes are moving at just 5 degrees/second relative to the image; 10 arc minutes is 10 degrees/second. For context, a leisurely head turn is likely to be in the ballpark of 100 degrees/second, so it is very easy for the eyes to have a high enough velocity relative to an image so that strobing results.
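
Here’s the same arithmetic in code form (a sketch; remember that the separation at which strobing actually becomes visible is heavily content-dependent):

# Frame-to-frame displacement of an image on the retina, in arc minutes,
# for an eye moving relative to the image at a given angular velocity.
def interframe_separation_arcmin(eye_speed_deg_per_s, refresh_hz):
    return eye_speed_deg_per_s * 60.0 / refresh_hz   # 60 arc minutes per degree

print(interframe_separation_arcmin(5, 60))    # 5.0, right around the strobing threshold
print(interframe_separation_arcmin(100, 60))  # 100.0, a leisurely head turn; strobing is easy to trigger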

(As an aside, this is the other reason that very high refresh rates work so well. Not only does increasing refresh rate decrease persistence time, it also decreases inter-frame time, which in turn decreases strobing by reducing the distance images move between frames.)

Smear hides a lot of strobing in the case of judder. Without smear, previously-invisible strobing becomes an issue on low-persistence displays. However, low-persistence strobing isn’t quite as serious a problem as it may at first seem, because whatever image your eye is following won’t strobe, for the simple reason that the eye is tracking it; the pixels from that image will land on the same place on the retina each frame, so there’s no frame-to-frame separation to produce strobing. (This assumes perfect tracking and consistent frame rate; tracking error or variable frame rate can result in sufficient misregistration to induce strobing.) And because that image is the center of attention, and because it lands on the high-resolution area of the eye, most of the perceptual system will be focused there, with relatively little processing power devoted to the rest of the scene, so low-persistence strobing may not be as noticeable as you might think.

For example, if you track a car moving from left to right across a scene on a low-persistence display, the car will appear very sharp and clear, with no strobing. The rest of the scene can strobe, since the eye is moving relative to those pixels. However, that may not be very noticeable, depending on refresh rate, speed of eye motion, contents of the background, and the particular eye’s characteristics. (There’s considerable person-to-person variation; personally, I’m much more sensitive to strobing than most people.) It also probably matters how absorbing the image being tracked is. If you’re following a rocket that requires a split-second response, you may not notice peripheral strobing; if you’re scanning your surroundings for threats, you’re more likely to pick up some strobing. However, I should caution that this is just hypothesis at this point; we haven’t done the tests to know for sure.

If low-persistence strobing does turn out to be a problem, the obvious solution is, once again, higher frame rate. It’s possible that low persistence combined with a higher frame rate could eliminate strobing at a lower frame rate than would be needed with increased frame rate alone. Even so, the frame rate required is higher than is currently available in consumer parts, so it’s probably not a viable option in the near future. An alternative would be to render all the objects in the scene with motion blur, thereby keeping images in successive frames from being far enough apart to strobe and lowering image frequency (which increases the non-strobing separation). However, even if that works perfectly, it has several significant downsides: first, it requires extra rendering; second, it requires calculating the movement of each virtual object relative to the eye; and third, it requires eye tracking. It’s not clear whether the benefits would outweigh the costs.

Down the rabbit hole

Strobing was a fairly predictable consequence of low persistence; we knew we’d encounter it before we ever built any prototypes, because we came across this paper. (I should note, however, that strobing is not nearly as well-researched as persistence smear.) Similarly, we expected to run into issues with low-persistence motion perception, because a series of short, bright photon bursts from low-persistence virtual images won’t necessarily produce the same effects in the eye’s motion detectors as a continuous stream of photons from a real object. We expected those issues to be in areas such as motion sickness, accurate motion estimation, and reaction time. However, we’ve come across one motion artifact that is far weirder than we would have anticipated, and that seems to be based on much deeper, less well-understood mechanisms than strobing.

By way of introduction, I’ll point out that if you look at a row of thin green vertical bars on a low-persistence display and saccade to the left or right, strobing is very apparent; multiple copies of each line appear. As I mentioned above, strobing is not that well understood, but there are a couple of factors that seem likely to contribute to this phenomenon.

The first factor is the interaction of low persistence with saccadic masking. It’s a widespread belief that the eye is blind while saccading. In fact, the eye gathers a variety of information during saccades, but it is true that normally no sharp images can be collected, because the image of the real world smears across the retina, and that saccadic masking raises detection thresholds, keeping those smeared images from reaching our conscious awareness. However, low-persistence images can defeat saccadic masking, perhaps because masking fails when mid-saccadic images are as clear as pre- and post-saccadic images, with no retinal smear to suppress. At saccadic eye velocities (several hundred degrees/second), strobing is exactly what would be expected if saccadic masking fails to suppress perception of the lines flashed during the saccade.

One other factor to consider is that the eye and brain need to have a frame of reference at all times in order to interpret incoming retinal data and fit it into a model of the world. It appears that when the eye prepares to saccade, it snapshots the frame of reference it’s saccading from, and prepares a new frame of reference for the location it’s saccading to. Then, while it’s moving, it normally suppresses the perception of retinal input, so no intermediate frames of reference are needed. However, as noted above, saccadic masking can fail when a low-persistence image is perceived during a saccade. In that case, neither of the frames of reference is correct, since the eye is between the two positions. There’s evidence that the brain uses a combination of an approximated eye position signal and either the pre- or post-saccadic frame of reference, but the result is less accurate than usual, so the image is mislocalized; that is, it’s perceived to be in the wrong location.

It’s possible that both of these factors are occurring and interacting in the saccadic strobing case described above. The strobing of the vertical bars is certainly an interesting matter (at least to HMD developers!), but it seems relatively straightforward. However, the way the visual system interprets data below the conscious level has many layers, and the mechanisms described above are at a fairly low level; higher levels contain phenomena that are far stranger and harder to explain, as we learned by way of the kind of accident that would make a good story in the book of how VR gaming came to be.

Not long ago, I wrote a simple prototype two-player VR game that was set in a virtual box room. For the walls, ceiling, and floor of the room, I used factory wall textures, which were okay, but didn’t add much to the experience. Then Aaron Nicholls suggested that it would be better if the room was more Tron-like, so I changed the texture to a grid of bright, thin green lines on black, as if the players were in a cage made of a glowing green coarse mesh.

When I tried it out on the Rift, it did look better when my head wasn’t moving quickly. However, both smear and strobing were quite noticeable; strobing isn’t usually very apparent on the Rift, due to smearing, but the thin green lines were perfect for triggering strobing. I wanted to see what it looked like with no judder, so next I ran it on a low-persistence prototype. The results were unexpected.

For the most part, it looked fantastic. Both the other player and the grid on the walls were stable and clear under all conditions. Then Atman Binstock tried standing near a wall, looking down the wall into the corner it made with the adjacent wall and the floor, and shifting his gaze rapidly to look at the middle of the wall. What happened was that the whole room seemed to shift or turn by a very noticeable amount. When we mentally marked a location in the HMD and repeated the triggering action, it was clear that the room hadn’t actually moved, but everyone who tried it agreed that there was an unmistakable sense of movement, which caused a feeling that the world was unstable for a brief moment. Initially, we thought we had optics issues, but Aaron suspected persistence was the culprit, and when we went to full persistence, the instability vanished completely. In further testing, we were able to induce a similar effect in the real world via a strobe light.

This type of phenomenon has a name – visual instability – but there are multiple mechanisms involved, and the phenomenon isn’t fully understood. It’s not hard to come up with possible explanations, though. For example, it could be that mislocalization, as described above, causes a sense that the world has shifted; and if the world has shifted, there must have been motion in order to get it there, hence the perception of motion. Once the saccade stops, everything goes back to being in the right place, leaving only a disorienting sense of movement. Or perhaps the motion detectors are being stimulated directly by the images that get past saccadic masking, producing a sense of motion without any actual motion being involved.

All that sounds plausible, but it’s hard to explain why the same thing doesn’t happen with vertical lines. Apparently the visual instability effect that we identified requires enough visual data to form a 3D model of the world before it can kick in. That, in turn, implies that this effect is much higher-level than anything we’ve seen so far, and reflects sophisticated 3D processing below the conscious level, a mechanism that we have very little insight into at this point.

How could this effect be eliminated? Yet again, 1000 Hz would probably do the trick. The previously-mentioned approach of motion-blurring might work too; it all depends on whether the motion-blurred images would make it through saccadic masking, and that’s a function of what triggers saccadic masking, which is not fully understood. A final approach would be to author content to avoid high-frequency components; it’s not clear exactly what would be needed to make this work well, but it is certainly true that the visual instability effect is not very visible playing, say, Half-Life 2 on a low-persistence HMD.

It’s unclear whether the visual instability effect is a significant problem, since in our experiments it’s less pronounced or undetectable with normal game content. The same is true for any of the motion detection problems we think might be caused by low persistence; even if they exist, the eye-brain combination may be able to adapt, as it has for many aspects of displays. But such adaptation may not be complete, especially below the conscious level, and that sort of partial adaptation may cause fatigue and motion sickness. And even when adaptation is complete, the process of adaptation can be unpleasant, as for example is often the case when people get new eyeglasses. It’s going to take a lot of R&D before all this is sorted out, which is one reason I say that VR is going to continue improving for decades.

In any case, the visual instability effect is an excellent example of how complicated and poorly-understood HMD visual perception currently is, and how solving one problem can uncover another. Initially, we saw color fringing resulting from temporally separated red, green, and blue subpixels. We fixed that by displaying the components simultaneously, and then found that visual quality was degraded by judder. We fixed judder by going to low persistence, and ran into the visual instability effect. And the proposed solutions to the visual instability effect that are actually feasible (as opposed to 1000 Hz or higher update rate), as well as whatever solutions are devised for any other low-persistence motion detection problems, will likely cause or uncover new problems. Fortunately, it does seem like the scale of the problems is decreasing as we get farther down the rabbit hole – although diagnosing the causes of the problems and fixing them seems to be becoming more challenging at the same time.

And with that, we come to the limits of our present knowledge in this area. I wish I could lay out chapter and verse on the issues and the solutions; instead, I hope I’ve at least given you a sense of just how different HMDs are from anything that’s come before. And besides, while it would be great if someday soon someone like Tracy Kidder wrote the definitive book about how mass-market VR happened, past history isn’t encouraging in that respect, so I hope these last three posts have conveyed to at least some extent what it’s like to be in the middle of figuring out a whole new technology that has the potential to affect all of us for decades to come.

The short version: hard but fun, and exciting as hell.

Why virtual isn’t real to your brain: judder

Last time, we started to look at the ways in which the interaction of a head-mounted display with the eye and the brain leads to perceptual artifacts that are unique to HMDs and that can greatly affect VR/AR experiences. We looked closely at one of those artifacts, whereby use of a color-sequential display in an HMD leads to color fringing. I chose to start the discussion of perceptual artifacts with color fringing not because it was the most problematic artifact, but rather because the temporal separation of the color components makes it easy to visualize the effects of relative motion between the eye and the display. In point of fact, color fringing can easily be eliminated by using a display, such as LCD, OLED, color-filter LCOS, or scanning laser, that illuminates all three color components simultaneously. (I hope HMD manufacturers are reading this, because many of them are still using color-sequential LCOS.) However, the next artifact we’re going to look at, judder, is not so easily fixed.

Judder, as it relates to displayed images, has no single clear definition; it’s used by cinematographers in a variety of ways. I’m going to use the term here to refer to a combination of smearing and strobing that’s especially pronounced on VR/AR HMDs; why that’s so is the topic of today’s post.

The place to start with judder is with the same rule we started with last time: visual perception is a function of when and where photons land on the retina. When it comes to HMDs, this rule is much less straightforward than it seems, due to eye motion relative to the display in conjunction with the temporal and spatial quantization performed by displays; we saw two examples of that last time, and judder will be yet a third example. By “temporal and spatial quantization,” I mean that any given pixel is illuminated for some period of time over the course of each frame, and during that time its color remains constant within the pixel bounds; that’s a simplification, but it’s close enough for our purposes.

When we looked at color fringing, the key was that each color component of any given pixel was illuminated at a different time, so when the eye was moving relative to the display, each color component landed in a different place on the retina. With judder, the key is that the illuminated area of each pixel sweeps a constant color across the retina for however long it’s lit (the persistence time), resulting in a smear; this is then followed by a jump that causes strobing – that is, the perception of multiple simultaneous copies of the image. (It’s not intuitively obvious why this would cause strobing, but it should be clear by the end of this post.) The net result is loss of detail, and quite likely eye fatigue or even increased motion sickness. Let’s look at how this happens in more detail.

If you haven’t done so already, I strongly recommend you read the last post before continuing on.

Why judder happens

In this post, we’re going to look at many of the same mechanisms as last time, but with a different artifact in mind. I’ll repeat some of the discussion from last time to lay the groundwork, but we’ll end up in quite a different place (although everything I’ll talk about was implicit in the last post’s color-fringing diagrams).

Once again, let’s look at a few space-time diagrams. These diagrams plot x position relative to the eye on the horizontal axis, and time advancing down the vertical axis.

First, here’s a real-world object staying in the same position relative to the eye. (This should be familiar, because it’s repeated from the last post.)

I’ll emphasize, because it’s important for understanding later diagrams, that the x axis is horizontal position relative to the eye, not horizontal position in the real world. With respect to perception it’s eye-relative position that matters, because that’s what affects how photons land on the retina. So the figure above could represent a situation in which both the eye and the object are not moving, but it could just as well represent a situation in which the object is moving and the eye is tracking it.

The figure would look the same for the case where both a virtual, rather than real, object and the eye are not moving relative to one another, unless the color of the object was changing. In that case, a real-world object could change color smoothly, while a virtual object could only change color once per frame. However, the figure would not look the same for the case where a virtual object is moving and the eye is tracking it; in fact, that case goes to the heart of what this post is about, and we’ll discuss it shortly.

Next, let’s look at a case where the object is moving relative to the eye. (Again, this is repeated from the last post.) Here a real-world object is moving from left to right at a constant velocity relative to the eye. The most common case of this would be where the eye is fixated on something else, while the object moves through space from left to right.

In contrast, here’s the case where a virtual object is moving from left to right relative to the eye. Throughout today’s post, I’m going to assume the display is one that displays all three color components simultaneously; that means that in contrast to the similar diagram from the last post, the pixel color is constant throughout each frame, rather than consisting of sequential red, green, and blue.

Because each pixel can update only once a frame and remains lit for the persistence time, the image is quantized to pixel locations spatially and to persistence times temporally, resulting in stepped rather than continuous motion. In the case shown above, that wouldn’t produce judder, although it would generally produce strobing at normal refresh rates if the virtual object contained high spatial frequencies.

Note that in these figures, unless otherwise noted, persistence time – the time each pixel remains lit – is the same as the frame time – that is, these are full-persistence displays.

So far, so good, but neither of the above cases involves motion of the eye relative to the display, and it’s specifically that motion that causes judder. As explained last time, the eye can move relative to the display, while still being able to see clearly, either because it’s tracking a moving virtual object or because it’s fixated on a static virtual or real object via VOR while the head turns. (I say “see clearly” because the eye can also move relative to the display by saccading, but in that case it can’t see clearly, although, contrary to popular belief, it does still acquire and use visual information.) The VOR case is particularly interesting, because, as discussed in the last post, it can involve very high relative velocities (hundreds of degrees per second) between the eye and the display, and consequently very long smears.

Here’s the relative-motion case.

Once again, remember that the x axis is horizontal motion relative to the eye. If the display had an infinite refresh rate, the plot would be a vertical line, just like the first space-time diagram above. Given actual refresh rates, however, what happens is that a given virtual object lights up the correct pixels for its virtual position at the start of the frame (assuming either no latency or perfect prediction), and then, because those pixels remain unchanged both in color and in position on the display over the full persistence time and because the eye is moving relative to the display, the pixels slide over the retina for the duration of the frame, falling behind the correct location for the moving virtual object. At the start of the next frame, the virtual object is again redrawn at the proper location for that time, lighting up a different set of pixels on the screen, so the image snaps back to the right position in virtual space, and the pixels then immediately start to slide again.
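
If it helps to see that slide-and-snap behavior as numbers rather than as a diagram, here’s a small sketch (the 120 degrees/second and 60 Hz figures are just examples):

# Full-persistence judder: the image is drawn at the correct eye-relative
# position at the start of each frame, then the lit pixels stay fixed on the
# display while the eye keeps moving, so they fall steadily behind until the
# next frame snaps the image back to the right place.
eye_speed = 120.0          # deg/s, eye relative to display (e.g. a head turn with VOR)
frame_time = 1.0 / 60.0    # 60 Hz

for frame in range(3):
    for step in range(4):                             # sample four points per frame
        time_into_frame = step * frame_time / 4.0
        t = frame * frame_time + time_into_frame
        lag = eye_speed * time_into_frame             # how far the lit pixels trail the correct position
        print(round(t * 1000, 2), "ms :", round(lag, 2), "degrees behind where the image should be")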

It’s hard to film judder of exactly the sort defined above, but this video shows a very similar mechanism in slow motion. Judder as I’ve discussed it involves relative motion between the eye and the display. In the video, in contrast, the camera is rigidly attached to the display, and they pan together across a wall that contains several markers used for optical tracking. The display pose is tracked, and a virtual image is superimposed on each marker; the real-world markers are dimly visible as patterns of black-and-white squares through the virtual images. The video was shot through an HMD at 300 frames per second, and is played back at one-fifth speed, making it easy to see the relationship between the virtual and real images. You can see that because the virtual images are only updated once per displayed frame, they slide relative to the markers – they move ahead of the markers, because they stay in the same place on the display, and the display is moving – for a full displayed frame time (five camera frames), then jump back to the correct position.

This phenomenon is not exactly what happens with the HMD judder I’ve been talking about – the images are moving relative to the camera, rather than having the camera tracking them – but it does clearly illustrate how the temporal quantization of displayed pixels causes images to slide from the correct position over the course of a frame. I strongly recommend that you play a little of the video one frame at a time, so you can see that what actually happens is that the virtual image stays in the same position on the screen for five camera frames, while the physical marker moves across the screen continuously due to motion of the HMD/camera. If you substituted your eye for the camera and looked straight ahead, as the camera did, you would only see strobing of the virtual images, not smearing, as the virtual images jumped from one displayed frame to the next. However, if instead you moved the HMD as in the video but at the same time moved your eye to keep it fixated on either the physical or virtual marker, you would in fact see exactly the form of judder shown in the last diagram; you should be able to directly map that scenario to the last diagram. In particular, the images would smear.

You might reasonably wonder how bad the smear can be, given that frame times are measured in milliseconds. The answer is: worse than you probably think.

When you turn your head at a leisurely speed, that’s in the neighborhood of 100 degrees per second. Suppose you turn your head at 120 degrees per second, while wearing a 60 Hz HMD; that’s two degrees per displayed frame. Two degrees doesn’t sound like much, but on an Oculus Rift development kit it’s about 14 pixels, and if an HMD existed that had a resolution approximating the resolving capability of the human eye, a two-degree arc across it would cross hundreds of pixels. So the smear part of judder is very noticeable. Since I have no way to show it to you directly, let’s look at a simulation of it.
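
Before we get to the simulation, here’s that arithmetic spelled out (the per-eye resolution and FOV figures are approximations for the first Rift development kit, so treat the result as a ballpark):

# Smear length per frame, in pixels, for a head turn tracked by VOR.
head_speed = 120.0                   # degrees/second
refresh_hz = 60.0
pixels_per_eye_horizontal = 640.0    # roughly: a 1280x800 panel split between two eyes
fov_degrees = 90.0                   # approximate horizontal FOV

degrees_per_frame = head_speed / refresh_hz                  # 2 degrees of eye motion per frame
pixels_per_degree = pixels_per_eye_horizontal / fov_degrees  # ~7 pixels per degree
print(round(degrees_per_frame * pixels_per_degree, 1), "pixels of smear per frame")   # ~14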

Here’s a rendered scene:

And here’s what it looks like after the image is smeared across two degrees:

Clearly, smearing can have a huge impact on detail and sharpness.
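
If you want to approximate that kind of smear on an image of your own, a box blur along the direction of motion, with a width equal to the smear length in pixels, is a reasonable first-order simulation. Here’s a sketch using NumPy (it ignores display response curves and the eye’s own temporal integration, so it’s only an approximation):

import numpy as np

def simulate_smear(image, smear_pixels=14):
    # Approximate full-persistence smear with a horizontal box blur.
    # image: float array of shape (height, width, channels), values 0..1.
    # smear_pixels: smear length in pixels (~14 for a 120 deg/s head turn
    # at 60 Hz on a Rift development kit).
    kernel = np.ones(smear_pixels) / smear_pixels
    smeared = np.empty_like(image)
    for channel in range(image.shape[2]):
        smeared[:, :, channel] = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"),
            1, image[:, :, channel])
    return smeared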

In contrast, this video shows how smooth the visuals are when a high-speed camera is panned across a monitor. (The video quality is not very good, but it’s good enough so that you can see how stable the displayed images are compared to the shifting and jumping in the first video.) The difference is that in the first video, tracking was used to try to keep a virtual image on a see-through HMD in the right place relative to the real world as the camera moved, with the pixels on the HMD moving relative to the real world over the course of each frame; in the second video, the image was physically displayed on a real-world object (a monitor), so each pixel remained in a fixed position in the real world at all times. This neatly illustrates the underlying reason VR/AR HMDs differ markedly from other types of displays – virtual images on HMDs have to be drawn to register correctly with the real world, rather than simply being drawn in a fixed location in the real world.

Besides smear, the other effect you can see in the first video is that the images snap back to the right location at the start of each frame, as shown in the last space-time diagram. Again, the location and timing of photons on the retina is key. If an image moves more than about five or ten arc minutes between successive updates, it can start to strobe; that is, you may see multiple simultaneous copies of the image. At a high enough head-turn speed, the image will move farther than this threshold when it snaps back to the correct location at the start of each frame (and even a very slow 10 degrees per second head turn can be enough for images containing high frequencies), so judder can feature strobing in addition to smearing.

It’s worth noting that this effect is reduced because intensity lessens toward both ends of the smear for features that are more than one pixel wide. The reason is very straightforward: the edges of such smears are covered by the generating feature for only part of the persistence time. However, that’s a mixed blessing; the eye perceives flicker more readily at lower intensities, so the edges of such objects may flicker (an on/off effect), rather than strobe (a multiple-replicas effect).

Also, you might wonder why juddering virtual objects would strobe, rather than appearing as stable smeared images. One key factor is that any variation in latency, error in prediction, or inaccuracy in tracking will result in edges landing at slightly varying locations on the retina, which can produce strobing. Another reason may be that the eye’s temporal summation period doesn’t exactly match the persistence time. For illustrative purposes only, suppose that the persistence time is 10 ms, and the eye’s temporal integration period is 5 ms (a number I just made up for this example). Then the eye will detect a virtual edge not once but twice per frame, and if the eye is moving rapidly relative to the display, those two detections will be far enough apart so that two images will be perceived; in other words, the edge will strobe. (In actuality, the eye’s integration window depends on a number of factors, and does not take a discrete snapshot.) Note, however, that this is only a theory at this point. In any case, the fact is that the eye does perceive strobing as part of judder.

The net effect of smearing and strobing combined is much like a choppy motion blur. At a minimum, image quality is reduced due to the loss of detail from smearing. Strobing tends not to be very visible on full-persistence displays – smearing mostly hides it, and it’s less prominent for images that don’t have high spatial frequencies – but it’s possible that both strobing and smearing contribute to eye fatigue and/or motion sickness, because both seem likely to interfere with the eye’s motion detection mechanisms. The latter point is speculative at this juncture, and involves deep perceptual mechanisms, but I’ll discuss it down the road if it turns out to be valid.

Slow LCD switching times, like those in the Rift development kit HMDs, result in per-frame pixel updates that are quite different from the near-instantaneous modification of the pixel state that you’d see with OLEDs or scanning lasers; with LCD panels, pixel updates follow a ramped curve. This produces blurring that exaggerates smearing, making it longer and smoother, and masks strobing. While that does mostly solve the strobing problem, it is not exactly a win, because the loss of detail is even greater than what would result from full-persistence, rapid-pixel-switching judder alone.

Why isn’t judder a big problem for movies, TV, and computer displays?

I mentioned in the last post that HMDs are very different from other types of displays, and one aspect of that is that judder is a more serious problem for HMDs. Why isn’t judder a major problem for movies, TVs, and computer displays?

Actually, judder is a significant problem for TV and movies, or at least it would be except that cinematographers go to great lengths to avoid it. For example, you will rarely see a rapid pan in a movie, and when you do, you won’t be able to see much of anything other than blur indicating the direction of motion. Dramatic TV filming follows much the same rules as movies. Sports on TV can show judder, and that’s a motivating factor behind higher refresh rates for TVs. And you can see judder on a computer simply by grabbing a window and tracking an edge carefully while dragging it rapidly back and forth (although your results will vary depending on the operating system, graphics hardware, and whether the desktop manager waits for vsync or not). It’s even easier to see judder by going to the contacts list on your phone and tracking your finger as you scroll the list up and down; the text will become blurry and choppy. Better yet, hold your finger on the list and move the phone up and down while your finger stays fixed in space. The list will become very blurry indeed – try to read it. And you can see judder in video games when you track a rapidly moving object, but those tend to appear in the heat of battle, when you have a lot of other things to think about. (Interestingly, judder in video games is much worse now, with LCD monitors, than it was on CRTs; the key reason for this is persistence time, although slow LCD switching times don’t help.)

However, while judder is potentially an issue for all displays, there are two important differences that make it worse for HMDs, as I mentioned last time: first, the FOV in an HMD is much wider, so objects can be tracked for longer, and second, you can turn your head much more rapidly than you can normally track moving objects without saccading, yet still see clearly, thanks to the counter-rotation VOR provides. These two factors make judder much more evident on an HMD. A third reason is that virtual images on a monitor appear to be on a surface in the world, in contrast to virtual images on an HMD, which appear to be directly in the world; this causes the perceptual system to have higher expectations for HMD images and to more readily detect deviations from what we’re used to when looking at the real world.

Next time: the tradeoffs involved in reducing judder

Judder isn’t a showstopper, but it does degrade VR/AR visual quality considerably. I’ve looked through a prototype HMD that has no judder, and the image stayed astonishingly sharp and clear as I moved my head. Moreover, increased pixel density is highly desirable for VR/AR, but the effects of judder get worse the higher the pixel density is, because the smears get longer relative to pixel size, causing more detail to be lost. So is there a way to reduce or eliminate judder?

As it happens, there is – in fact, there are two of them: higher refresh rate and low persistence. However, you will not be surprised to learn that there are complications, and it will take some time to explain them, so the next part of the discussion will have to wait until the next post.

By this point, you should be developing a strong sense of why it’s so hard to convince the eye and brain that virtual images are real. Next time we’ll see that the perceptual rabbit hole goes deeper still.

Why virtual isn’t real to your brain

I was going to start this post off with a discussion of how we all benefit from sharing information, but I just got an email from John Carmack that nicely sums up what I was going to say, so I’m going to go with that instead:

Subject: What a wonderful world…

Just for fun, I was considering writing a high performance line drawing routine for the old Apple //c that Anna got me for Christmas. I could do a pretty good one off the top of my head, but I figured a little literature review would also be interesting. Your old series of articles comes up quickly, and they were fun to look through again. I had forgotten about the run-length slice optimization.

What struck me was this paragraph:

First off, I have a confession to make: I’m not sure that the algorithm I’ll discuss is actually, precisely Bresenham’s run-length slice algorithm. It’s been a long time since I read about this algorithm; in the intervening years, I’ve misplaced Bresenham’s article, and have been unable to unearth it. As a result, I had to derive the algorithm from scratch, which was admittedly more fun than reading about it, and also ensured that I understood it inside and out. The upshot is that what I discuss may or may not be Bresenham’s run-length slice algorithm—but it surely is fast.

The notion of misplacing a paper and being unable to unearth it again seems like a message from another world from today’s perspective. While some people might take the negative view that people no longer figure things out from scratch for themselves, I consider it completely obvious that having large fractions of the sum total of human knowledge at your fingertips within seconds is one of the greatest things to ever happen to humanity.

Hooray for today!

But what’s in it for me?

Hooray for today indeed – as I’ve written elsewhere (for example, the last section of this), there’s huge value to shared knowledge. However, it takes time to write something up and post it, and especially to answer questions. So while we’re far better off overall from sharing information, it seems like any one of us would be better off not posting, but rather just consuming what others have shared.

This appears to be a classic example of the Prisoner’s Dilemma. It’s not, though, because there are generally large, although indirect and unpredictable, personal benefits. There’s no telling when they’ll kick in or what form they’ll take, but make no mistake, they’re very real.

For example, consider how the articles I wrote over a ten-year stretch – late at night after everyone had gone to sleep – opened up virtually all the interesting opportunities I’ve had over the last twenty years.

In 1992, I was writing graphics software for a small company, and getting the sense that it was time to move on. I had spent my entire career to that point working at similar small companies, doing work that was often interesting but that was never going to change the world. It’s easy to see how I could have spent my entire career moving from one such job to another, making a decent living but never being in the middle of making the future happen.

However, in the early 80’s, Dan Illowsky, publisher of my PC games, had wanted to co-write some articles as a form of free advertising. There was nothing particularly special about the articles we wrote, but I learned a lot from doing them, not least that I could get what I wrote published.

Then, in the mid-80’s, I came across an article entitled “Optimizing for Speed” in Programmer’s Journal, a short piece about speeding up bit-doubling on the 8088 by careful cycle counting. I knew from optimization work I’d done on game code that cycle counts weren’t the key on the 8088; memory accesses, which took four cycles per byte, limited almost everything, especially instruction fetching. On a whim, I wrote an article explaining this and sent it off to PJ, which eventually published it, and that led to a regular column in PJ. By the time I started looking around for a new job in 1992, I had stuff appearing in several magazines on a regular basis.

One of those articles was the first preview of Turbo C. Borland had accidentally sent PJ the ad copy for Turbo C before it was announced, and when pressed agreed to let PJ have an advance peek. The regular C columnist couldn’t make it, so as the only other PJ regular within driving distance, I drove over the Santa Cruz Mountains on zero notice one rainy night and talked with VP Brad Silverberg, then wrote up a somewhat breathless (I wanted that development environment) but essentially correct (it really did turn out to be that good) article.

In 1992, Brad had moved on to become VP of Windows at Microsoft, and when I sent him mail looking for work, he referred me to the Windows NT team, where I ended up doing some of the most challenging and satisfying work of my career. Had I not done the Turbo C article, I wouldn’t have known Brad, and might never have had the opportunity to work on NT. (Or I might have; Dave Miller, who I worked with at Video Seven, referred me to Jeff Newman, who pointed me to the NT team as well – writing isn’t the only way opportunity knocks!)

I was initially a contractor on the NT team, and I floundered at first, because I had no experience with working on a big project. I would likely have been canned after a few weeks, were it not for Mike Harrington, who had read some of my articles and thought it was worth helping me out. Mike got me set up on the network, showed me around the development tools, and took me out for dinner, giving me a much-needed break in the middle of a string of 16-hour workdays.

After a few years at Microsoft, I went to work at Id, an opportunity that opened up because John Carmack had read my PJ articles when he was learning about programming the PC. And a few years later, Mike Harrington would co-found Valve, licensing the Quake source code from Id, where I would be working at the time – and where I would help Valve get the license – and thirteen years after that, I would go to work at Valve.

If you follow the thread from the mid-80’s on, two things are clear: 1) it was impossible to tell where writing would lead, and 2) writing opened up some remarkable opportunities over time.

It’s been my observation that both of these points are true in general, not just in my case. The results from sharing information are not at all deterministic, and the timeframe can be long, but generally possibilities open up that would never have been available otherwise. So from a purely selfish perspective, sharing information is one of the best investments you can make.

The unpredictable but real benefits of sharing information are part of why I write this blog. It has brought me into contact with many people who are well worth knowing, both to learn from and to work with; for example, I recently helped Pravin Bhat, who emailed me after reading a blog post and now works at Valve, optimize some very clever tracking code that I hope to talk about one of these days. If you’re interested in AR and VR – or if you’re interested in making video games, or Linux, or hardware, or just think Valve sounds like a great place to work (and you should) – take a look at the Valve Handbook. If, after reading the Handbook, you think you fit the Valve template and Valve fits you, check out Valve’s job openings or send me a resume. We’re interested in both software and hardware – mechanical engineers are particularly interesting right now, but Valve doesn’t hire for specific projects or roles, so I’m happy to consider a broad range of experience and skills – but please, do read the Handbook first to see if there’s likely to be a fit, so you can save us both a lot of time if that’s not the case.

The truth is, I wrote all those articles, and I write this blog, mostly because of the warm feeling I get whenever I meet someone who learned something from what I wrote; the practical benefits were an unexpected bonus. Whatever the motivation, though, sharing information really does benefit us all. With that in mind, I’m going to start delving into what we’ve found about the surprisingly deep and complex reasons why it’s so hard to convince the human visual system that virtual images are real.

How images get displayed

There are three broad factors that affect how real – or unreal – virtual scenes seem to us, as I discussed in my GDC talk: tracking, latency, and the way in which the display interacts perceptually with the eye and the brain. Accurate tracking and low latency are required so that images can be drawn in the right place at the right time; I’ve previously talked about latency, and I’ll talk about tracking one of these days, but right now I’m going to treat latency and tracking as solved problems so we can peel the onion another layer and dive into the interaction of head-mounted displays with the human visual system, and the perceptual effects thereof. More informally, you could think of this line of investigation as: “Why VR and AR aren’t just a matter of putting a display an inch in front of each eye and rendering images at the right time in the right place.”

In the next post or two, I’ll take you farther down the perceptual rabbit hole, to persistence, judder, and strobing, but today I’m going to start with an HMD artifact that’s both useful for illustrating basic principles and easy to grasp intuitively: color fringing. (I discussed this in my GDC talk, but I’ll be able to explain more and go deeper here.)

A good place to start is with a simple rule that has a lot of explanatory power: visual perception is a function of where and when photons land on the retina. That may seem obvious, but consider the following non-intuitive example. Suppose the eye is looking at a raster-scan display. Further, suppose a vertical line is being animated on the display, moving from left to right, and that the eye is tracking it. Finally, assume that the pixels on the display have zero persistence – that is, each one is illuminated very brightly for a very short portion of the frame time. What will the eye see?

The pattern shown on the display for each frame is a vertical line, so you might expect that to be what the eye sees, but the eye will actually see a line slanting from upper right to lower left. The reasons for this were discussed here, but what they boil down to is that the pattern in which the photons from the pixels land on the retina is a slanted line. This is far from unusual; it is often the case that what is perceived by the eye differs from what is displayed on an HMD, and the root cause of this is that the overall way in which display-generated photons are presented to the retina has nothing in common with real-world photons.

Real-world photons are continuously reflected or emitted by every surface, and vary constantly. In contrast, displays emit fixed streams of photons from discrete pixel areas for discrete periods of time, so photon emission is quantized both spatially and temporally; furthermore, with head-mounted displays, pixel positions are fixed with respect to the head, but not with respect to the eyes or the real world. In the case described above, the slanted line results from eye motion relative to the pixels during the time the raster scan sweeps down the display.
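
If you want to convince yourself of the slant, you can compute where each scanline’s photons land relative to the tracking eye; here’s a sketch with made-up numbers (real raster timing and row counts differ):

# A vertical line moves left to right and the eye tracks it, so in eye-relative
# coordinates the line "should" stay at x = 0. With a zero-persistence raster
# scan, the top row flashes at the start of the frame and the bottom row near
# the end, and the eye keeps moving in between, so lower rows land farther to
# the left on the retina: a slanted line.
rows = 8                   # coarse stand-in for the display's scanlines
frame_time = 1.0 / 60.0    # 60 Hz
eye_speed = 30.0           # deg/s, a moderate smooth-pursuit speed

for row in range(rows):
    scan_time = frame_time * row / rows        # when this scanline's pixels flash
    eye_relative_x = -eye_speed * scan_time    # the eye has moved on by then
    print("row", row, ": lands at", round(eye_relative_x, 3), "degrees")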

You could think of the photons from a display as a three-dimensional signal: pixel_color = f(display_x, display_y, time). Quantization arises because pixel color is constant within the bounds defined by the pixel boundaries and the persistence time (the length of time any given pixel remains lit during each frame). When that signal is projected onto the retina, the result for a given pixel is a tiny square that is swept across the retina, with the color constant over the course of a frame; the distance swept per frame is proportional to the distance the eye moves relative to the pixel during the persistence time. The net result is a smear, unless persistence is close to zero or the eye is not moving relative to the pixel.
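
As a sketch, that quantized signal could be modeled like this (the names and the nearest-pixel sampling are illustrative only; real pixels have more structure, as the next paragraph notes):

# Illustrative model of the display's output signal: color is constant within
# a pixel's bounds and within the persistence window of each frame, and the
# pixel is dark for the remainder of the frame.
FRAME_TIME = 1.0 / 60.0        # seconds per frame
PERSISTENCE = FRAME_TIME       # full persistence; use a smaller value for low persistence
PIXEL_SIZE = 1.0               # display units per pixel

def pixel_color(framebuffers, display_x, display_y, t):
    frame = int(t / FRAME_TIME)
    if (t - frame * FRAME_TIME) >= PERSISTENCE:
        return (0.0, 0.0, 0.0)                 # unlit for the rest of the frame
    row = int(display_y / PIXEL_SIZE)          # quantize position to pixel bounds
    col = int(display_x / PIXEL_SIZE)
    return framebuffers[frame][row][col]       # constant color over the persistence time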

The above description is a simplification, since pixels aren’t really square or uniformly colored, and illumination isn’t truly constant during the persistence time, but it will suffice for the moment. We will shortly see a case where it’s each pixel color component that remains lit, not the pixel as a whole, with interesting consequences.

The discrete nature of photon emission over time is the core of the next few posts, because most display technologies have significant persistence, which means that most HMDs have a phenomenon called judder, a mix of smearing and strobing (that is, multiple simultaneous perceived copies of images) that reduces visual quality considerably, and introduces a choppiness that can be fatiguing and may contribute to motion sickness. We’ll dive into judder next time; in this post we’ll establish a foundation for the judder discussion, using the example of color fringing to illustrate the basics of the interaction between the eye and a display.

The key is relative motion between the eye and the display

Discrete photon emission produces artifacts to varying degrees for all display- and projector-based technologies. However, HMDs introduce a whole new class of artifacts, and the culprit is rapid relative motion between the eye and the display, which is unique to HMDs.

When you look at a monitor, there’s no situation in which your eye moves very rapidly relative to the monitor while still being able to see clearly. One reason for this is that monitors don’t subtend a very wide field of view – even a 30-inch monitor would be less than 60 degrees at normal viewing distance – so a rapidly-moving image would vanish off the screen almost as soon as the eye could acquire and track it. In contrast, the Oculus Rift has a 90-degree FOV.

An even more important reason why the eye can move much more rapidly relative to head-mounted displays than to monitors is that HMDs are attached to heads. Heads can rotate very rapidly – 500 degrees per second or more. When the head rotates, the eye can counter-rotate just as fast and very accurately, based on the vestibulo-ocular reflex (VOR). That means that if you fixate on a point on the wall in front of you, then rotate your head as rapidly as you’d like, that point remains clearly visible as your head turns.

Now consider what that means in the context of an HMD. When your head turns while you fixate on a point in the real world, the pixels on the HMD move relative to your eyes, and at a very high speed – easily ten times as fast as you can smoothly track a moving object. This is particularly important because it’s common to look at a new object by first moving the eyes to acquire the target, then remaining fixated on the target while the head turns to catch up. This VOR-based high-speed eye-pixel relative velocity is unique to HMDs.
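To make the magnitude concrete, here’s a small sketch comparing the eye-relative pixel velocity during a VOR-stabilized head turn with a rough ceiling for smooth pursuit; both speeds are illustrative assumptions.

```python
# Sketch of eye-vs-pixel relative velocity during a VOR-stabilized head turn.
# Both speeds below are rough, illustrative assumptions.

head_turn_deg_per_s = 300.0           # a brisk but ordinary head turn
smooth_pursuit_deg_per_s = 30.0       # rough ceiling for comfortable smooth tracking

# The HMD's pixels turn with the head, while the eye counter-rotates to stay on
# the fixation point, so the pixels sweep past the eye at the full head-turn rate.
eye_vs_pixel_deg_per_s = head_turn_deg_per_s

ratio = eye_vs_pixel_deg_per_s / smooth_pursuit_deg_per_s
print(f"pixels move {eye_vs_pixel_deg_per_s:.0f} deg/s relative to the eye, "
      f"about {ratio:.0f}x faster than smooth pursuit can track")
```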

Let’s look at a few space-time diagrams that help make it clear how HMDs differ from the real world. These diagrams plot x position relative to the eye on the horizontal axis, and time advancing down the vertical axis. This shows how two of the three dimensions of the signal from the display land on the retina, with the vertical component omitted for simplicity.

First, here’s a real-world object sitting still.

I’ll emphasize, because it’s important for understanding later diagrams, that the x axis is horizontal position relative to the eye, not horizontal position in the real world. With respect to perception of images on HMDs it’s eye-relative position that matters, because that’s what affects how photons land on the retina. So the figure above could represent a situation in which both the eye and the object are not moving, but it could just as well represent a situation in which the object is moving and the eye is tracking it.

The figure would look the same for the case where both a virtual image and the eye are not moving, unless the color of the image was changing. In that case, a real-world object could change color smoothly, while a virtual image could only change color once per frame. However, matters would be quite different if the virtual image was moving and the eye was tracking it, as we’ll see shortly.

Next let’s look at a case where something is moving relative to the eye. Here a real-world object is moving from left to right at a constant velocity relative to the eye. The most common case of this would be where the eye is fixated on something else, while the object moves through space from left to right.

Now let’s examine the case where a virtual image is moving from left to right relative to the eye, again while the eye remains fixated straight ahead. There are many types of displays that this might occur on, but for this example we’re going to assume we’re using a color-sequential liquid crystal on silicon (LCOS) display.

Color-sequential LCOS displays, which are (alas, for reasons we’ll see soon) often used in HMDs, display red, green, and blue separately, one after another, for example by reflecting a red LED off a reflective substrate that’s dynamically blocked or exposed by pixel-resolution liquid crystals, then switching the liquid crystals and reflecting a green LED, then switching the crystals again and reflecting a blue LED. (Many LCOS projectors actually switch the crystals back to the green configuration again and reflect the green LED a second time each frame, but for simplicity I’ll ignore that.) This diagram below shows how the red, green, and blue components of a moving white virtual image are displayed over time, again with the eye fixated straight ahead.

Once again, remember that the x axis is horizontal motion relative to the eye. If the display had an infinite refresh rate, the plot would be a diagonal line, just like the second space-time diagram above. Given actual refresh rates, however, something quite different happens.

For a given pixel, each color displays for one-third of each frame. (It actually takes time to switch the liquid crystals, so each color displays for more like 2 ms per frame, and there are dark periods between colors, but for ease of explanation, let’s assume that each frame is evenly divided between the three colors; the exact illumination time for each color isn’t important to the following discussion.) At 60 Hz, the full cycle is displayed over the course of about 16.7 ms, and because that interval is shorter than the time during which the eye integrates incident light, the visual system blends the colors for each point together into a single composite color. The result is that the eye sees an image with the color properly blended. This is illustrated in the figure below, which shows how the photons from a horizontal white line on an LCOS display land on the retina.

Here the three color planes are displayed separately, one after another, and, because the eye is not moving relative to the display, the three colored lines land on top of each other to produce a perceived white line.
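Here’s a small sketch of the simplified timing just described (equal thirds of a 60 Hz frame, with switching time and dark periods ignored), showing which color component is lit at any given moment; the numbers come from the simplified model above, not from any real panel.

```python
# Which color field is lit at a given moment on the simplified color-sequential
# display described above: equal thirds of each 60 Hz frame, no dark gaps.

FRAME_S = 1.0 / 60.0                 # about 16.7 ms per frame
FIELDS = ("red", "green", "blue")    # displayed one after another within each frame

def lit_field(t_s: float) -> str:
    """Return which color component is being illuminated at time t_s."""
    t_in_frame = t_s % FRAME_S
    index = min(2, int(t_in_frame * 3.0 / FRAME_S))
    return FIELDS[index]

# Sample a few moments inside the first frame:
for ms in (1.0, 7.0, 14.0):
    print(f"t = {ms:4.1f} ms -> {lit_field(ms / 1000.0)}")
# Red for the first ~5.6 ms, green for the next ~5.6 ms, blue for the last ~5.6 ms;
# the eye integrates all three into a single white line when it isn't moving
# relative to the display.
```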

Because each pixel can update only once a frame and remains lit for the persistence time, the image is quantized to pixel locations spatially and to persistence time temporally, resulting in stepped rather than continuous motion. In the case shown above, that wouldn’t produce noticeable artifacts unless the image moved too far between frames – “too far” being on the order of five or ten arc-minutes, depending on the frequency characteristics of the image. In that case, the image would strobe; that is, the eye would perceive multiple simultaneous copies of the image. I’ll talk about strobing in the next post.
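To get a feel for that threshold, here’s the arithmetic in sketch form; the 5–10 arc-minute figure and the 60 Hz frame rate come straight from the paragraph above.

```python
# How fast can an image move relative to the eye before the step between frames
# exceeds the rough 5-10 arc-minute threshold mentioned above?

frame_rate_hz = 60.0

def max_speed_deg_per_s(threshold_arcmin: float) -> float:
    step_deg = threshold_arcmin / 60.0   # arc-minutes -> degrees per frame
    return step_deg * frame_rate_hz      # degrees per second

lo = max_speed_deg_per_s(5.0)
hi = max_speed_deg_per_s(10.0)
print(f"strobing sets in somewhere around {lo:.0f}-{hi:.0f} deg/s of eye-relative motion")
# Roughly 5-10 deg/s -- far below the hundreds of deg/s that a VOR-stabilized
# head turn produces.
```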

So far, so good, but we haven’t yet looked at motion of the eye relative to the display, and it’s that case that’s key to a number of artifacts. As I noted earlier, the eye can move relative to the display, while still being able to see clearly, either when it’s tracking a moving virtual image or when it’s fixated on a static virtual image or real object via VOR while the head turns. (I say “see clearly” because the eye can also move relative to the display by saccading, but in that case it can’t see clearly, although, contrary to popular belief, it does still acquire and use visual information.) As explained above, the VOR case is particularly interesting, because it can involve very high relative velocities between the eye and the display.

So what happens if the eye is tracking a virtual image that’s exactly one pixel in size as it moves from left to right? (Assume that the image lands squarely on a pixel center each frame, so we can limit this discussion to the case of exactly one pixel being lit per frame.) The color components of that pixel will then each line up differently with the eye, as you can see in the figure below, and color fringes will appear. (This figure also contains everything you need in order to understand judder, but I’ll save that discussion for the next post.)

Remember, the x position is relative to the eye, not the real world.

For a given frame, the red component of the pixel gets drawn in the correct location – that is, to the right pixel – at the start of the frame (assuming either no latency or perfect prediction). However, the red component remains in the same location on the display and is the same color for one-third of the frame; in an ideal world, the pixel would move continuously at the same speed as the image is supposed to be moving, but of course it can’t go anywhere until the next frame. Meanwhile, the eye continues to move along the path the image is supposed to be following, so the pixel slides backward relative to the eye, as you can see in the figure above. After one-third of the frame, the green component replaces the red component, falling farther behind the correct location, and finally the blue component slides even farther behind for the final one-third of the frame. At the start of the next frame, the red component is again drawn at the correct pixel (a different one, because the image is moving across the display), so the image snaps back to the right position and again starts to slide. Because each color component is drawn at a different location relative to the eye, the colors are not properly superimposed, and don’t blend together correctly.
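Here’s a sketch of how far behind the eye each color field of a single frame starts in that situation; the tracking velocity and pixel density are illustrative assumptions, but the sliding-and-snapping behavior is the one just described.

```python
# Eye-relative lag of each color field within one frame while the eye tracks a
# moving image on the simplified color-sequential display above.
# The velocity and pixel density are illustrative assumptions.

eye_velocity_deg_per_s = 60.0     # eye (and intended image) velocity
pixels_per_degree = 15.0
frame_s = 1.0 / 60.0
field_s = frame_s / 3.0           # red, then green, then blue

# Each field is drawn where the image should be at the start of the frame, then
# sits still while the eye keeps moving, so later fields light up farther behind
# where the eye expects the image to be.
for i, field in enumerate(("red", "green", "blue")):
    lag_deg = eye_velocity_deg_per_s * (i * field_s)
    print(f"{field:5s} starts {lag_deg * pixels_per_degree:4.1f} pixels behind the eye's target")
# Red starts 0 pixels behind, green about 5, blue about 10 -- hence the fringes,
# and the snap back to zero at the start of the next frame.
```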

Here’s how color fringing would look for eye movement from left to right – color fringes appear at the left and right sides of the image, due to the movement of the eye relative to the display between the times the red, green, and blue components are illuminated.

It might be hard to believe that color fringes can be large enough to really matter, when a whole 60 Hz frame takes only 16.6 ms. However, if you turn your head at a leisurely speed, that’s about 100 degrees/second, believe it or not; in fact, you can easily turn at several hundred degrees/second. (And remember, you can do that and see clearly the whole time if you’re fixating, thanks to VOR.) At just 60 degrees/second, one 16.6 ms frame is a full degree; at 120 degrees/second, one frame is two degrees. That doesn’t sound like a lot, but one or two degrees can easily be dozens of pixels – if such a thing as a head-mounted display that approached the eye’s resolution existed, two degrees would be well over 100 pixels – and having rainbows that large around everything reduces image quality greatly.
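Here’s that arithmetic in sketch form; the pixels-per-degree figure stands in for the hypothetical eye-resolution display mentioned above.

```python
# The arithmetic from the paragraph above: how far the eye moves relative to the
# display during one 60 Hz frame at various head-turn rates, and what that is in
# pixels for a hypothetical display approaching the eye's resolution.

frame_s = 1.0 / 60.0                     # ~16.7 ms
pixels_per_degree = 60.0                 # hypothetical near-eye-limit display

for head_deg_per_s in (60.0, 120.0, 300.0):
    deg_per_frame = head_deg_per_s * frame_s
    print(f"{head_deg_per_s:5.0f} deg/s -> {deg_per_frame:4.1f} deg/frame "
          f"({deg_per_frame * pixels_per_degree:5.0f} px on the assumed display)")
# 60 deg/s is about one degree per frame; 120 deg/s is two degrees, which is
# well over 100 pixels on the assumed display, just as the text says.
```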

Color-sequential displays in projectors and TVs don’t suffer significantly from color fringing because there’s no rapid relative motion between the eye and the display, for the two reasons mentioned earlier: projectors and TVs have limited fields of view, and they don’t move with the head, so they aren’t subject to the high relative eye velocities associated with VOR. Not so for HMDs; color-sequential displays should be avoided like the plague in HMDs intended for AR or VR use.

Necessary but not sufficient

There are two important conclusions to be drawn from the discussion to this point. The first is that relative motion between the eye and a head-mounted display can produce serious artifacts, and the basic mechanism underlying those artifacts should now be clear. The second is that a specific artifact, color fringing, is a natural by-product of color-sequential displays, and that as a result AR/VR displays need to illuminate all three color components simultaneously, or at least nearly so.

Illuminating all three color components simultaneously is, alas, necessary but not sufficient. Doing so will eliminate color fringing, but it won’t do anything about judder, so that’s the layer we’ll peel off the perceptual onion next time.

Slides from my Game Developers Conference talk

Thursday I gave a 25-minute talk at Game Developers Conference about virtual reality; you can download the slides here. Afterward, I got to meet some regular readers of this blog, which was a blast – there were lots of good questions and observations.

Much of the ground I covered will be familiar to those of you who have followed these posts over the last year, but I did discuss some areas, particularly color fringing and judder, that I haven’t talked about here yet, although I do plan to post about them soon in more detail than I could go into during a short talk.

Putting together the talk made me realize how many challenging problems have to be solved in order to get VR and AR to work well, and how long it’ll take to get all of those areas truly right; it’s going to be an interesting decade or two. At least I don’t have to worry about running out of stuff to talk about here for a long time!

Update: Here’s a PDF version of the slides. Unfortunately, I don’t know of any way to get the videos and animations to work in this version, so if you want to see those, you’ll have to use the PowerPoint viewer.

Game Developers Conference and space-time diagrams

Next week, I’ll be giving a half-hour talk at Game Developers Conference titled Why Virtual Reality Is Hard (And Where It Might Be Going). That talk will use a number of diagrams of a sort that, while not complicated, might require a little study to fully grasp, so I’m going to explain here how those diagrams work, in the hopes that at least some of the attendees will have read this post before the talk, and will therefore be positioned to follow the talk more easily. The diagrams are generally useful for talking about some of the unique perceptual aspects of head mounted VR and AR, and I will use them in future posts as well.

The diagrams are used in the literature on visual perception, and are called space-time diagrams, since they plot one spatial dimension against time. Here is the space-time diagram for an object that’s not moving in the spatial dimension (x) over time:

You can think of space-time diagrams as if you’re looking down from above an object, with movement right and left representing movement right and left relative to the eyes. However, instead of the vertical axis representing spatial movement toward and away from the eyes, it represents time. In the above case, the plot is a vertical line because the point isn’t moving in space over time. An example of this in the real world would be looking at a particular key on your keyboard – assuming your keyboard is staying in one place, of course.

Here’s the space-time diagram for an object that’s moving at a constant speed from left to right, while the eyes remain fixated straight ahead (that is, not tracking the object):

It’s important to understand that x position in these diagrams is relative to the position and orientation of the eyes, not the real world, because it’s the frame of reference of the eyes that matters for perception. It may not be entirely clear what that means right now, but I’ll return to this shortly.

Here’s a sample of what the viewer might see during the above space-time plot (each figure is at a successively later time):





A real-world example of this would be a light on the side of a train that’s passing by from left to right at a constant speed, while your eyes stay fixated straight ahead.

Before looking at the next figure, take a moment and try to figure out what the space-time diagram would be for a car that drives by from right to left at a constant speed, then hits a concrete wall head-on, while the eyes are fixated straight ahead.

Ready? Here it is:

These diagrams change in interesting ways when the viewer is looking at a display, rather than the real world. Pixels update only once a frame, remaining lit for part or all of that frame, rather than changing continuously the way a real-world object would. That means that if a light on the side of a virtual train moves past from left to right on a full-persistence display (that is, one where the pixels remain illuminated for the entire frame time) while the eyes are fixated straight ahead, the space-time diagram would look like this rather than the diagonal line above, since the train’s position updates only once per frame:

The above diagram has implications all by itself, but things get much more interesting if the eyes track the moving virtual object:

Remember that the spatial dimension is relative to the eyes, not to the real world; the x axis is perpendicular to a line coming out of the pupil at all times, so if the eyes move relative to the world over time, the x axis reflects that changing position. You can see the effect of this in the above diagram, where even though the virtual object is being drawn so that it appears to the viewer to move relative to the real world, it’s staying in the same position relative to the eyes (because the eyes are tracking it), except for the effects of full persistence.
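For anyone who’d like to play with these diagrams, here’s a small sketch that generates the data behind the last two: the same full-persistence 60 Hz display, sampled first with the eye fixated straight ahead and then with the eye tracking the image. The speeds and sample times are illustrative assumptions.

```python
# Generate the data behind the two display space-time diagrams above:
# a full-persistence 60 Hz display showing an image that moves at a constant
# speed, sampled with the eye fixated and then with the eye tracking it.

frame_s = 1.0 / 60.0
image_speed_deg_per_s = 60.0      # how fast the virtual image is supposed to move

def image_x_world(t_s: float) -> float:
    """Where the display actually draws the image: updated only once per frame."""
    frame_start = (t_s // frame_s) * frame_s
    return image_speed_deg_per_s * frame_start

def image_x_eye(t_s: float, eye_speed_deg_per_s: float) -> float:
    """Image position relative to an eye rotating at eye_speed_deg_per_s."""
    return image_x_world(t_s) - eye_speed_deg_per_s * t_s

for t_ms in range(0, 50, 4):      # a few samples over roughly three frames
    t = t_ms / 1000.0
    fixated = image_x_eye(t, eye_speed_deg_per_s=0.0)
    tracking = image_x_eye(t, eye_speed_deg_per_s=image_speed_deg_per_s)
    print(f"t={t_ms:2d} ms  fixated: {fixated:5.1f} deg   tracking: {tracking:5.1f} deg")
# Fixated: the image steps forward once per frame (the staircase in the diagram).
# Tracking: it repeatedly slips backward and snaps forward, which is the
# full-persistence effect referred to above.
```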

These temporal sampling effects occur on all types of displays, but are particularly important for head-mounted displays in that they create major new artifacts, unique to VR and AR, that in my opinion have to be solved before VR and AR can truly be great. My talk will be about why this is so, and I hope you’ll be there. If not, don’t worry – I’m sure I’ll get around to posting about it before too long.