Last time, we started to look at the ways in which the interaction of a head-mounted display with the eye and the brain leads to perceptual artifacts that are unique to HMDs and that can greatly affect VR/AR experiences. We looked closely at one of those artifacts, whereby use of a color-sequential display in an HMD leads to color fringing. I chose to start the discussion of perceptual artifacts with color fringing not because it was the most problematic artifact, but rather because the temporal separation of the color components makes it easy to visualize the effects of relative motion between the eye and the display. In point of fact, color fringing can easily be eliminated by using a display, such as LCD, OLED, color-filter LCOS, or scanning laser, that illuminates all three color components simultaneously. (I hope HMD manufacturers are reading this, because many of them are still using color-sequential LCOS.) However, the next artifact we’re going to look at, judder, is not so easily fixed.
Judder, as it relates to displayed images, has no single clear definition; it’s used by cinematographers in a variety of ways. I’m going to use the term here to refer to a combination of smearing and strobing that’s especially pronounced on VR/AR HMDs; why that’s so is the topic of today’s post.
The place to start with judder is with the same rule we started with last time: visual perception is a function of when and where photons land on the retina. When it comes to HMDs, this rule is much less straightforward than it seems, due to eye motion relative to the display in conjunction with the temporal and spatial quantization performed by displays; we saw two examples of that last time, and judder will be yet a third example. By “temporal and spatial quantization,” I mean that any given pixel is illuminated for some period of time over the course of each frame, and during that time its color remains constant within the pixel bounds; that’s a simplification, but it’s close enough for our purposes.
When we looked at color fringing, the key was that each color component of any given pixel was illuminated at a different time, so when the eye was moving relative to the display, each color component landed in a different place on the retina. With judder, the key is that the illuminated area of each pixel sweeps a constant color across the retina for however long it’s lit (the persistence time), resulting in a smear; this is then followed by a jump that causes strobing – that is, the perception of multiple simultaneous copies of the image. (It’s not intuitively obvious why this would cause strobing, but it should be clear by the end of this post.) The net result is loss of detail, and quite likely eye fatigue or even increased motion sickness. Let’s look at how this happens in more detail.
If you haven’t done so already, I strongly recommend you read the last post before continuing on.
In this post, we’re going to look at many of the same mechanisms as last time, but with a different artifact in mind. I’ll repeat some of the discussion from last time to lay the groundwork, but we’ll end up in quite a different place (although everything I’ll talk about was implicit in the last post’s color-fringing diagrams).
Once again, let’s look at a few space-time diagrams. These diagrams plot x position relative to the eye on the horizontal axis, and time advancing down the vertical axis.
First, here’s a real-world object staying in the same position relative to the eye. (This should be familiar, because it’s repeated from the last post.)
I’ll emphasize, because it’s important for understanding later diagrams, that the x axis is horizontal position relative to the eye, not horizontal position in the real world. With respect to perception it’s eye-relative position that matters, because that’s what affects how photons land on the retina. So the figure above could represent a situation in which both the eye and the object are not moving, but it could just as well represent a situation in which the object is moving and the eye is tracking it.
The figure would look the same for the case where both a virtual, rather than real, object and the eye are not moving relative to one another, unless the color of the object was changing. In that case, a real-world object could change color smoothly, while a virtual object could only change color once per frame. However, the figure would not look the same for the case where a virtual object is moving and the eye is tracking it; in fact, that case goes to the heart of what this post is about, and we’ll discuss it shortly.
Next, let’s look at a case where the object is moving relative to the eye. (Again, this is repeated from the last post.) Here a real-world object is moving from left to right at a constant velocity relative to the eye. The most common case of this would be where the eye is fixated on something else, while the object moves through space from left to right.
In contrast, here’s the case where a virtual object is moving from left to right relative to the eye. Throughout today’s post, I’m going to assume the display is one that displays all three color components simultaneously; that means that in contrast to the similar diagram from the last post, the pixel color is constant throughout each frame, rather than consisting of sequential red, green, and blue.
Because each pixel can update only once a frame and remains lit for the persistence time, the image is quantized to pixel locations spatially and to persistence times temporally, resulting in stepped rather than continuous motion. In the case shown above, that wouldn’t produce judder, although it would generally produce strobing at normal refresh rates if the virtual object contained high spatial frequencies.
Note that in these figures, unless otherwise noted, persistence time – the time each pixel remains lit – is the same as the frame time – that is, these are full-persistence displays.
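To make the stepped motion concrete, here is a minimal Python sketch of the quantization just described. It assumes an idealized full-persistence display with no latency; the function names and the 120 pixels-per-second speed are made up for illustration and don't come from any real API.

```python
import math

def real_position_px(t, velocity_px_per_s):
    """A real-world object moves continuously over time."""
    return velocity_px_per_s * t

def displayed_position_px(t, velocity_px_per_s, refresh_hz):
    """A virtual object is drawn once per frame, snapped to a whole pixel,
    and then held there for the rest of the frame (full persistence)."""
    frame_index = math.floor(t * refresh_hz)
    # Position at the start of the current frame, quantized to the pixel grid.
    return math.floor(velocity_px_per_s * frame_index / refresh_hz)

if __name__ == "__main__":
    # Sample mid-frame times so no sample sits exactly on a frame boundary.
    for half_frames in (0.5, 1.0 + 0.5, 2.0 + 0.5):
        t = half_frames / 60
        print(t, real_position_px(t, 120), displayed_position_px(t, 120, 60))
```

The real position changes at every sample, while the displayed position only changes when the frame index does, which is the staircase in the diagram.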
So far, so good, but neither of the above cases involves motion of the eye relative to the display, and it’s specifically that motion that causes judder. As explained last time, the eye can move relative to the display, while still being able to see clearly, either because it’s tracking a moving virtual object or because it’s fixated on a static virtual or real object via VOR while the head turns. (I say “see clearly” because the eye can also move relative to the display by saccading, but in that case it can’t see clearly, although, contrary to popular belief, it does still acquire and use visual information.) The VOR case is particularly interesting, because, as discussed in the last post, it can involve very high relative velocities (hundreds of degrees per second) between the eye and the display, and consequently very long smears.
Here’s the relative-motion case.
Once again, remember that the x axis is horizontal position relative to the eye. If the display had an infinite refresh rate, the plot would be a vertical line, just like the first space-time diagram above. Given actual refresh rates, however, what happens is that a given virtual object lights up the correct pixels for its virtual position at the start of the frame (assuming either no latency or perfect prediction), and then, because those pixels remain unchanged both in color and in position on the display over the full persistence time and because the eye is moving relative to the display, the pixels slide over the retina for the duration of the frame, falling behind the correct location for the moving virtual object. At the start of the next frame, the virtual object is again redrawn at the proper location for that time, lighting up a different set of pixels on the screen, so the image snaps back to the right position in virtual space, and the pixels then immediately start to slide again.
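The slide-and-snap behavior amounts to a sawtooth in eye-relative position. The following Python sketch is one way to express it, assuming a constant eye velocity relative to the display, zero latency, and full persistence; `image_error_deg` is a hypothetical name used only for this illustration.

```python
def image_error_deg(t, eye_speed_deg_per_s, refresh_hz):
    """Magnitude (in degrees) by which the displayed image has fallen behind
    its correct eye-relative position at time t.

    Assumes full persistence: pixels stay put on the display for the whole
    frame while the eye keeps moving relative to the display.
    """
    frame_time_s = 1.0 / refresh_hz
    time_since_redraw_s = t % frame_time_s  # resets to 0 at each new frame
    return eye_speed_deg_per_s * time_since_redraw_s

if __name__ == "__main__":
    # Sample at quarter-frame steps across two frames of a 60 Hz display.
    for quarter in range(8):
        t = (quarter + 0.5) / (4 * 60)
        print(f"{t * 1000:6.2f} ms  error = {image_error_deg(t, 120, 60):.3f} deg")
```

The error grows linearly within each frame and drops back toward zero at the next redraw; the linear growth is the smear and the drop is the jump.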
It’s hard to film judder of exactly the sort defined above, but this video shows a very similar mechanism in slow motion. Judder as I’ve discussed it involves relative motion between the eye and the display. In the video, in contrast, the camera is rigidly attached to the display, and they pan together across a wall that contains several markers used for optical tracking. The display pose is tracked, and a virtual image is superimposed on each marker; the real-world markers are dimly visible as patterns of black-and-white squares through the virtual images. The video was shot through an HMD at 300 frames per second, and is played back at one-fifth speed, making it easy to see the relationship between the virtual and real images. You can see that because the virtual images are only updated once per displayed frame, they slide relative to the markers – they move ahead of the markers, because they stay in the same place on the display, and the display is moving – for a full displayed frame time (five camera frames), then jump back to the correct position.
This phenomenon is not exactly what happens with the HMD judder I’ve been talking about – the images are moving relative to the camera, rather than having the camera tracking them – but it does clearly illustrate how the temporal quantization of displayed pixels causes images to slide from the correct position over the course of a frame. I strongly recommend that you play a little of the video one frame at a time, so you can see that what actually happens is that the virtual image stays in the same position on the screen for five camera frames, while the physical marker moves across the screen continuously due to motion of the HMD/camera. If you substituted your eye for the camera and looked straight ahead, as the camera did, you would only see strobing of the virtual images, not smearing, as the virtual images jumped from one displayed frame to the next. However, if instead you moved the HMD as in the video but at the same time moved your eye to keep it fixated on either the physical or virtual marker, you would in fact see exactly the form of judder shown in the last diagram; you should be able to directly map that scenario to the last diagram. In particular, the images would smear.
You might reasonably wonder how bad the smear can be, given that frame times are measured in milliseconds. The answer is: worse than you probably think.
When you turn your head at a leisurely speed, that’s in the neighborhood of 100 degrees per second. Suppose you turn your head at 120 degrees per second, while wearing a 60 Hz HMD; that’s two degrees per displayed frame. Two degrees doesn’t sound like much, but on an Oculus Rift development kit it’s about 14 pixels, and if an HMD existed that had a resolution approximating the resolving capability of the human eye, a two-degree arc across it would cross hundreds of pixels. So the smear part of judder is very noticeable. Since I have no way to show it to you directly, let’s look at a simulation of it.
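The arithmetic above is easy to reproduce. Here is a small Python sketch; the function names are illustrative, and the figure of 7 pixels per degree is simply what the "two degrees is about 14 pixels" estimate implies, not a measured specification.

```python
def smear_degrees(head_speed_deg_per_s, refresh_hz, persistence_fraction=1.0):
    """Angle swept across the retina while a pixel stays lit.

    persistence_fraction is the lit portion of the frame time
    (1.0 means full persistence, as assumed throughout this post).
    """
    return head_speed_deg_per_s / refresh_hz * persistence_fraction

def smear_pixels(smear_deg, pixels_per_degree):
    """Convert a smear angle into a length in display pixels."""
    return smear_deg * pixels_per_degree

if __name__ == "__main__":
    deg = smear_degrees(120, 60)        # 120 deg/s head turn on a 60 Hz HMD
    print(deg, smear_pixels(deg, 7))    # ~7 px/deg is implied by the text
    print(smear_pixels(deg, 14))        # doubling pixel density doubles the smear in pixels
```

Note that the smear angle depends only on speed and persistence time, so the denser the pixels, the more of them each smear covers.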
Here’s a rendered scene:
And here’s what it looks like after the image is smeared across two degrees:
Clearly, smearing can have a huge impact on detail and sharpness.
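One plausible way to approximate this kind of smear, which is not necessarily how the image above was produced, is a one-directional box filter along the direction of motion. The Python sketch below smears a single row of pixel intensities; `smear_row` is a hypothetical helper, and it assumes constant relative velocity and full persistence.

```python
def smear_row(row, smear_px):
    """Smear a row of intensities over smear_px pixels.

    Each output sample is the average of the smear_px input samples that
    slid past that retinal position while the frame was lit. Samples that
    would come from outside the row are treated as black.
    """
    out = []
    for x in range(len(row)):
        total = 0.0
        for k in range(smear_px):
            src = x - k
            if 0 <= src < len(row):
                total += row[src]
        out.append(total / smear_px)
    return out

if __name__ == "__main__":
    # A single bright pixel becomes a dim streak three pixels long.
    print(smear_row([0, 0, 1, 0, 0, 0], 3))
```

Applying the same filter to every row of an image, with the smear length taken from the head speed and frame time, gives a rough preview of how much detail is lost.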
In contrast, this video shows how smooth the visuals are when a high-speed camera is panned across a monitor. (The video quality is not very good, but it’s good enough so that you can see how stable the displayed images are compared to the shifting and jumping in the first video.) The difference is that in the first video, tracking was used to try to keep a virtual image on a see-through HMD in the right place relative to the real world as the camera moved, with the pixels on the HMD moving relative to the real world over the course of each frame; in the second video, the image was physically displayed on a real-world object (a monitor), so each pixel remained in a fixed position in the real world at all times. This neatly illustrates the underlying reason VR/AR HMDs differ markedly from other types of displays – virtual images on HMDs have to be drawn to register correctly with the real world, rather than simply being drawn in a fixed location in the real world.
Besides smear, the other effect you can see in the first video is that the images snap back to the right location at the start of each frame, as shown in the last space-time diagram. Again, the location and timing of photons on the retina is key. If an image moves more than about five or ten arc-minutes between successive updates, it can start to strobe; that is, you may see multiple simultaneous copies of the image. At a high enough head-turn speed, the image will move farther than this threshold when it snaps back to the correct location at the start of each frame (and even a very slow 10 degrees per second head turn can be enough for images containing high frequencies), so judder can feature strobing in addition to smearing.
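As a quick check on those numbers, the per-frame jump can be computed directly. In this Python sketch the function names are illustrative, and the threshold is the rough five-to-ten arc-minute figure quoted above rather than a hard perceptual constant.

```python
def jump_arcmin(head_speed_deg_per_s, refresh_hz):
    """Size of the snap-back at the start of each frame, in arc-minutes."""
    return head_speed_deg_per_s * 60 / refresh_hz  # 60 arc-minutes per degree

def may_strobe(head_speed_deg_per_s, refresh_hz, threshold_arcmin=5.0):
    """True if the per-frame jump exceeds the given (approximate) threshold."""
    return jump_arcmin(head_speed_deg_per_s, refresh_hz) > threshold_arcmin

if __name__ == "__main__":
    for speed in (10, 120):
        print(speed, "deg/s ->", jump_arcmin(speed, 60), "arcmin,",
              "may strobe" if may_strobe(speed, 60) else "below threshold")
```

At 60 Hz, a 10 degrees per second turn already produces a jump of about 10 arc-minutes, which is why even slow head motion can be enough for images with fine detail.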
It’s worth noting that this effect is reduced because intensity lessens toward both ends of the smear for features that are more than one pixel wide. The reason is very straightforward: the edges of such smears are covered by the generating feature for only part of the persistence time. However, that’s a mixed blessing; the eye perceives flicker more readily at lower intensities, so the edges of such objects may flicker (an on/off effect), rather than strobe (a multiple-replicas effect).
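The falloff at the ends of the smear can be sketched by counting, for each retinal position, what fraction of the persistence time the feature covers it. This Python snippet is a simplified model with whole-pixel steps and uniform feature brightness; `coverage_profile` is a hypothetical name.

```python
def coverage_profile(feature_px, smear_px):
    """Fraction of the persistence time that each retinal position is covered
    by a feature feature_px wide as it slides smear_px pixels across the retina."""
    profile = []
    for pos in range(feature_px + smear_px - 1):
        covered_steps = 0
        for step in range(smear_px):
            # At this step the feature occupies positions step .. step + feature_px - 1.
            if 0 <= pos - step < feature_px:
                covered_steps += 1
        profile.append(covered_steps / smear_px)
    return profile

if __name__ == "__main__":
    print(coverage_profile(1, 3))  # one-pixel feature: uniform, dim streak
    print(coverage_profile(3, 3))  # wider feature: ramps up, peaks, ramps down
```

A one-pixel feature produces a flat profile, while a wider feature produces a ramp at each end, which is the reduced edge intensity described above.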
Also, you might wonder why juddering virtual objects would strobe, rather than appearing as stable smeared images. One key factor is that any variation in latency, error in prediction, or inaccuracy in tracking will result in edges landing at slightly varying locations on the retina, which can produce strobing. Another reason may be that the eye’s temporal summation period doesn’t exactly match the persistence time. For illustrative purposes only, suppose that the persistence time is 10 ms, and the eye’s temporal integration period is 5 ms (a number I just made up for this example). Then the eye will detect a virtual edge not once but twice per frame, and if the eye is moving rapidly relative to the display, those two detections will be far enough apart so that two images will be perceived; in other words, the edge will strobe. (In actuality, the eye’s integration window depends on a number of factors, and does not take a discrete snapshot.) Note, however, that this is only a theory at this point. In any case, the fact is that the eye does perceive strobing as part of judder.
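Purely to put numbers on the illustrative example above, here is a Python sketch of that toy model. It inherits every caveat already stated: the 5 ms integration period is made up, the eye does not take discrete snapshots, and the function names are hypothetical.

```python
def detections_per_frame(persistence_ms, integration_ms):
    """Number of whole integration periods that fit in one persistence time
    (toy model: the eye is treated as taking discrete snapshots)."""
    return int(persistence_ms // integration_ms)

def detection_separation_arcmin(eye_speed_deg_per_s, integration_ms):
    """Retinal distance between successive detections of the same edge."""
    degrees = eye_speed_deg_per_s * integration_ms / 1000.0
    return degrees * 60.0  # 60 arc-minutes per degree

if __name__ == "__main__":
    print(detections_per_frame(10, 5))          # two detections per frame
    print(detection_separation_arcmin(120, 5))  # spacing at a 120 deg/s relative speed
```

Under these made-up numbers, a 120 degrees per second relative speed would put the two detections well over the five-to-ten arc-minute range mentioned earlier, which fits the idea that they would be seen as separate images.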
The net effect of smearing and strobing combined is much like a choppy motion blur. At a minimum, image quality is reduced due to the loss of detail from smearing. Strobing tends not to be very visible on full-persistence displays – smearing mostly hides it, and it’s less prominent for images that don’t have high spatial frequencies – but it’s possible that both strobing and smearing contribute to eye fatigue and/or motion sickness, because both seem likely to interfere with the eye’s motion detection mechanisms. The latter point is speculative at this juncture, and involves deep perceptual mechanisms, but I’ll discuss it down the road if it turns out to be valid.
Slow LCD switching times, like those in the Rift development kit HMDs, result in per-frame pixel updates that are quite different from the near-instantaneous modification of the pixel state that you’d see with OLEDs or scanning lasers; with LCD panels, pixel updates follow a ramped curve. This produces blurring that exaggerates smearing, making it longer and smoother, and masks strobing. While that does mostly solve the strobing problem, it is not exactly a win, because the loss of detail is even greater than what would result from full-persistence, rapid-pixel-switching judder alone.
I mentioned in the last post that HMDs are very different from other types of displays, and one aspect of that is that judder is a more serious problem for HMDs. Why isn’t judder a major problem for movies, TVs, and computer displays?
Actually, judder is a significant problem for TV and movies, or at least it would be except that cinematographers go to great lengths to avoid it. For example, you will rarely see a rapid pan in a movie, and when you do, you won’t be able to see much of anything other than blur indicating the direction of motion. Dramatic TV filming follows much the same rules as movies. Sports on TV can show judder, and that’s a motivating factor behind higher refresh rates for TVs. And you can see judder on a computer simply by grabbing a window and tracking an edge carefully while dragging it rapidly back and forth (although your results will vary depending on the operating system, graphics hardware, and whether the desktop manager waits for vsync or not). It’s even easier to see judder by going to the contacts list on your phone and tracking your finger as you scroll the list up and down; the text will become blurry and choppy. Better yet, hold your finger on the list and move the phone up and down while your finger stays fixed in space. The list will become very blurry indeed – try to read it. And you can see judder in video games when you track a rapidly moving object, but that tends to happen in the heat of battle, when you have a lot of other things to think about. (Interestingly, judder in video games is much worse now, with LCD monitors, than it was on CRTs; the key reason for this is persistence time, although slow LCD switching times don’t help.)
However, while judder is potentially an issue for all displays, there are two important differences that make it worse for HMDs, as I mentioned last time: first, the FOV in an HMD is much wider, so objects can be tracked for longer, and second, you can turn your head much more rapidly than you can normally track moving objects without saccading, yet still see clearly, thanks to the counter-rotation VOR provides. These two factors make judder much more evident on an HMD. A third reason is that virtual images on a monitor appear to be on a surface in the world, in contrast to virtual images on an HMD, which appear to be directly in the world; this causes the perceptual system to have higher expectations for HMD images and to more readily detect deviations from what we’re used to when looking at the real world.
Judder isn’t a showstopper, but it does degrade VR/AR visual quality considerably. I’ve looked through a prototype HMD that has no judder, and the image stayed astonishingly sharp and clear as I moved my head. Moreover, increased pixel density is highly desirable for VR/AR, but the effects of judder get worse the higher the pixel density is, because the smears get longer relative to pixel size, causing more detail to be lost. So is there a way to reduce or eliminate judder?
As it happens, there is – in fact, there are two of them: higher refresh rate and low persistence. However, you will not be surprised to learn that there are complications, and it will take some time to explain them, so the next part of the discussion will have to wait until the next post.
By this point, you should be developing a strong sense of why it’s so hard to convince the eye and brain that virtual images are real. Next time we’ll see that the perceptual rabbit hole goes deeper still.