Back in the spring of 1986, Dan Illowsky and I were up against the deadline for an article that we were writing for PC Tech Journal. The name of the article might have been “Software Sprites,” but I’m not sure, since it’s one of the few things I’ve written that seems not to have made it to the Internet. In any case, I believe the article showed two or three different ways of doing software animation on the very simple graphics hardware of the time. With the deadline looming, both the article and the sample code that would accompany it were written, but one part of the code just wouldn’t work right.
As best I can remember, the problematic sample moved two animated helicopters and a balloon around the screen. All the drawing was done immediately after vsync; the point was to show that since nothing was being scanned out to the display at that time (vsync happens in the middle of the vertical blanking interval), the contents of the frame buffer could be modified with no visible artifacts. The problem was that when an animated object got high enough on the screen, it would start vanishing – oddly enough, from the bottom up – and more and more of the object would vanish as it rose until it was completely gone. Stranger still, the altitude at which this happened varied from object to object. We had no idea why that was happening – and the clock was ticking.
I’m happy to report that we did solve the mystery before the deadline. The problem was that back in those days of dog-slow 8088s and slightly faster 80286s, the display was scanning out pixels before the code had finished updating them. And if that explanation doesn’t make much sense to you at the moment, it should all be clear by the end of today’s post, which covers some decidedly non-intuitive consequences of an aspect of the latency discussion in the last post: the potentially problematic AR/VR implications of raster scan displays, and the way that racing the beam interacts with the raster scan to address those problems.
Raster scanning is the process of displaying an image by updating each pixel one after the other, rather than all at the same time, with all the pixels on the display updated over the course of one frame. Typically this is done by scanning each row of pixels from left to right, and scanning rows from top to bottom, so the rightmost pixel on each scan line is updated a few microseconds after the leftmost pixel, and the bottommost row on the screen is updated a few milliseconds (roughly 15 ms for 60 Hz refresh – less than 16.7 ms because of vertical blanking time) after the topmost row. Figure 1 shows the order in which pixels are updated on an illustrative if not particularly realistic 8×4 raster scan display.
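To put rough numbers on that, here’s a back-of-the-envelope sketch in Python; the 1080-line resolution and the split between active scan-out and blanking time are illustrative assumptions, not the specs of any particular display:

```python
# Illustrative scan-out timing for a hypothetical 1080-line, 60 Hz rolling
# display. All numbers are assumptions chosen to match the text, not the
# specs of any real panel.

REFRESH_HZ = 60.0
FRAME_MS = 1000.0 / REFRESH_HZ      # ~16.7 ms per frame
VBLANK_MS = 1.6                     # assumed vertical blanking time
ACTIVE_MS = FRAME_MS - VBLANK_MS    # ~15 ms spent scanning the visible lines
LINES = 1080

def line_scanout_ms(line):
    """Milliseconds after the top of the frame at which a line is scanned out."""
    return ACTIVE_MS * line / LINES

print(line_scanout_ms(0))          # 0.0: the topmost row
print(line_scanout_ms(LINES - 1))  # ~15: the bottommost row, ~15 ms later
```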
Originally, the raster scan pattern directly reflected the way the electron beam in a CRT moved to update the phosphors. There’s no longer an electron beam on most modern displays; now the raster scan reflects the order in which pixel data is scanned out of the graphics adapter and into the display. There’s no reason that the scan-in has to proceed in that particular order, but on most devices that’s what it does, although there are variants like scanning columns rather than rows, scanning each pair of lines in opposite directions, or scanning from the bottom up. If you could see events that happen on a scale of milliseconds (and, as we’ll see shortly, under certain circumstances you can), you would see pixel updates crawling across the screen in raster scan order, from left to right and top to bottom.
It’s necessary that pixel data be scanned into the display in some time-sequential pattern, because the video link (HDMI, for example) transmits pixel data as a stream. However, it’s not required that those updates become visible in the same sequential fashion. It would be quite possible to scan a full frame into, say, an LCD panel while it was dark, wait until all the pixel data had been transferred, and then illuminate all the pixels at once with a short, bright light, so that every pixel update becomes visible simultaneously. I’ll refer to this as global display; in fact, it’s how some LCOS, DLP, and LCD panels work. However, in the last post I talked about reducing latency by racing the beam, and I want to follow up by discussing how that interacts with raster scanning in this post. There’s no point to racing the beam unless each pixel updates on the display as soon as the raster scan changes it; that means global display, which doesn’t update any pixel’s displayed value until all the pixels in the frame have been scanned in, precludes racing the beam.
So for the purposes of today’s discussion, I’ll assume we’re working with a display that updates each pixel on the screen as soon as the scanned-in pixel data provides a new value for it; I’ll refer to this as rolling display. I’ll also assume we’re working with zero persistence pixels – that is, pixels that illuminate very brightly for a very short period after being updated, then remain dark for the remainder of the frame. This eliminates the need to consider the positions and times of both the first and last photons emitted, and thus lets us ignore smearing due to eye movement relative to the display. Few displays actually have zero persistence or anything close to it (scanning lasers are one exception), but this simplifying assumption will make the basic principles easier to understand.
To recap, racing the beam is when rendering proceeds down the frame just a little ahead of the raster, so that pixels appear on the screen shortly after they’re drawn. Typically this would be done by rendering the scene in horizontal strips of perhaps a few dozen lines each, using the latest reading from the tracking system to position each strip for the current HMD pose just before rendering it.
This is an effective latency-reducing technique, but it’s hard to implement, because it’s very timing-dependent. There’s no guarantee as to how long a given strip will take to render, so there’s a delicate balance involved in leaving enough padding so the raster won’t overtake rendering, while still getting close enough to the raster to reap significant latency reduction. As discussed in the last post, there are some interesting ways to try to address that balance, such as rendering the whole frame, then warping each strip based on the latest position data. In any case, racing the beam is capable of reducing display latency purely in software, and that’s a rare thing, so it’s worth looking into more deeply. However, before we can even think about racing the beam, we need to understand some non-intuitive implications of rolling display, which, as explained above, is required in order for racing the beam to provide any benefit.
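To make the strip-at-a-time structure concrete, here’s a minimal sketch of the core loop; the strip size, the timings, and the read_pose and render_strip callbacks are all assumptions standing in for a real tracker and renderer:

```python
import time

# A sketch of the core racing-the-beam loop, under assumed timings; the
# read_pose() and render_strip() callbacks are hypothetical stand-ins for
# the tracking system and the renderer.

LINES = 1080          # assumed display height
STRIP_LINES = 36      # render in strips of a few dozen lines
ACTIVE_S = 0.015      # assumed time the raster spends on the visible lines
PADDING_S = 0.002     # margin: worst-case strip render time plus safety

def race_the_beam(frame_start, read_pose, render_strip):
    """Render one frame in horizontal strips, each just ahead of the raster."""
    for first_line in range(0, LINES, STRIP_LINES):
        # When the raster will reach the top of this strip.
        raster_arrival = frame_start + ACTIVE_S * first_line / LINES
        # Start as late as possible (for the freshest pose) while leaving
        # enough padding that rendering finishes before the raster arrives.
        wait = raster_arrival - PADDING_S - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        pose = read_pose()  # latest tracking data, read just before rendering
        render_strip(first_line, STRIP_LINES, pose)

# Example with do-nothing stubs:
race_the_beam(time.monotonic(), lambda: None, lambda top, n, pose: None)
```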
So let’s look at a few scenarios. If you’re wearing an HMD with a 60 Hz rolling display, and rendering each frame in its entirety, waiting for vsync, and then scanning the frame out to the display in the normal fashion (with no racing the beam involved at this point), what do you think you’d see in each of the following scenarios? (Hint: think about what you’d see in a single frame for each scenario, and then just repeat that.)
Scenario 1: Head is not moving; eyes are fixated on a vertical line that extends from the top to the bottom of the display, as shown in Figure 2; the vertical line is not moving on the display.
Scenario 2: Head is not moving; the vertical line in Figure 2 is moving left to right on the display at 60 degrees/second; eyes are tracking the line.
Scenario 3: Head is not moving; the vertical line in Figure 2 is moving left to right relative to the display at 60 degrees/second across the center of the screen; eyes are fixated on the center of the screen, and are not tracking the line.
Scenario 4: Head is rotating left to right at 60 degrees/second; the vertical line in Figure 2 is moving right to left on the display at 60 degrees/second, compensating for the head motion so that to the eye the image appears to stay in the same place in the real world; eyes are counter-rotating, tracking the line.
Take a second to think through each of these and write down what you think you’d see. Bear in mind that raster scanning is not how anything works in nature; the pixels in a raster image are updated at differing times, and in the case of zero persistence aren’t even on at the same time. Frankly, it’s a miracle that raster images look like anything coherent at all to us; the fact that they do has to do with the way our visual system collects photons and makes inferences from that data, and at some point I hope to talk about that a little, because it’s fascinating (and far from fully understood).
Here are the answers, as shown in Figure 3, below:
Scenario 1: an unmoving vertical line.
Scenario 2: a line moving left to right, slanted to the right by about one degree from top to bottom. (The slant is exaggerated in Figure 3 to make it easier to see; in an HMD, a one-degree slant is much more noticeable, for reasons I’ll discuss a little later.)
Scenario 3: a vertical line moving left to right.
Scenario 4: a line staying in the same place relative to the real world (although moving right to left on the display, compensating for the display movement from left to right), slanted to the left by about one degree from top to bottom.
How did you do? If you didn’t get all four, don’t feel bad; as I said at the outset, this is not intuitive – which is what makes it so interesting.
In a moment, I’ll explain these results in detail, but here’s the underlying rule for understanding what happens in such situations: your perception will be based on whatever pattern is actually produced on your retina by the photons emitted by the image. That may sound obvious, and in the real world it is, but with an HMD, the time-dependent sequence of pixel illumination makes it anything but.
Given that rule, we get a vertical line in scenario 1 because nothing is moving, so the image registers on the retina exactly as it’s displayed.
Things get more complicated with scenario 2. Here, the eye is smoothly tracking the image, so it’s moving to the right at 60 degrees/second relative to the display. (Note that 60 degrees/second is a little fast for smooth pursuit without saccades, but the math works out neatly on a 60 Hz display, so we’ll go with that.) The topmost pixel in the vertical line is displayed at the start of the frame, and lands at some location on the retina. Then the eye continues moving to the right, and the raster continues scanning down. By the time the raster reaches the last scan line and draws the bottommost pixel of the line, it’s something on the order of 15 ms later, and here we come to the crux of the matter – the eye has moved about one degree to the right since the topmost pixel was drawn. (Note that the eye will move smoothly in tracking the line, even though the line is actually drawn as a set of discrete 60 Hz samples.)
That means that the bottommost pixel will land on the retina about one degree to the right of the topmost pixel, which, due to the way images are formed on the retina and then flipped, will cause the viewer to perceive it to be one degree to the left of the topmost pixel. The same is true of all the pixels in the vertical line, in direct proportion to how much later they’re drawn relative to the topmost pixel. The pixels of the vertical line land on the retina slanted by one degree, so we see a line that’s similarly slanted, as shown in Figure 4 for an illustrative 4×4, 60 Hz display.
Note that for clarity, Figure 4 omits the retinal image flipping step and just incorporates its effects into the final result. The slanted pixels are shown at the locations where they’d be perceived; the pixels would actually land on the retina offset in the opposite direction, and reversed vertically as well, due to image inversion, but it’s the perceived locations that matter.
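The arithmetic behind the one-degree figure is easy to check; this sketch uses the ~15 ms visible scan-out time assumed earlier:

```python
# The arithmetic behind the one-degree slant in scenario 2, using the
# illustrative 60 Hz display assumed earlier (~15 ms of visible scan-out).

EYE_DEG_PER_S = 60.0   # smooth-pursuit speed while tracking the line
ACTIVE_S = 0.015       # assumed visible scan-out time per frame
LINES = 1080           # assumed display height

def perceived_left_offset_deg(line):
    """Degrees by which a line is perceived left of the topmost pixel."""
    elapsed = ACTIVE_S * line / LINES   # how long after the top pixel it's drawn
    return EYE_DEG_PER_S * elapsed      # how far the eye has moved in that time

print(perceived_left_offset_deg(LINES - 1))   # ~0.9: the "about one degree" slant
```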
If it’s that easy to produce this effect, you may well ask: Why can’t I see it on a monitor? The answer depends on whether rendering waits for vsync; that is, whether each rendered frame is scanned out to the display exactly once per displayed frame (i.e., at the refresh rate), or frames are copied to the display as fast as they can be drawn (so multiple rendered frames affect a single displayed frame, each in its own horizontal strip – a form of racing the beam).
In the case where rendering doesn’t wait for vsync, you won’t see lines slant, for reasons that may already be obvious to you: each horizontal strip is drawn at the right location based on the most recent position data; we’ll return to this later. However, it’s easy to see the problem with not waiting for vsync. If vsync is off on your monitor, grab a screen-height window that has a high-contrast border and drag it rapidly left to right, then back right to left, and you’ll see that the vertical edge breaks up into segments. The segments are separated by the scan lines where the copy to the screen overtook the raster. If you move the window to the left and don’t track it with your eyes, the lower segments will be to the left of the segments above them, because as soon as the copy overtakes the raster (this assumes that the copy is faster than the raster update, which is very likely to be the case), the raster starts displaying the new pixels, which represent the most up-to-date window position as it moves to the left. This segmentation is called tearing; it’s a highly visible artifact that has to be carefully smoothed over for any HMD racing-the-beam approach.
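The scan line where the tear appears is also easy to model; in this sketch, the 3 ms copy and ~15 ms raster are made-up numbers chosen only so that the copy is faster:

```python
# Where a full-screen copy overtakes the raster, under assumed speeds.
# Both the copy and the raster sweep downward from line 0 at constant
# (but different) rates; the copy starts later but moves faster.

LINES = 1080
RASTER_S = 0.015   # assumed: raster takes ~15 ms to scan the visible lines
COPY_S = 0.003     # assumed: the copy covers the whole frame in 3 ms

def tear_line(copy_start_s):
    """Scan line at which the copy catches the raster, or None if it never does.

    The raster reaches line L at RASTER_S * L / LINES; the copy reaches it at
    copy_start_s + COPY_S * L / LINES; the tear is where those times are equal."""
    line = copy_start_s * LINES / (RASTER_S - COPY_S)
    return int(line) if line < LINES else None

print(tear_line(0.002))   # 180: tear partway down the screen
print(tear_line(0.016))   # None: this copy never catches the raster this frame
```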
In contrast, if rendering waits for vsync, there will be no tearing, but the slanting described above will be visible. If vsync is enabled on your system, grab a screen-height window and drag it back and forth, tracking it with your eyes, and you will see that the vertical edges do in fact tilt as advertised; it’s subtle, both because the tilt is only about a degree and because the pixels smear due to long persistence, but it’s there.
In either case, the artifacts are far more visible for AR/VR in an HMD, because objects that dynamically warp and deform destroy the illusion of reality; in AR in particular, it’s very apparent when artifacts mis-register against the real world. Another factor is that in an HMD, your eyes can counter-rotate and maintain fixation while you turn your head (via the combination of the vestibulo-ocular reflex, or VOR, and the optokinetic response, or OKR), and that makes possible relative speeds of rotation between the eye and the display that are many times higher than the speeds at which you can track a moving object (via smooth pursuit) while holding your head still, resulting in proportionally greater slanting.
By the way, although it’s not exactly the same phenomenon, you can see something similar – and more pronounced – on your cellphone. Put it in back-facing camera mode, point it at a vertical feature such as a door frame, and record a video while moving the phone smoothly back and forth; then play the video back while holding the phone still. You will see the vertical feature tilt sharply, or at least that’s what I see on my iPhone. (If you don’t see any tilting, either you need to rotate your phone 90 degrees to align with the camera scan direction – I had to hold my iPhone with the long dimension horizontal – or your camera has a global shutter.) This differs from scenario 4 because it involves a rolling shutter camera rather than a rolling display, but the basic principles of the interaction of photons and motion over time are the same, just based on sampling incoming photons rather than displaying outgoing ones. (Note that it’s risky to draw rolling display conclusions relevant to HMDs from experiments with phone cameras: a rolling shutter camera is involved, the frame rates and scanning directions of the camera and the display may differ, and neither the camera nor the display is attached to your head.)
Scenario 3 results in a vertical line for the same reason as scenario 1. True, the line is moving between frames, but during a frame it’s drawn as a vertical line on the display. Since the eye isn’t moving relative to the display, that image ends up on the retina exactly as it’s displayed. (A bit of foreshadowing for some future post: the image for the next frame will also be vertical, but will be at some other location on the retina, with the separation depending on the velocity of motion – and that separation can cause its own artifacts.)
It may not initially seem like it, but scenario 4 is the same as scenario 2, just in the other direction. I’ll leave this one as an exercise for the reader, with the hint that the key is the motion of the eye relative to the display.
Rolling displays can produce vertical effects as well, and they can actually be considerably more dramatic than the horizontal ones. As an extreme but illustrative example (you’d probably injure yourself if you actually tried to move your head at the required speed), take a moment and try to figure out what would happen if you rotated your head upward over the course of a frame at exactly the same speed that the raster scanned down the display, while fixating on a point in the real world.
The answer is that the entire frame would collapse into a single horizontal line, because every scan line would land in exactly the same place on the retina. Less rapid upward motion will result in vertical compression of the image; vertical motion in the same direction as the raster scan will similarly result in vertical expansion. Either case can also cause intra- or inter-frame brightness variation, because compression concentrates the frame’s photons onto a smaller area of the retina, while expansion spreads them across a larger one.
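Here’s a small model of that geometry, with an assumed 90-degree vertical field of view and the same ~15 ms scan-out as before; the point is simply that when the head’s angular velocity matches the raster’s, every scan line lands at the same retinal angle:

```python
# Where each scan line lands on the retina (vertically) when the head
# rotates upward while the eyes stay fixated on a point in the world.
# The field of view and timings are illustrative assumptions.

LINES = 1080
ACTIVE_S = 0.015                          # assumed visible scan-out time
FOV_V_DEG = 90.0                          # assumed vertical field of view
RASTER_DEG_PER_S = FOV_V_DEG / ACTIVE_S   # the raster sweeps down at ~6000 deg/s

def retinal_y_deg(line, head_up_deg_per_s):
    """Retinal angle (degrees below where the frame's top line lands)."""
    t = ACTIVE_S * line / LINES            # when this line is scanned out
    display_y = FOV_V_DEG * line / LINES   # the line's angle down the display
    shift = head_up_deg_per_s * t          # how far the display has risen by then
    return display_y - shift

# Head matching the raster's speed: top and bottom lines land in the same
# place, so the whole frame collapses to a single horizontal line.
print(retinal_y_deg(0, RASTER_DEG_PER_S), retinal_y_deg(LINES - 1, RASTER_DEG_PER_S))
# Half that speed: the frame is vertically compressed to half height.
print(retinal_y_deg(LINES - 1, RASTER_DEG_PER_S / 2))   # ~45 instead of ~90
```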
None of this is hypothetical, nor is it a subtle effect. I’ve looked at cubes in an HMD that contort as if they’re made of Jell-O, leaning this way and that, compressing and expanding as I move my head around. It’s hard to miss.
In sum, rolling display of a rendered frame produces noticeable shear, compression, expansion, and brightness artifacts that make both AR and VR less solid and hence less convincing; the resulting distortion may also contribute to simulator sickness. What’s to be done? Here we finally return to racing the beam, which repositions each scan line or block of scan lines just before it’s rendered, which in turn happens just before scan-out and display, thereby compensating for intra-frame motion and placing pixels where they should be on the retina. (Here I’m taking “racing the beam” to include the whole family of warping and reconstruction approaches mentioned in the last post and the comments on it.) In scenario 4, HMD tracking data would cause each scan line or horizontal strip of scan lines to be drawn slightly to the left of the one above, which would line the pixels of the image up in proper vertical arrangement on the retina, as the sketch below illustrates. (Another approach would be the use of a global display; that comes with its own set of issues, not least the inability to reduce latency by racing the beam, which I hope to talk about at some point.)
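Here’s what that per-strip compensation might look like for scenario 4; the strip size, the timings, and the fixed 60 degrees/second yaw rate are all assumptions, and a real implementation would read the tracker for each strip rather than assuming a constant rate:

```python
# Per-strip horizontal compensation for scenario 4: the head yaws left to
# right at 60 deg/s, so world-fixed content must shift left within the frame
# by however far the head has turned since the frame's top line.

LINES = 1080
STRIP_LINES = 36
ACTIVE_S = 0.015         # assumed visible scan-out time
HEAD_DEG_PER_S = 60.0    # head yaw rate (would come from the tracker)

def strip_shift_deg(first_line):
    """Degrees this strip's content shifts left relative to the frame top."""
    t = ACTIVE_S * first_line / LINES   # when the raster reaches this strip
    return HEAD_DEG_PER_S * t           # world-fixed content moves opposite the head

shifts = [strip_shift_deg(fl) for fl in range(0, LINES, STRIP_LINES)]
print(shifts[0], shifts[-1])   # 0.0 at the top, ~0.9 degrees near the bottom
```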
So it appears that racing the beam, for all its complications, is a great solution not only to display latency but also to rolling display artifacts – in fact, it seems to be required in order to address those artifacts – and that might well be the case. But I’ll leave you with a few thoughts (for which the bulk of the credit goes to Atman Binstock and Aaron Nicholls, who have been diving into AR/VR perceptual issues at Valve):
1) The combination of racing the beam and compensating for head motion can fix scenario 4, but that scenario is a specific case of a general problem; head-tracking data isn’t sufficient to allow racing the beam to fix the rolling display artifacts in scenario 2. Remember, it’s the motion of the eye relative to the display, not the motion of the head, that’s key.
2) It’s possible, when racing the beam, to inadvertently repeat or omit horizontal strips of the scene, in addition to the previously mentioned brightness variations. (In the vertical rotation example above, where all the scan lines collapse into a single horizontal line, think about what each scan line would draw.)
3) Getting rid of rolling display artifacts while maintaining proper AR registration with the real world for moving objects is quite challenging – and maybe even impossible.
These issues are key, and I’ll return to them at some point, but I think we’ve covered enough ground for one post.
Finally, in case you still aren’t sure why the sprites in the opening story vanished from the bottom up, it was because both the raster and the sprite rendering were scanning downward, with the raster going faster. Until the raster caught up to the current rendering location, it scanned out pixels that had already been rendered; once it passed the current rendering location, it scanned out background pixels, because the foreground image hadn’t yet been drawn there. Different images started to vanish at different altitudes because they were drawn at different times, one after the other: an image partially vanished if the raster reached the scan lines it was being drawn to while it was being drawn, and vanished completely if the raster got there first. Since the raster scans at a fixed speed, images that were drawn sooner could get higher before vanishing, because the raster was still near the top of the screen when they were drawn; by the time the last image was drawn, the raster had advanced far down the screen, and that image started to vanish at a correspondingly lower level.
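Put in code terms, the race looked something like this model; the timings are assumptions for illustration, since the original 1986 code is long gone:

```python
# A minimal model of the vanishing-sprite race: both the raster and the
# renderer sweep downward, with the raster moving faster. All timings are
# assumed for illustration.

LINES = 200              # e.g., a 200-line mode of that era
RASTER_S = 0.015         # time for the raster to scan the visible lines
LINE_DRAW_S = 0.0002     # assumed time to render one scan line of a sprite

def surviving_lines(top, height, draw_start_s):
    """Scan lines of a sprite that get drawn before the raster reaches them.

    Drawing starts draw_start_s after vsync and proceeds downward a line at
    a time; a line is visible this frame only if it's in the frame buffer by
    the time the raster scans it out."""
    visible = []
    for i in range(height):
        line = top + i
        drawn_at = draw_start_s + i * LINE_DRAW_S
        raster_at = RASTER_S * line / LINES
        if drawn_at <= raster_at:
            visible.append(line)
    return visible

# Drawn early in the frame: the top of the sprite survives, the bottom vanishes.
print(surviving_lines(top=20, height=16, draw_start_s=0.0005))
# Drawn a few milliseconds later: the raster has already passed; all of it vanishes.
print(surviving_lines(top=20, height=16, draw_start_s=0.004))
```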