Down the VR rabbit hole: Fixing judder

Over the years, I’ve had the good fortune to meet and work with a lot of remarkable people and do more than my share of interesting things. There are some things that still haven’t happened, though. I haven’t written a compiler. I haven’t written a 3D game from scratch on my own or figured out how to do anything interesting with cellular automata. I’ve worked with Gates and Newell and met Bentley and Akeley and Neal Stephenson, but I haven’t met Knuth or Page or Brin or Miyamoto, and now I’ll never get to meet Iain Banks.

And then there’s this: I’ve been waiting 20 years for someone to write a great book about a project I worked on. A book I’ll read and say, “Yes! That’s exactly how it was!” A book that I can pick up when I’m 80, whenever I want to remember what it was like to help build the future.

Hasn’t happened yet.

You’d think it would have by now, considering that I’ve worked on some significant stuff and appeared in no fewer than four books, but Tracy Kidder-class writers seem to be thin on the ground. Any of the four books could have been great – the material was certainly there – but each fell well short, for a couple of reasons.

First, there were too many significant inaccuracies and omissions for my taste. Maybe someday I’ll take the time to set the record straight, but as just one example, Laura Fryer, the indispensable, hyper-competent complement to Seamus Blackley and a person without whom the original Xbox would not have shipped successfully, simply vanished in Opening the Xbox. That’s not unusual – writers of tech history have limited space to work with and have to choose who to feature and what story they want to tell – but leaving out Laura meant leaving out a big chunk of the history of Xbox as I experienced it.

That touches on the other problem that all four books had to one degree or another: they failed to capture what it felt like to be part of an industry-changing project. That’s a real loss, because being part of a project like Windows NT or Quake is a remarkable experience, one I badly miss whenever I’m working on something more mundane or a project that doesn’t turn out as I had hoped.

Happily, I’m becoming steadily more confident that my current project, VR, is going to be one of the game-changers. That opinion was recently bolstered by the experience of wearing a relatively inexpensive prototype head-mounted display that is possibly the best VR hardware ever made, probably good enough to catapult VR into widespread usage, given the right software. Exciting times indeed, and I hope someday soon there’s a VR breakthrough into wide usage – along with a book about it that fully conveys that excitement.

Which isn’t to say that everything about VR has been figured out, not by a long shot; there’s certainly plenty left to work out with tracking, for example, not to mention input, optics, and software. And, of course, there’s always the most prominent issue, VR displays. In particular, the last two posts discussed the perceptual problems that can result from color-sequential and full-persistence displays, respectively; this post will describe how to fix the problems of full persistence, then look at the new problems that fix opens up.

If you haven’t done so already, I strongly recommend that you read both of the previous posts (here and here) before continuing on.

The obvious solution to judder

Last time, we saw how eye movement relative to a head-mounted display can produce a perceptual effect called judder, a mix of smearing and strobing that can significantly reduce visual quality. The straightforward way to reduce judder is to make displays more like reality, and the obvious way to do that is to increase frame rate.

Here’s the space-time diagram from last time for an image that’s being tracked by the eye on a head-mounted display, producing judder:

And here’s the diagram for a real object being tracked by the eye:

(In both cases, remember that both the eye and object/image are moving relative to the display, but they are not moving relative to each other.)

If we double the frame rate, we get this:

which is significantly closer to the diagram for the real object. Taking that to the limit, if we could make the frame rate infinite, we would get exactly the same diagram as for the real object. Unfortunately, an infinite frame rate is not an option, but somewhere between 60 Hz and infinity, there must be a frame rate that’s good enough so that the eye can’t tell the difference. The question is, what is that frame rate?

There’s no one answer to that question; it depends on the scene content, resolution, FOV, pixel fill, display type, speed of eye motion, and characteristics of the eye. I can tell you, though, that 100 Hz is nowhere near enough. 200 Hz would be a significant improvement but still not enough; the sweet spot for 1080p at 90 degrees FOV is probably somewhere between 300 and 1000 Hz, although higher frame rates would be required to hit the sweet spot at higher resolutions. A 1000 Hz display would very likely look great, and would also almost certainly reduce or eliminate a number of other HMD problems, possibly including motion sickness, because it would interact with the visual system in a way that mimics reality much more closely than existing displays. I have no way of knowing any of that for sure, though, since I’ve never seen a 1000 Hz head-mounted display myself, and don’t ever expect to.

And there’s the rub – there are no existing consumer displays capable of anywhere near the required refresh rates, and no existing consumer data links that can transfer the amount of video data that would be required. There’s no current reason to build such a display or link, and even if there were, rendering at that rate would require such a huge reduction in scene complexity that net visual quality would not be impressive – the lack of judder would be great, but 2005-level graphics would undo a lot of that advantage. (That’s not to say 2005-level graphics couldn’t be adequate for VR – after all, they were good enough for Half-Life 2 – but they would be clearly inferior to PC and console graphics; also, really good VR is going to require a lot more resolution than 1080p, and it’s a moot point anyway because there’s no prospect of consumer displays that can handle anything like 1000 Hz.)
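
To put a rough number on the data-link side of that, here’s a back-of-the-envelope calculation – just my arithmetic, assuming 24 bits per pixel and ignoring blanking, audio, and compression:

```python
# Raw video bandwidth for an uncompressed 1080p stream. The 24 bits per pixel
# and the omission of blanking/protocol overhead are simplifying assumptions.

def raw_bandwidth_gbps(width, height, bits_per_pixel, refresh_hz):
    """Uncompressed video bandwidth in gigabits per second."""
    return width * height * bits_per_pixel * refresh_hz / 1e9

for hz in (60, 120, 1000):
    print(f"1080p @ {hz:4d} Hz: {raw_bandwidth_gbps(1920, 1080, 24, hz):5.1f} Gbit/s")
```

At 1000 Hz that works out to roughly 50 Gbit/s of raw pixel data – far more than consumer video links are built to carry today, before you even get to the rendering problem.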

So increased refresh rate is a perfect solution to judder and other problems – except that it’s completely impractical, at least in the near future. So it’s on to Plan B, which is not a perfect solution, but is at least feasible. Before we can discuss that, though, we need to touch on persistence.

Persistence

Judder is an artifact of persistence – that is, of the fact that during each frame pixels remain illuminated for considerable periods of time.

Full persistence is when pixels are lit for the entire frame. This is the case with many OLED and LCD displays, although it is by no means required for either technology. Here’s the space-time diagram for a full-persistence display, for the case where the eye is fixated straight ahead while a virtual image is moving relative to the eye:

Here’s half-persistence, where pixels remain lit for half a frame:

And here’s zero-persistence, where pixels are lit for only a tiny fraction of each frame, but at very high intensity to compensate for the short duration. Scanning laser displays are effectively zero-persistence.

The diagrams above are for the case where the eye is fixated while the virtual image moves. That’s not the key judder case, though; the key case is when the eye is moving relative to the display. Here’s the diagram for that on a full-persistence display again:

As the diagram illustrates, the smear part of judder results from each pixel moving across the retina during the time it’s lit, due to eye motion relative to the display. It’s actually not the fraction of a frame for which pixels remain lit that determines the extent of the smearing, it’s the absolute time for which pixels are illuminated, because that (times eye speed) is what determines how long the smears on the retina are. At 1000 Hz, full persistence is only 1 ms, short enough to eliminate judder in most cases – and while 1000 Hz isn’t practical, that observation leads us in the direction of the second, more practical solution to judder: low persistence.

Here’s the same scenario as the last diagram – the eye moving relative to the display – but with a zero-persistence display:

In this case, there’s no significant movement of the display relative to the eye while the pixel is illuminated, because the pixel is only on for a very short time. Consequently, there’s no movement of the pixel across the retina, which means that zero persistence (or, in practice, sufficiently low persistence, below roughly 2 ms, maybe less at 1080p with a 90 degree FOV) should almost completely eliminate the smear component of judder. Experimental prototypes confirm that this is the case; images on low-persistence HMDs remain sharp regardless of head and eye motion.
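
To put rough numbers on that, here’s a quick back-of-the-envelope calculation – mine, not measured data – using the ballpark figures from these posts:

```python
# Retinal smear length: the time a pixel stays lit multiplied by the speed of
# the eye relative to the display. 100 deg/s is the ballpark figure used below
# for a leisurely head turn (with the eyes counter-rotating to stay on target).

def smear_arcmin(persistence_ms, eye_deg_per_sec):
    """Length of the smear on the retina, in arc minutes."""
    return eye_deg_per_sec * (persistence_ms / 1000.0) * 60.0

eye_speed = 100.0  # degrees/second relative to the display
for label, persistence_ms in [("full persistence, 60 Hz", 1000.0 / 60.0),
                              ("low persistence, ~2 ms", 2.0),
                              ("full persistence, 1000 Hz", 1.0)]:
    print(f"{label:>26}: {smear_arcmin(persistence_ms, eye_speed):6.1f} arcmin of smear")
```

At 60 Hz with full persistence, each pixel smears across well over a degree of the retina during a moderate head turn; at a millisecond or two of persistence, the smear shrinks to a handful of arc minutes, which is why low-persistence images stay sharp.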

Fixing one VR problem generally just reveals another one, though, and low persistence is no exception.

Side effects of low persistence

In the last post, I noted that strobing – the perception of multiple copies of a virtual image – can occur when frame-to-frame locations for an image are more than very roughly 5 to 10 arc minutes apart, although whether and at what separation strobing actually occurs is heavily content-dependent. At 60 Hz, successive frames of an image will be 5 arc minutes apart if the eyes are moving at just 5 degrees/second relative to the image; 10 arc minutes corresponds to 10 degrees/second. For context, a leisurely head turn is likely to be in the ballpark of 100 degrees/second, so it is very easy for the eyes to move fast enough relative to an image to produce strobing.

(As an aside, this is the other reason that very high refresh rates work so well. Not only does increasing refresh rate decrease persistence time, it also decreases inter-frame time, which in turn decreases strobing by reducing the distance images move between frames.)
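
The arithmetic behind those numbers is easy to check. Here’s an illustrative calculation – again, mine, using the very rough 5-10 arc minute threshold described above:

```python
# Frame-to-frame separation on the retina of an image the eye is moving
# relative to, compared against the very rough 5-10 arcmin strobing threshold.

def separation_arcmin(eye_deg_per_sec, refresh_hz):
    """Angular distance between successive copies of an image, in arc minutes."""
    return eye_deg_per_sec / refresh_hz * 60.0

for eye_speed in (5.0, 10.0, 100.0):   # deg/s relative to the image
    for hz in (60, 120, 1000):
        sep = separation_arcmin(eye_speed, hz)
        note = "above threshold" if sep > 10.0 else ("borderline" if sep >= 5.0 else "below threshold")
        print(f"{eye_speed:5.0f} deg/s @ {hz:4d} Hz: {sep:6.1f} arcmin ({note})")
```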

Smear hides a lot of strobing in the case of judder. Without smear, previously invisible strobing becomes an issue on low-persistence displays. However, low-persistence strobing isn’t quite as serious a problem as it may at first seem, because whatever image your eye is following won’t strobe, for the simple reason that the eye is tracking it; the pixels from that image will land on the same place on the retina each frame, so there’s no frame-to-frame separation to produce strobing. (This assumes perfect tracking and a consistent frame rate; tracking error or a variable frame rate can result in sufficient misregistration to induce strobing.) And because that image is the center of attention, and because it lands on the high-resolution area of the eye, most of the perceptual system will be focused there, with relatively little processing power devoted to the rest of the scene, so low-persistence strobing may not be as noticeable as you might think.

For example, if you track a car moving from left to right across a scene on a low-persistence display, the car will appear very sharp and clear, with no strobing. The rest of the scene can strobe, since the eye is moving relative to those pixels. However, that may not be very noticeable, depending on refresh rate, speed of eye motion, contents of the background, and the particular eye’s characteristics. (There’s considerable person-to-person variation; personally, I’m much more sensitive to strobing than most people.) It also probably matters how absorbing the image being tracked is. If you’re following a rocket that requires a split-second response, you may not notice peripheral strobing; if you’re scanning your surroundings for threats, you’re more likely to pick up some strobing. However, I should caution that this is just a hypothesis at this point; we haven’t done the tests to know for sure.

If low-persistence strobing does turn out to be a problem, the obvious solution is, once again, higher frame rate. It’s possible that low persistence combined with a somewhat higher frame rate could solve both smear and strobing at a lower frame rate than would be needed to fix judder by increasing frame rate alone. Even so, the frame rate required is higher than is currently available in consumer parts, so it’s probably not a viable option in the near future. An alternative would be to render all the objects in the scene with motion blur, thereby keeping images in successive frames from being far enough apart to strobe and lowering image frequency (which increases the non-strobing separation). However, even if that works perfectly, it has several significant downsides: first, it requires extra rendering; second, it requires calculating the movement of each virtual object relative to the eye; and third, it requires eyetracking. It’s not clear whether the benefits would outweigh the costs.
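
To be concrete about why eyetracking would be needed, here’s a minimal sketch of the bookkeeping involved – my illustration, not anything we’ve implemented. The key point is that the blur for each object has to be computed from its motion relative to the eye, not relative to the head or the world:

```python
# Hypothetical helper: how much angular blur would bridge the gap between
# successive frames for an object, given the eye's own angular velocity.

def required_blur_deg(object_deg_per_sec, eye_deg_per_sec, refresh_hz):
    """Angular extent of motion blur needed to span one frame's worth of motion."""
    relative_speed = abs(object_deg_per_sec - eye_deg_per_sec)  # motion across the retina
    return relative_speed / refresh_hz

# The eye tracks a car moving at 30 deg/s while the background is static, at 60 Hz:
print(required_blur_deg(30.0, 30.0, 60))  # 0.0 deg -- the tracked car stays sharp
print(required_blur_deg(0.0, 30.0, 60))   # 0.5 deg -- the background needs blurring
```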

Down the rabbit hole

Strobing was a fairly predictable consequence of low persistence; we knew we’d encounter it before we ever built any prototypes, because we came across this paper. (I should note, however, that strobing is not nearly as well-researched as persistence smear.) Similarly, we expected to run into issues with low-persistence motion perception, because a series of short, bright photon bursts from low-persistence virtual images won’t necessarily produce the same effects in the eye’s motion detectors as a continuous stream of photons from a real object. We expected those issues to be in areas such as motion sickness, accurate motion estimation, and reaction time. However, we’ve come across one motion artifact that is far weirder than we would have anticipated, and that seems to be based on much deeper, less well-understood mechanisms than strobing.

By way of introduction, I’ll point out that if you look at a row of thin green vertical bars on a low-persistence display and saccade to the left or right, strobing is very apparent; multiple copies of each line appear. As I mentioned above, strobing is not that well understood, but there are a couple of factors that seem likely to contribute to this phenomenon.

The first factor is the interaction of low persistence with saccadic masking. It’s a widespread belief that the eye is blind while saccading. In fact, the eye does gather a variety of information during saccades, but it is true that normally no sharp images can be collected, because the image of the real world smears across the retina, and saccadic masking raises detection thresholds, keeping those smeared images from reaching our conscious awareness. However, low-persistence images can defeat saccadic masking, perhaps because masking fails when, in the absence of retinal smear, mid-saccadic images are as clear as pre- and post-saccadic images. At saccadic eye velocities (several hundred degrees/second), strobing is exactly what would be expected if saccadic masking fails to suppress perception of the lines flashed during the saccade.

One other factor to consider is that the eye and brain need to have a frame of reference at all times in order to interpret incoming retinal data and fit it into a model of the world. It appears that when the eye prepares to saccade, it snapshots the frame of reference it’s saccading from, and prepares a new frame of reference for the location it’s saccading to. Then, while it’s moving, it normally suppresses the perception of retinal input, so no intermediate frames of reference are needed. However, as noted above, saccadic masking can fail when a low-persistence image is perceived during a saccade. In that case, neither of the frames of reference is correct, since the eye is between the two positions. There’s evidence that the brain uses a combination of an approximated eye position signal and either the pre- or post-saccadic frame of reference, but the result is less accurate than usual, so the image is mislocalized; that is, it’s perceived to be in the wrong location.

It’s possible that both of these factors are occurring and interacting in the saccadic strobing case described above. The strobing of the vertical bars is certainly an interesting matter (at least to HMD developers!), but it seems relatively straightforward. However, the way the visual system interprets data below the conscious level has many layers, and the mechanisms described above are at a fairly low level; higher levels contain phenomena that are far stranger and harder to explain, as we learned by way of the kind of accident that would make a good story in the book of how VR gaming came to be.

Not long ago, I wrote a simple prototype two-player VR game that was set in a virtual box room. For the walls, ceiling, and floor of the room, I used factory wall textures, which were okay, but didn’t add much to the experience. Then Aaron Nicholls suggested that it would be better if the room was more Tron-like, so I changed the texture to a grid of bright, thin green lines on black, as if the players were in a cage made of a glowing green coarse mesh.

When I tried it out on the Rift, it did look better when my head wasn’t moving quickly. However, both smear and strobing were quite noticeable; strobing isn’t usually very apparent on the Rift, due to smearing, but the thin green lines were perfect for triggering strobing. I wanted to see what it looked like with no judder, so next I ran it on a low-persistence prototype. The results were unexpected.

For the most part, it looked fantastic. Both the other player and the grid on the walls were stable and clear under all conditions. Then Atman Binstock tried standing near a wall, looking down the wall into the corner it made with the adjacent wall and the floor, and shifting his gaze rapidly to look at the middle of the wall. What happened was that the whole room seemed to shift or turn by a very noticeable amount. When we mentally marked a location in the HMD and repeated the triggering action, it was clear that the room hadn’t actually moved, but everyone who tried it agreed that there was an unmistakable sense of movement, which caused a feeling that the world was unstable for a brief moment. Initially, we thought we had optics issues, but Aaron suspected persistence was the culprit, and when we went to full persistence, the instability vanished completely. In further testing, we were able to induce a similar effect in the real world via a strobe light.

This type of phenomenon has a name – visual instability – but there are multiple mechanisms involved, and the phenomenon isn’t fully understood. It’s not hard to come up with possible explanations, though. For example, it could be that mislocalization, as described above, causes a sense that the world has shifted; and if the world has shifted, there must have been motion in order to get it there, hence the perception of motion. Once the saccade stops, everything goes back to being in the right place, leaving only a disorienting sense of movement. Or perhaps the motion detectors are being stimulated directly by the images that get past saccadic masking, producing a sense of motion without any actual motion being involved.

All that sounds plausible, but it’s hard to explain why the same thing doesn’t happen with vertical lines. Apparently the visual instability effect that we identified requires enough visual data to form a 3D model of the world before it can kick in. That, in turn, implies that this effect is much higher-level than anything we’ve seen so far, and reflects sophisticated 3D processing below the conscious level, a mechanism that we have very little insight into at this point.

How could this effect be eliminated? Yet again, 1000 Hz would probably do the trick. The previously mentioned approach of motion-blurring might work too; it all depends on whether the motion-blurred images would make it through saccadic masking, and that’s a function of what triggers saccadic masking, which is not fully understood. A final approach would be to author content to avoid high-frequency components; it’s not clear exactly what would be needed to make this work well, but it is certainly true that the visual instability effect is not very visible playing, say, Half-Life 2 on a low-persistence HMD.
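
Purely as an illustration of what authoring out high-frequency components might involve, here’s one naive approach – pre-filtering textures offline with a low-pass blur. The kernel size, and the assumption that this alone would help, are mine; treat it as a sketch rather than a recipe:

```python
import numpy as np

def box_blur(texture, radius=2):
    """Separable box blur of a 2D grayscale texture (blur rows, then columns)."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    blurred = np.apply_along_axis(np.convolve, 1, texture, kernel, mode="same")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, kernel, mode="same")
    return blurred

# A thin-bright-lines-on-black grid like the one described above is nearly all
# high-frequency content; blurring spreads each line across several pixels,
# which reduces its tendency to strobe (at the cost of a softer look).
grid = np.zeros((64, 64))
grid[::8, :] = 1.0
grid[:, ::8] = 1.0
softened = box_blur(grid, radius=2)
print(grid.max(), softened.max())  # peak intensity drops as the energy spreads out
```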

It’s unclear whether the visual instability effect is a significant problem, since in our experiments it’s less pronounced or undetectable with normal game content. The same is true for any of the motion detection problems we think might be caused by low persistence; even if they exist, the eye-brain combination may be able to adapt, as it has for many aspects of displays. But such adaptation may not be complete, especially below the conscious level, and that sort of partial adaptation may cause fatigue and motion sickness. And even when adaptation is complete, the process of adaptation can be unpleasant, as for example is often the case when people get new eyeglasses. It’s going to take a lot of R&D before all this is sorted out, which is one reason I say that VR is going to continue improving for decades.

In any case, the visual instability effect is an excellent example of how complicated and poorly-understood HMD visual perception currently is, and how solving one problem can uncover another. Initially, we saw color fringing resulting from temporally separated red, green, and blue subpixels. We fixed that by displaying the components simultaneously, and then found that visual quality was degraded by judder. We fixed judder by going to low persistence, and ran into the visual instability effect. And the proposed solutions to the visual instability effect that are actually feasible (as opposed to 1000 Hz or higher update rate), as well as whatever solutions are devised for any other low-persistence motion detection problems, will likely cause or uncover new problems. Fortunately, it does seem like the scale of the problems is decreasing as we get farther down the rabbit hole – although diagnosing the causes of the problems and fixing them seems to be becoming more challenging at the same time.

And with that, we come to the limits of our present knowledge in this area. I wish I could lay out chapter and verse on the issues and the solutions, but no one can do that yet; what I hoped to do instead was give you a sense of just how different HMDs are from anything that’s come before. And besides, while it would be great if someday soon someone like Tracy Kidder were to write the definitive book about how mass-market VR happened, past history isn’t encouraging in that respect, so I hope these last three posts have conveyed, at least to some extent, what it’s like to be in the middle of figuring out a whole new technology that has the potential to affect all of us for decades to come.

The short version: hard but fun, and exciting as hell.

99 Responses to Down the VR rabbit hole: Fixing judder

  1. luc says:

    Very interesting.
    Next time you’re in Chicago, come to EVL @ UIC (electronic visualization lab.) and give a talk on the topic, and we’ll show you our VR space:
    http://youtu.be/d5XDbzy7vuE
    http://youtu.be/vK74PP4kHHM

    Luc

    • MAbrash says:

      No plans at the present to be in Chicago, but I’ll keep it in mind – thanks for the invite!

      –Michael

  2. I received (begged for) a copy of Computer Graphics: Principles and Practice (2nd edition) from my mother when I was 16. I still have the hacked Powerglove from when we wired it up to Rend386. I used to write 3d normalization utilities in QBasic and on my TI-85. Basically this time in the history of consumer VR is the most exciting time ever and I can’t wait to see the kinds of things that are produced, and the kinds of things I can share with my kids as they grow up. I look forward to many more of these posts.

    • MAbrash says:

      I remember wading through that 2nd edition – so much stuff that wasn’t relevant to real-time 3D, but it was the best reference around at the time. Kids today have no idea how hard it used to be to find good information!

      –Michael

  3. Tom Forsyth says:

    I met Iain Banks once, at a book-reading in a Cambridge branch of Waterstone’s. He read us some of Feersum Endjinn (so technically he was M-endowed at the time), and the phonetic writing makes so much more sense when you understand that in his head the sloth speaks like Sean Connery. And now in my head. And also in yours. Memes!

    Then a learned English professor in the audience put his hand up and said “I think I may be in the wrong place – are you the Iain Banks who writes normal literature?” And we all had a good giggle at his expense while Mr. Banks explained that yes, he was the same chap, and not only did he “also write SF”, but that he actually enjoyed doing so just as much.

    He will be missed.

    I haven’t met Mr. Stephenson though. *cough* yet *cough*

    And one day I’ll write that book on Larrabee, and you can tell me the bits I got wrong.

    • MAbrash says:

      I’d like to read that book :)

      –Michael

      • Dean Macri says:

        When I still worked at Intel I had kept *ALL* of the e-mails related to Larrabee, even back to before it was called Larrabee. Sadly, those were left on a hard drive at Intel that probably has been wiped :-(

        Still, when/if Tom gets around to writing that book, I’ll be glad to “correct” anything he got wrong ;-)

  4. Michael Handford says:

    Amazing post, this is absolutely fascinating. I’d love to start experimenting with a custom OLED controller, using a monochrome/low-res display to keep bandwidth requirements down, maybe even without stereoscopy. I wonder if you guys are already doing something similar? I guess you have access to the display controller if you’re experimenting with persistence.

    An ASIC with the display controller and graphics processing integrated with a tightly coupled frame buffer/s would be a dream to play with for rendering a static scene with the tracker driving the scene rotation in hardware. I wonder if an FPGA could suffice for something low res/low bandwidth as an experiment for high frame rate. I’d love to see a transparent OLED display used to try and show an object rendered relative to reality. Achieving a convincing free floating object fixed in space might highlight any visual artifacts that could be less obvious but play a part in “VR sickness”.

    Thanks for writing up your experiments – it’s great to learn how the brain/eye combination experiences your optimizations and how data bandwidth requirements might be reduced.

    Mike.

    • MAbrash says:

      Yes, we’ve done some of those experiments. Note that there are DLP systems that can do more than 1000 Hz with 1-bit monochrome, although they’re not really head-mountable.

      This stuff is fascinating indeed. We have the pleasant illusion that we see reality directly, but in fact we use some limited sensors, plus a lot of processing and assumptions, and construct what we see from that. It’s interesting and unsettling when that illusion slips – and very informative. Believe me, our visual system is more complex than you would ever expect, and certainly not well understood overall at this point.

      –Michael

      • Michael Handford says:

        I remember reading about all the mini DLP projectors that would be integrated into mobile phones but I haven’t seen that take off yet.

        Have you done any experiments with OLED tech? I haven’t heard anything from Oculus about this and I wonder if the sample and hold effect is a problem. Is strobing the OLED to adjust persistence possible? I know LEDs can be flashed at much higher power than rated and this might restore some brightness from the dead time.

        http://www.blurbusters.com/faq/oled-motion-blur/

        Mike.

        • MAbrash says:

          Yes, we’ve experimented with OLEDs, and it’s a promising direction. This is one of the areas to which I referred a few posts back when I said that VR that solves the current first-order problems (like judder) seemed technically doable at this point, but getting there would require significant work on the part of manufacturers, and it’s not clear they have sufficient incentive to do that at this point.

          –Michael

      • Matty C says:

        Indeed, that is so true :)

        I’m aphakic (no crystalline lenses in either eye; cataracts were removed when I was a baby but can’t get IOLs done now since I also have glaucoma and it’s very risky). I have a nystagmus that means my eyes constantly appear to wobble from side to side (different to the normal eye movements; it’s like superimposing a sine wave on top of those) and decreased visual acuity even with correction, because of how my vision was at a young age.

        Yet I don’t perceive it like you would think so from the description. My world is totally stable, despite the nystagmus. I have stereo vision, despite having a slightly turned right eye. I don’t see things blurry – they just look ‘less detailed’ as the detail gets finer (so the effect of basically having a lower resolution screen).

        All that helps with the Rift lol :) I don’t notice the screen door effect unless I’m making an effort to look for it; the 3D is great but I don’t really get that overpowering vertigo that I’ve seen others get (the sense is there though!) and I don’t have issues with focusing strain since there’s nothing to focus with so I don’t accommodate anyway.

        Was interesting reading about strobing – I’d see that effect on old CRT monitors that ran at 50/60Hz refresh rates if I moved my eyes or head quickly across. Don’t tend to notice anything like that in the Rift.

        It’s utterly amazing how the brain adapts to sensory input.

  5. Magnus L says:

    Interesting article, as always :)

    I agree that high refresh rates are one of the most important aspects of VR immersion, even if the “holy grail” target of 1000hz isn’t likely any time soon as far as GPU hardware and data transfer / display panel controllers go.

    However, sometimes the *panel* itself is capable of much higher refresh rates than the usual 60/120hz display panel controller, and such panels often use frame interpolation to produce higher refresh rates.

    What if you were to combine, say, a 120hz display controller and 120fps output from the game/gpu itself with a 1000hz panel *with* low persistence *and* frame interpolation? I realise the frame interpolation will introduce its own inaccuracies and definitely some extra latency.

    If we were to compare this to a “normal” 120hz panel it could at least reduce the judder effect (and possibly cause more latency?)

    Obviously 120fps is far from ideal, but perhaps it’s an acceptable minimum – say 240fps with a 240hz input and a 1000hz frame-interpolated output with low persistence?

    • MAbrash says:

      Excellent thoughts! Processing on the panel is already interesting as a way to deal with latency, and this is another possible use for it. However, quite a lot of processing would be required, because this wouldn’t be a simple pan or warp – every pixel’s velocity would have to be tracked and its position interpolated independently. Plus there’s the problem of gaps. Still, well worth thinking about.

      –Michael

      • Magnus L says:

        I’m not sure of the technical details of existing processing technologies for consumer TVs like the Samsung d8000, which claims 800hz through motion compensation; it probably requires a few frames ahead for processing (and thus introduces a substantial amount of latency), not to mention the size and heat/power constraints within an HMD.

        Your thoughts on motion blur as a possible solution to judder are interesting.
        If we were to move on to realtime raytracing for future game engines (e.g. http://raytracey.blogspot.no/ ) with stochastic motion blur, things would start to get interesting, and it would eliminate the need for insane refresh rates.
        However, it would require a *very* sophisticated eye-tracking system. I wonder what the latency requirements would be?
        You would probably need a couple of IR cameras with an extremely high FPS; the resolution required would probably not be that high.
        And you would need to process all that data in realtime.
        Blinking might introduce other problems as well.

        But perhaps it’s a more likely solution than the 1000fps target.

        • MAbrash says:

          In my experience, the TVs claiming very high refresh do so via motion prediction, which is not reliable enough for an HMD. In any case, they do introduce way too much latency.

          As for the rest of your post – very good thoughts! I agree that motion blur of the sort you suggest seems more likely than 1000 Hz – but still very challenging.

          –Michael

          • Magnus L says:

            I actually think that motion blur via eyetracking is the most likely solution for consumer based VR, considering the unlikelihood that game developers will revert to the year-2000 polycounts required for 1000fps, and the lack of incentive for display panel manufacturers to produce a mobile-sized display and controller with higher than 120hz refresh rates.

            Seems stochastic motion blur is doable without the leap towards realtime raytracing:
            https://research.nvidia.com/publication/real-time-stochastic-rasterization-conventional-gpu-architectures

            You’d probably need very high tracker/sensor refresh rates in order to extract enough vector information between sub-frames, but we’re pretty much already there, and sensor manufacturers will likely strive for higher refresh rates anyway (unlike the display panel industry).

            You’d also need a very high fps camera for eye tracking, I’m not sure of the options available right now in regards to power/heat and size.

            The eye-tracker would translate iris position into camera rotation in the game engine, but this virtual eye-camera wouldn’t actually render the scene; it would exist solely for calculating camera motion blur and object motion blur relative to its own *rotation*, while the standard virtual “head” camera would render the scene and calculate motion blur relative to its own *position*.

            Even if you achieve a very low latency, highly accurate eye-tracker, display latency from the typical 60-120hz displays available today would still be a huge problem, considering that the time it takes for the rendered result to appear on the display is still limited by refresh rate.

            However, the required refresh rate for an acceptable maximum latency will likely be lower than 1000hz in this case?

          • MAbrash says:

            Excellent thinking, Magnus! Very much along one of the lines we’ve been looking at.

            I don’t have anything to add except that relatively low latency can be mostly handled via prediction. The problem with prediction is acceleration, which results in error that grows as the square of the latency time, so prediction doesn’t extend out very far, but it works quite well for short periods.
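
            (A toy illustration of that quadratic growth – this assumes constant-velocity prediction and a constant unmodeled head acceleration, with numbers chosen purely for illustration:)

            ```python
            # Error of constant-velocity prediction under a constant, unmodeled
            # angular acceleration: error = 0.5 * a * t^2.

            def prediction_error_deg(accel_deg_per_sec2, lookahead_ms):
                t = lookahead_ms / 1000.0
                return 0.5 * accel_deg_per_sec2 * t * t

            for ms in (5, 20, 50):
                # 1000 deg/s^2 is just a placeholder for a brisk head acceleration
                print(f"{ms:3d} ms ahead: {prediction_error_deg(1000.0, ms):5.2f} degrees of error")
            ```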

            Oh, also – this might or might not work – it’s not the same as reality in terms of photons hitting the retina over time, so there’s no telling ahead of time – and even if it does work, it’s hard. Good enough eyetracking with low enough latency is hard, and also it requires a change in the rendering model to calculate relative velocity information for every pixel. Maybe that could be extracted from frames over time, but that would be expensive and would have to be fast in order to be low-latency.

            –Michael

          • Magnus L says:

            I’m sorry if I’m getting carried away here, spamming you with my chaotic comments, but I *really* believe eye-tracking and motion blur will become more and more important the further we go (eventually absolutely essential).
            Especially considering that the higher the resolution goes and the closer we get to a PPD similar to natural eyesight, the stronger the temporal aliasing effect will be, and therefore the more judder.

            I’m really excited about the possibility of real-time raytracing for use in VR and gaming applications.
            Apart from all the obvious advantages of physically accurate lighting simulation, there’s another very exciting possibility in combining eye-tracking with real-time raytracing:

            The sampling can be focused in the center of the gaze spot, so that more rays are received by the camera at the very same spot the eye is gazing at, where the vision is at its most acute.
            This would lead to highly efficient rendering.

            I know that the eye is very sensitive to picking up movement and out-of-place stuff in its peripheral vision, but reducing the number of samples with raytracing in the peripheral vision will simply result in random noise; I don’t think the eye will pick up on this effect at all if done right (i.e. just barely enough samples for it not to be noticeable).

            (If you render an image with a typical physically based renderer and don’t shoot enough rays/samples, and then apply heavy blur to the whole image, and compare it to the same rendering with enough samples for a noise-less image, they should look more or less identical as long as the under-sampled image doesn’t have too many “fireflies”.)

            I don’t think we’re too far away from realtime raytracing becoming a reality in gaming: http://raytracey.blogspot.no/

          • MAbrash says:

            Another perceptive comment, Magnus. As for rendering more densely over the fovea, check out Microsoft’s experiments with foveated rendering, which is what you describe, except with traditional rendering. It does indeed work, and it does indeed make better use of rendering resources.

            –Michael

    • Fred says:

      Ah, yesterday (in his QuakeCon keynote) John Carmack mentioned frame interpolation as a possible way to boost refresh rate for VR.

  6. Chris Emerson says:

    This reminds me of an effect I’ve noticed with an (I presume multiplexed) alarm clock with 7 segment digits. If I move my eyes rapidly across the display, the digits appear to move relative to the rest of the clock. The low persistence effect, or something similar, probably explains it.

    • MAbrash says:

      I’m sure that’s the case. Let’s just say that there are multiple mechanisms operating in this case, and they don’t all respond the same way to eye motion, so relative positions of objects can easily shift. Another good example of seeing through the illusion to the actual mechanisms of our visual system.

      –Michael

    • Miha Lunar says:

      I’ve noticed that too, recorded it and got this neat little 2 frame gif out of it. Due to limitations of gifs, it’s a bit exaggerated and glitchy though.

      It turned out that with my clock, each digit strobed in two sections. This makes for interesting wobbly artifacts, especially if you look at it while eating something crunchy or just chomping with teeth. That seems the easiest way to induce head vibration, which seems almost completely compensated for when looking at normal objects, but becomes very apparent with “strobing displays”.

      • Aaron Nicholls says:

        Chris and Miha, thanks for sharing great everyday examples of this phenomenon. Since the digits are presented stroboscopically and the clock itself is lit continuously (at least mostly so), the visual system localizes the two in different reference frames, and they visibly separate. It’s called the “flash-lag effect”, and is responsible for a number of optical illusions.

  7. Ryan says:

    Would optical image stabilization fix some of the problems with full persistence?

    Making the physical image move to make up for the micro-adjustments seems like an easier solution than reducing inter-frame latency, but there are some problems even then. The perspective won’t be perfect across the movement, and with the speed required for high angular velocity it may not be possible, but at least for low-angular-velocity movement it should give some noticeable benefit.

    • MAbrash says:

      This was discussed in the comments about the latency post, but that was for a different purpose, adjusting the scene on an intra-frame basis to overcome lag. In any case, it’s a clever idea with respect to judder, but I don’t think it really solves anything that low persistence doesn’t, and it has the same problems. Apart from the fact that it would be extremely hard to make work well enough in an HMD (for starters, think of the noise the motors would make right next to your head, not to mention weight, power, and problems with the mechanism being jostled), it would effectively be similar to low persistence, in that exactly one spot on the retina would be stimulated by each pixel per frame, so there would still be strobing for objects that the eye wasn’t tracking. And it probably wouldn’t fix the visual instability phenomenon I discussed, since there would be a sharp image as the eye moved, so it would probably get through saccadic masking.

      –Michael

  8. Bryan Bortz says:

    Some of this reminds me of when you are on a bridge or a moving boat: if you fill part of your vision with the stationary land and most of it with fast-moving water, you can feel like you are moving when you actually are not.

  9. Philip Starkey says:

    I’d like to think that 1000Hz displays will eventuate in the next 10-20 years, though finding a different solution sooner is much preferable! There are already commercial spatial light modulators (SLMs) available that will do 500Hz on a 256×256 grid. I might be wrong, but I think the limiting factor in the resolution of such devices is the driver of the SLM, which obviously needs to transmit a large amount of data to the panel (as you point out). Creating higher bandwidth drivers wouldn’t necessarily be out of the question, I would have thought.

    As an experimental physicist, my gut feeling is that more and more physics research is going to demand faster and higher resolution SLMs as a way to create unique light fields (for instance, to interact with a group of atoms in a spatially dependent way). While it doesn’t immediately translate to 1kHz panels for use in VR, some of the advancements might make their way down the chain into commercial products.

    On a related note, would using an SLM be useful for VR? (as it allows you to control the phase of the light field produced)

    • MAbrash says:

      I’m not familiar with SLMs, but I’ll check it out, thanks. One thing to keep in mind, though, is that even if 1000 Hz were to become doable at a reasonable price, it would reduce the amount of rendering that could be done per frame by an order of magnitude, which would have an impact on visual quality on a different axis than judder. I’m definitely not saying that the tradeoff wouldn’t be worth it, just that it wouldn’t be pure win.

      –Michael

      • Philip Starkey says:

        Ah cool, yeah definitely check out SLMs. Glad I could suggest something you hadn’t heard of before! There are three kinds of SLMs: ones that can only modify intensity, ones that can modify phase, and ones that can do both. Spatially controlling the phase allows you to create digital holograms. Sort of related to that is work on holographic TVs, some of which was recently published in Nature ( http://www.nature.com/nature/journal/v498/n7454/full/nature12217.html?WT.ec_id=NATURE-20130620 ). These people are not using an SLM because SLMs are not fast enough for holographic TV, nor do they have a high enough resolution yet.

        While obviously quite a way from commercial production, I would say that moving to devices that allow you to control the shape of the wave-front of the light hitting the eye (aka spatially controlling the phase) would be far more flexible than traditional LCD panels.

        But I can see you are right that rendering capability would then become the bottleneck. Perhaps an easier problem to solve, though? (I have no knowledge in this area so it’s hard for me to know what is achievable!)

      • Gavin says:

        Hi there, I’m responding specifically to:
        “it would reduce the amount of rendering that could be done per frame by an order of magnitude”

        If you simply rendered a bubble of vision around the head, and allowed the screen itself to “locate” the correct vector (direction of head) and select the corresponding area of the rendered texture (a video buffer with a 360 degree render), it could allow rotation to occur much faster, as opposed to rendering many more frames. A “freeze frame” effect for each frame, where you could look around a static frame. An obvious improvement would then be to use a multi-layered render of some kind, with a depth map, to allow small amounts of parallax movement, with no need for additional frames or heavy processing. Incidentally, this may also provide an immediate low latency fix for rendering rotation: allow the screen to rotate around “inside” the current rendered frame, and then update the frame at your leisure. Fast immediate rotation, with a lower rendered framerate.
        Just make sure to render motion blur, and maintain a consistent framerate.
        It should look like stop motion (not in a bad way though; it depends on the rendered framerate, and the buffers can be read with lower latency than the next frame can be rendered (most likely), which will assist in keeping things consistent). Real-time rotation latency will be as low as possible.
        You can even separate out high motion events. If we rendered high speed movement and low speed movement as separate renders, then a fast hardware compositor could just read and composite both while using the head motion vector to select the relevant “sector” of the bubble render. This way we could render as slowly as 15 frames per second, while rendering fast motion at a higher rate, and allowing the screen to locate and update itself (rotation/position) based on the lowest latency version of the head position/motion variables. The display could even use a previous buffer and “fade” from the previous buffer to the next buffer, while being relatively independent of the actual rendering rate.
        :)
        -PuFFiN

        NB: I’ve kinda glossed over the “hardware” a little… :P It would need to have a set of buffers to render to, and a hardware rotate/parallax move/composite (perhaps shaders). I think that this can be decoupled from the PC side though… it’s really just a panorama texture that the device can select a region from. Rotation’s a pain though.
        I’m picturing this like an onion: render from the inside layer of the onion through the layered shells of the onion, to the outer shell. This all sits in buffers. And despite the idea that we are moving and rotating with the HMD, it’s still relative to a 360 degree texture, meaning we can “pan and scan and rotate” inside the buffer. This type of rotation should be fairly low latency. (Read the vector of the head, locate the relevant buffer sector, interpret the buffer with rotate+move, allow the new buffer to composite + fade in – if ready.)

        • MAbrash says:

          Certainly there is potential in late warping and panning, but it is tricky and complicated. You have thought this through farther than most people who suggest it, which is cool. As you say, there would need to be some way to deal with translation, if only because our heads are not on sticks and translate quite a bit as they turn. However, doing that is not trivial, and requires significant computing. But that only works for a static scene, which is of limited interest. So then, as you also note, you would have to separate out objects by motion. That would require significantly different rendering. It also brings in a host of new problems, because as objects move relative to the eye in non-linear ways warping is required so they look right, and there are limits on how far that warping can take you without rerendering. Consider that at a leisurely 120 degrees/second and the 15 Hz you mentioned, the whole scene moves 8 degrees between frames – that’s awfully far to handle with warping. Talisman tried to use warping to reduce rerendering 15 years ago, and it proved to be highly complex and not as cost-effective as just rendering faster. That could well happen here; if you want to put those kinds of smarts next to the display, that will be expensive (and hot), especially if you want to have enough bandwidth to even scan out at 1000 Hz, let alone do the warping work. I’m not saying it couldn’t work, just that there are all kinds of tricky aspects and cost considerations. I could easily see people working on this fixing one case after another, but every fix causing a new problem elsewhere, for years. Also, rendering would have to change to provide the necessary layers and motion information, and that’s a hard thing to get developers to buy into. Plus obviously it would require extra rendering to allow panning and layering.

          Nonetheless, an intriguing idea!

          I’m not sure why you say “Just make sure to render motion blur”?

          –Michael

          • Gavin says:

            Hi again,
            Thanks for such an informative answer. The motion blur statement wasn’t really well explained; I just meant rendering objects with motion blur, in order to ease the transition from frame to frame.
            (perhaps avoiding the stop-motion effect a little)

            I agree about the tricky bits; it certainly isn’t much of a solution to try and put another full rendering pipeline that close to the visor (although apart from power/heat/latency, maybe it’s not a bad idea…)
            There’s bound to be a good solution though… I’ll keep thinking, hopefully I’ll figure out something good to share!

            Thank you for your time,
            -PuFFiN

  10. Paul says:

    This is the same concept behind the Nvidia 3D Vision “lightboost” LCD displays. Blur Busters and the companion test site focus on practical motion blur elimination information, and have more info about “lightboost” capable displays.

    Similar to “240hz” 3D tv technology and the suggestions of Magnus and Michael above, you could use device-side logic to deconstruct / interpolate more data, and further, presumably encode more information into the signal to the device. With variations on the strength of the computing hardware in the device, you could have “dumb” approaches (encode many low-rez extra frames accounting for possible minor head movement before the next frame) or “smart” ones (à la bump maps, encode velocity vector maps and some form of simple hidden-surface rendering hints) along the data bandwidth / code complexity tradeoff curve. With HDMI for instance, dirty tricks could probably be played with unused audio channels or the ethernet channel. Hints might come from existing object sorting done for alpha blending, for instance.

    Also, there’s a Hacker News thread about this post with some good comments.

    Thank you for responding to my mail last year, that made my week.

    • MAbrash says:

      Certainly there are interesting possibilities with encoding and device-side processing. However, they all run into the same stumbling block, which is that the displays themselves are designed to accept frames at 60 Hz. Once again, there are possible technical solutions here that aren’t implementable until the display manufacturers decide it’s worth spending resources on solving VR problems.

      One other consideration is that the more computing power you put in an HMD, the heavier and bulkier it gets – and the more power it dissipates right next to the head.

      –Michael

  11. Oren Tirosh says:

    How about a variable-persistence display? If you can adjust both the current and the pulse width of an LED backlight (or of the display itself), it should be possible to modulate them in an inverse relationship so the overall brightness stays the same and the result is imperceptible on a static image.

    The persistence may be modulated by a heuristic signal that combines motion tracking, image motion information and a relatively crude eye tracker. My guess is that it need not be perfect to effectively reduce both judder and strobing to a level that would otherwise require a frame rate an order of magnitude higher.

    • MAbrash says:

      If I’m interpreting your proposal correctly, that would help when nothing was moving relative to the eye, but that’s not the problem case. To avoid judder, persistence would have to drop with increasing relative eye motion, and then you’d have all the problems of low persistence. Unless I’m missing something?

      –Michael

      • Oren Tirosh says:

        If the eyes make a rapid motion unrelated to tracking any moving objects in the scene the persistence should be lowered to let the image smear on the retina and suppress strobing during the saccade. If tracking, persistence should remain low or perhaps be adjusted to some compromise value to reduce strobing of the background. The decision may also be adjusted based on the high frequency content of the object being tracked and its immediate background (e.g. tracking an object moving against the sky will not cause visible strobing of its background so it’s ok to maintain low persistence).

        One of the things that makes me think this idea is likely to work well is that modulating persistence is imperceptible on a static image. A solution based, for example, on optimizing motion blur dynamically by eye tracking will have really ugly artifacts if the eye tracker occasionally produces bad samples. This approach is likely to be much more forgiving of low quality tracking.

        Higher frame rate is, of course, the best solution but if some relatively cheap trick can be used to get ahead of the tradeoff curve you could use same processing you would otherwise spend on a higher frame rate on other methods to improve the experience.

        • MAbrash says:

          I see, and I like the idea. I don’t know if it would help – the only way to know for sure would be to try it. However, determining whether the eye is tracking a moving object isn’t easy, and would require app changes to provide information about object motion, not to mention eyetracking, so it’s probably a ways off.

          One potential problem is that it would take several frames to decide whether the eye is tracking anything, and that could be enough to produce artifacts. But again, the only way to find out is to try it.

          –Michael

          • Oren Tirosh says:

            Well, one way to tell if the eye is tracking something, and whether strobing is likely to be noticeable, is to simulate the image on the retina using the image data and the tracking signal (add a Kalman filter to get the best registration). If the image in the simulated fovea is not stable enough, start increasing persistence.

            This will automatically take into account things like high frequency content of the background, etc. The processing power requirements can be modest enough to do this in the HMD itself at the lowest possible latency.

          • Aaron Nicholls says:

            You’re correct to identify that with perfect eye tracking, we might be able to solve the problem, and I discussed some of the details of this in my reply to user Dan Ferguson. There are two key challenges:
            1) You’d likely need incredibly fast and accurate eye tracking, since mislocalization of a few degrees can occur within 10 milliseconds of the beginning of a saccade.
            2) You need the ability to magically distinguish saccades from other eye movements the moment the eye begins moving, since you want high persistence during saccades (for suppression) and low persistence during other eye movements (to minimize smear). Errors in categorizing the eye movement will contribute to smearing in one case and mislocalization in the other, and inconsistency could be very distracting.

            It’s possible that a partial solution is sufficient – For instance, if mislocalization at the end of the saccade turns out to be more important to eradicate than that at the beginning of the saccade, the second challenge would be less of an issue. More work needs to be done to know if this is the case, and we don’t know what else we’ll discover when we turn over that next stone.

  12. BondvsBatman says:

    Could it be possible to trick the eye into a higher hz-rate by using low persistence and time-shifting the lighting of the red, blue and green subpixels? I don’t even know if that is possible from a technical standpoint, but if you could light up the subpixels separately, in my idea it would make sense to let the eye see only one red, one blue and one green image alone and let it combine the colours together itself, resulting in a tricked framerate which is 3 times as fast. Or is that a dumb idea?

    • MAbrash says:

      Not a dumb idea at all, but unless the scene was completely static and your eyes didn’t translate at all, you’d get color fringing and other color effects because of the temporal separation between the frames. See the first post in this series of three; unlike that post, I think you’re talking about rerendering at 3X the frame rate, but while that would fix some problems, the new problem is that the different frames would show slightly different scenes, leading to imperfect color registration between the subframes. There’s just no way to get the subframes to overlay exactly right if there’s eye translation or moving objects in the scene. Also, your rods would pick up all three frames, because they don’t care about color, which would create some odd effects due to rod-perceived variation between subframes.

      –Michael

      • M. Cantan says:

        Just a question: wouldn’t it be possible to compensate for color fringing by letting the display driver compensate for the rotation (and possibly the translation) using the input from the sensors?

        Here is a modified version of your diagram, compared with the original (the diagram images from this comment are not reproduced here).

        Obviously, this will only work with a fixed scene. For a moving object we could add some vector information, of course, but we could also try to race “three beams” (if the driver doesn’t pick RGB values from the buffer, but instead picks the R, then the G, and finally the B value, this wouldn’t cause any bandwidth increase, but it would obviously affect the way the image is rendered and the way the display loads the buffer).

        While this isn’t a perfect solution, the first part would be relatively easy to implement (it would just require the driver to translate the scene by a given number of pixels), even if there are some black borders at the edges of the screen, which, I guess, won’t even be noticed.

        I am open to any comment.

        – Mayeul

        • MAbrash says:

          Yes, that would be possible – given extremely accurate, low-latency eye tracking. But bear in mind that even a few arc minutes of error would result in fringing, so tolerances are tight.

          –Michael

  13. Dan Ferguson says:

    I’m still trying to wrap my brain around saccadic phenomena and visual instability. As a thought experiment, how would the brain perceive the corner test if you had pixel-level control of persistence combined with eye tracking? Assume a high-speed camera with no added latency for eye tracking, frame rates like the ones you’ve been testing and a display that can alter the persistence so it increases as you move away from where the eye is looking. Are these problems driven more by persistence or by motion of the pixels (latency)?

    • Aaron Nicholls says:

      That’s a great question, and one we’ve spent some time thinking about. To clarify – Are you suggesting increasing persistence per-pixel proportional to the distance from where the eye is looking, or increasing global persistence during a saccade? I’m assuming the latter, since the former wouldn’t change persistence in the center of the field of view.

      As a thought experiment, let’s assume we have a perfect eye-tracking system (high framerate, 100% accuracy/precision, zero latency) with the ability to instantly detect the beginning of a saccade and distinguish it from other eye movements. Let’s also assume we have complete and instant control over display persistence, so we can react immediately and also keep brightness consistent as persistence changes. In that case, we could either blank or go to full persistence to suppress the perception of the mid-saccadic frames. In theory, you’d think that would solve our problem.

      Unfortunately, research into perisaccadic mislocalization (PSML) shows that it affects even objects or scenes flashed up to 150ms before the beginning of the saccade, so you’d think we’d need a time machine as well. However, these studies were performed with isolated flashes – Fortunately, the small body of research into continuous flicker suggests that if you stop flashing at saccadic onset and resume flashing at the end of the saccade, visual stability is largely preserved, at least perceptually. The paper Perisaccadic perception of continuous flickers is one of the few specifically on this subject. There has been very little research into this particular intersection of saccades and stroboscopic displays with the human visual system, so more work is needed here to be confident.

      Implementing this reliably in real-world hardware could be very difficult. What research has been done suggests a few degrees of mislocalization within the first ten milliseconds after a saccade begins, so you’d need incredibly high speed, low latency, and high accuracy in saccade detection to completely eliminate PSML, in addition to very low display latency. As a result, this approach may not be feasible in consumer hardware in the near future. It’s possible that with imperfect tracking or latency you could reduce instability enough to make it worthwhile without introducing noticeable artifacts, but we simply don’t know yet.
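
      As a toy illustration of the policy in that thought experiment (thresholds purely illustrative; real saccade classification is far messier than a velocity threshold):

      #include <cstdio>
      #include <initializer_list>

      enum class DisplayMode { LowPersistence, BlankOrFullPersistence };

      // Idealized policy from the thought experiment: suppress mid-saccadic frames
      // (blank or full persistence) while the eye moves at saccadic speeds, and run
      // low persistence otherwise to avoid smear.
      class SaccadePersistencePolicy {
      public:
          DisplayMode Update(float eyeSpeedDegPerSec)
          {
              const float kEnterSaccadeDegPerSec = 150.0f; // well above smooth pursuit
              const float kExitSaccadeDegPerSec  = 50.0f;  // hysteresis on the way out

              if (!inSaccade_ && eyeSpeedDegPerSec > kEnterSaccadeDegPerSec)
                  inSaccade_ = true;
              else if (inSaccade_ && eyeSpeedDegPerSec < kExitSaccadeDegPerSec)
                  inSaccade_ = false;

              return inSaccade_ ? DisplayMode::BlankOrFullPersistence
                                : DisplayMode::LowPersistence;
          }
      private:
          bool inSaccade_ = false;
      };

      int main()
      {
          SaccadePersistencePolicy policy;
          // Fixation, then a ~600 deg/s saccade, then fixation again.
          for (float speed : { 5.0f, 5.0f, 600.0f, 650.0f, 400.0f, 60.0f, 20.0f, 5.0f })
              std::printf("%4.0f deg/s -> %s\n", speed,
                          policy.Update(speed) == DisplayMode::LowPersistence
                              ? "low persistence" : "blank / full persistence");
          return 0;
      }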

      • Dan Ferguson says:

        When I posed the question (somewhat prematurely) I was actually thinking of the former method. After I read the links and related papers I understand much better why that would probably not work. During the saccade much of the display, particularly where the eye started, would have low persistence and could still potentially defeat the masking mechanism, leading to the aforementioned motion misperception.

        As your latter proposal shows, you’d have to detect the saccades and instantly adjust persistence based on eye movement. But I still wonder, where are the edges of these mechanisms? How much persistence is required to permit masking; e.g. what is the masking response curve for duty cycle, frequency and display percentage? I’m just pondering, I know there aren’t answers yet.

        The paper you referenced brings up all kinds of additional questions on how difficult it will be to actually trick the visual localization systems. The design box seems far tighter than what has been discussed here so far. I gather that your team is further down the research path than what actually gets posted to the blog. Where are you actually concentrating your efforts? Do you have a loose roadmap based on your discoveries and reading?

      • Tom says:

        With regard to saccade detection, it seems that there are reliable EEG waveforms that occur ~15-30ms prior to voluntary saccades https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3625857/#__sec10title . In theory it might be possible to detect these with contact electrodes on an HMD. That would give a large window for any processing needed to prepare for the upcoming saccade. It wouldn’t give an exact velocity profile, but it would certainly give you a ballpark, and maybe it could even be calibrated to the specific user’s saccade speed. That would leave eye tracking only needing to determine direction and stopping, which might not require those fancy 1000Hz eyetracking systems.

        From the blog post and the comments I’m not sure whether you guys are also concerned about microsaccades for visual instability, but I suspect that they do not result in strobing. I’m also not entirely sure whether the signals mentioned in the article that predict a voluntary saccade are also present for other saccade types that have similar angular velocity and distance profiles.

        • Aaron Nicholls says:

          According to the literature, microsaccades can result in strobing and/or visual instability, but the threshold is much higher due to the smaller saccade magnitude and duration. Using EEGs/EOGs to detect upcoming saccades might be possible, but adds its own complexity and would require a very high level of fidelity to prevent visible artifacts due to false positives. As you mention, this may also handle voluntary but not involuntary saccades, solving the problem only partially.

        • Bernardo says:

          I work in 3D CGI, and rendering double the frames means the costs for super servers to crank this imagery out will double. The budget for the two Hobbit films is 300 million; I would say this will easily climb to 400 million once the new frame rate is factored in. All of that so we can see this moving at double the frame rate, yikes.

  14. Mark Rejhon says:

    I am the owner of Blur Busters, and the author of the Blur Busters Motion Tests.
    I want to compliment you on your excellent write-up! I created some HTML5 animations that sync to VSYNC in some browsers, view these below links in Google Chrome:

    An excellent animation of eye-tracking-based motion blur:
    http://www.testufo.com/#test=eyetracking

    An excellent animation of how strobing reduces motion blur:
    http://www.testufo.com/#test=blackframes

    Strobing is a great solution for eliminating motion blur. An example is viewing http://www.testufo.com/#test=photo when LightBoost is enabled. (ToastyX Strobelight now makes it easy to turn LightBoost ON/OFF via a hotkey.) One big problem of strobing is that you have to refresh the LCD in the dark, before strobing the backlight on clearly refreshed frames. This adds an average of half a frame of input lag, which can hurt VR. However, one could also use a high refresh rate (e.g. 120Hz or 240Hz) to fix this type of lag. 120Hz strobing is far more comfortable on the eyes than 60Hz strobing. Shorter strobe flashes reduce motion blur, so a 90%:10% dark:bright duty cycle reduces motion blur by 90%, but can also darken the image by 90%.
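
    To put rough numbers on that tradeoff (the eye speed and pixel density here are just illustrative assumptions):

    // For a fixed peak backlight level, both perceived motion blur during eye
    // tracking and average brightness scale with the lit fraction of each refresh.
    #include <cstdio>

    int main()
    {
        const double refreshHz         = 120.0;
        const double litFraction       = 0.10;  // 90%:10% dark:bright duty cycle
        const double eyeSpeedDegPerSec = 60.0;  // moderate tracking speed (assumed)
        const double pixelsPerDegree   = 15.0;  // assumed panel density

        double litTimeSec = litFraction / refreshHz;         // ~0.83 ms lit per refresh
        double blurDeg    = eyeSpeedDegPerSec * litTimeSec;  // retinal smear while tracking
        double blurPixels = blurDeg * pixelsPerDegree;

        std::printf("lit time per refresh: %.2f ms\n", litTimeSec * 1000.0);
        std::printf("smear while tracking: %.3f deg (%.2f px)\n", blurDeg, blurPixels);
        std::printf("brightness vs. always-on backlight: %.0f%%\n", litFraction * 100.0);
        return 0;
    }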

    • MAbrash says:

      Hi Mark – great to hear from you! We really appreciate your work, and in fact Aaron already forwarded around your animations.

      We have done black-frame insertion at 120Hz, and it works pretty well, but it’s still hard to get the LCD to settle quickly enough.

      –Michael

      • Mark Rejhon says:

        Thanks for the compliment. This probably means you’re familiar with my LightBoost research so far. Have you tried ToastyX’s new Strobelight Utility, which programs a LightBoost 2D motion blur eliminating strobe backlight without needing 3D Vision drivers, and without registry hacks? It also can reprogram the strobe length, via keyboard shortcuts as well:

        Control+Alt+Plus = turn on LightBoost strobe flashing
        Control+Alt+Minus = turn off LightBoost strobe flashing
        Control+Alt+0 = Lightboost 100%, using 2.4ms flash
        Control+Alt+5 = Lightboost 50%, using 1.9ms flash
        Control+Alt+1 = Lightboost 10%, using 1.4ms flash

        It creates a sort of equivalent of a programmable-persistence display (to a certain extent). Both Adam Simmons (of TFTCentral.co.uk) and I (of Blur Busters) have used an oscilloscope and found that LightBoost strobe lengths are programmable in approximately 0.1ms increments. Our measurements vary a bit, but both of our oscilloscope measurements match within +/-0.1ms, and we’ve established that the motion blur of a LightBoost strobe backlight is linearly proportional to its strobe length. It does not eliminate stroboscopic/wagon-wheel effects, so 1000fps@1000Hz is still a useful Holy Grail.

        The strobe length differences actually show up visually in motion tests too. We were impressed that we could tell the 1 millisecond difference, in a fast-moving motion test:
        http://www.testufo.com/#test=photo&pps=1440
        (while running ToastyX Strobelight with LightBoost enabled)
        Run this animation while hitting Control+Alt+0 versus Control+Alt+1: the windows in the castle at the top obviously become clearer with 1.4ms strobes than with 2.4ms strobes.

        So this is further proof you are correct that we need a 1000fps@1000Hz display someday this century (one can hope!), because 1.4ms is equivalent to 1/700sec, and having 1/1000sec display samples is quite close; at 10%, LightBoost is doing roughly the equivalent of a 5:1 black frame insertion via backlight means (1.4ms:6.9ms bright:dark during an 8.3ms refresh cycle).

        LightBoost displays actually partially framebuffer the refresh and do an accelerated scanout (~1/200sec or ~1/240sec) to artificially create a longer vertical blanking interval, allowing more pixel-settling time before strobing the backlight. They also use a Y-axis-compensated response-time-acceleration algorithm to compensate for the difference in the freshness of the pixels at the top edge versus the bottom edge (since the LCD scans out in total darkness). Marc Repnow (of display-corner.epfl.ch) did some reverse engineering of the LightBoost behavior, and some TestUFO patterns (e.g. a blur trail using blue/yellow) revealed evidence of “RTC zones” along the vertical axis. We even discovered that the LightBoost backlight is strobed very late into the VSYNC and is still ON when the next refresh begins (turning off about 0.5ms into the next refresh); this works because the LCD pixels of the next refresh haven’t yet noticeably begun transitioning. Strobe backlight experiments need a phasing control (to control the timing of strobes relative to VSYNC), as it apparently is beneficial to make the flash as late as possible and let it last slightly into the next refresh while transitions are still invisible (this creates more time for the previous refresh’s pixels to settle).

        One needs a fast scanout, followed by a very long vertical blanking interval, for a strobe backlight. Designing a Y-axis variable into the RTC math also helps reduce vertical-axis-asymmetric ghosting artifacts (e.g. increased ghosting along the top or bottom edge).
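
        A rough timing sketch of that phasing (all numbers illustrative, not measurements):

        // Accelerated scanout, a settling window, then a strobe that may run
        // slightly past the start of the next refresh.
        #include <cstdio>

        int main()
        {
            const double refreshMs = 1000.0 / 120.0;  // 8.33 ms per refresh
            const double scanoutMs = 1000.0 / 240.0;  // accelerated ~1/240sec scanout
            const double strobeMs  = 1.4;             // e.g. the shortest setting above
            const double overrunMs = 0.5;             // allowed spill into next refresh

            // Fire the strobe as late as possible: end it 'overrunMs' into the next
            // refresh, which maximizes settling time after the scanout completes.
            double strobeEndMs   = refreshMs + overrunMs;
            double strobeStartMs = strobeEndMs - strobeMs;
            double settleMs      = strobeStartMs - scanoutMs;

            std::printf("scanout 0 .. %.2f ms, settle %.2f ms, strobe %.2f .. %.2f ms\n",
                        scanoutMs, settleMs, strobeStartMs, strobeEndMs);
            return 0;
        }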

        • MAbrash says:

          Interesting stuff! One thought: You can strobe as fast as you want, but current LCDs can’t transition at anywhere near 1000 Hz, and that’s going to be a limiting factor with LCDs for a long time, even if a video channel capable of anywhere near that bandwidth becomes available.

          –Michael

          • Mark Rejhon says:

            For current LCDs, yes, this will limit your strobe rate: the strobe period has to be longer than the time the pixels take to transition.

            Currently, reducing the contrast of the LCD (e.g. reducing the VG278H to about 50%-60%) also increases overdrive headroom, so that ghosting and 3D crosstalk fall completely below human-perceptible levels (even along bright vertical edges). You can strobe quicker, but you start to get more and more inter-refresh crosstalk the more you “eat” into the next LCD refresh.

            Active-matrix OLEDs won’t be fast enough for 1000fps@1000Hz, but passive-matrix OLEDs are definitely fast enough. The problem is that passive-matrix isn’t practical for good brightness on a large-size, high-def OLED. This may change (e.g. science-lab stuff, such as parallel multiscanning of an AMOLED to provide the bandwidth needed for 1000Hz).

            We can all hope for Plan B, blue-phase LCDs: 10 to 100 microsecond transitions! Enough for 1000fps@1000Hz eventually. This was published in the Journal of the Society for Information Display (google “blue-phase LCD”). Those can transition in microseconds, and color-sequential LCDs using this tech have already been demonstrated. Color sequential; imagine that, just like DLP.

      • Mark Rejhon says:

        Extra link: The Marc Repnow reverse engineering is at:
        http://display-corner.epfl.ch/index.php/LightBoost

  15. Fred says:

    Off-topic (and probably raised previously), but I’ve been wondering about the viability of putting 2 outward-facing cameras on a VR headset and re-injecting the signal into the headset to be able to do some sort of AR (some recent leaked renders of the Oculus Rift commercial prototype had such cameras, maybe just to make it easy to deactivate the glasses without taking them off).
    That seems to open a whole new can of issues.
    Besides a huge lag (especially if doing real-time processing on top of simple pass-through), I noticed that when I record video on my cell phone and the subject is moving fast, there’s a lot of distortion where verticals appear slanted at an angle (it’s very obvious when I ride the train and point at another train passing in the opposite direction). I’m not sure whether that’s caused more by the video capture process (low-quality CCD) or by the display as well.

    • MAbrash says:

      It’s hard not to think about that possibility, and I’m sure we’ll see some Rift mods along those lines. However, keep in mind that the cameras would not be where the eyes are, and short of massive warping to correct that (which would add lag), I’m pretty sure that would induce motion sickness in a big way. There’s also lag, as you say, and rolling shutter in either or both of the camera and display will produce artifacts. Then there’s the fact that you’re always focused on the display, not on the real-world objects, and the reduced FOV (don’t try crossing a street!), and the reduced dynamic range, and of course the low pixel density. So it’d be worth trying, but I see a lot of potential problems.

      –Michael

    • Wai Ho says:

      Pretty sure it’s caused by the rolling shutter in the capture camera. Global shutter is much better for fast camera or object motion. Otherwise you would have to treat each camera pixel row as a mini independent camera when you do the AR.

  16. Fred says:

    I haven’t seen much discussion about this aspect (but maybe this is pretty low on the list of things that break the illusion).
    With stereoscopic 3D on TVs (or the 3DS), one big issue is the mismatch between eye convergence and focal distance (the focal plane is always on the screen), which seems to induce quite a bit of eye strain.
    One of the advertised positive aspects of 3D on the Rift is that the optics create a focal distance at infinity for all objects, which makes for a much more comfortable viewing experience. But it doesn’t really solve the eye convergence/focus issue (though maybe that’s how far-sighted people perceive reality with corrective glasses?).
    I can’t think of a way to solve this… maybe with some sort of complex eye tracking and adaptive optics solution?

    • MAbrash says:

      It can’t be that bad a problem – most of the population over 50 can only focus at infinity, and they seem to function fine with convergence/focus mismatch. On the other hand, they had years to get used to it, and don’t have to switch back and forth, so it’s not really that comparable. Still, I don’t know yet how significant this mismatch is for VR.

      Solving this is certainly not trivial. Light fields can do it, but you’d need a wearable light field, and even then there would be significant performance costs for light field rendering.

      –Michael

  17. grant says:

    Most of what I know about vision I’ve learned from your blogs, but I still have an idea for an experiment that I don’t think I’ve seen any research on.

    You’ve talked about frame rates and what’s needed to infer motion, what the eye can see and what the brain can interpret, but I’d be interested in looking at lower-level phenomena that occur at much higher “frequencies”, that may be adding to the visual instability you described. Basically, I’m hypothesising that the vision system can process motion at a much higher temporal resolution than the conscious brain can normally interpret, and dropping the persistence of the display triggers this feedback, where the lower-level vision system is seeing still images (helped by the brightness of the flashes breaking the saccadic masking) but the higher parts are seeing motion, and so the final resolved motion perception is off. Increasing the temporal resolution of the whole display system to ~1000Hz would probably brute-force past this as you say, but we’re assuming that’s impossible for now.

    To learn more about this, what I’d wish for is an HMD with “enough” resolution for a scene, then increase the physical resolution by an order of magnitude or so while leaving the logical resolution the same. Assuming the pixels can be switched very fast but the GPU-facing controller is limited to 120Hz, the plan is to flash 3 zero-persistence subframes, while lighting up different full-colour subpixels to correlate with motion. Perhaps you can leave the other subpixels on at a lower brightness. My hope is that you’d get 120Hz of real scene rendering, but three times that for visual stability purposes.

    This would be different to the ideas of hardware-based scene warping, because you’re not actually changing any pixels, just trying to add a “motion element” to each pixel. You’d ideally have the tracking hardware feed into the display to account for head translation and rotation, and you’d have the 3D+physics engine maintain motion information for each pixel, and feed that through into the frame buffer. You’d also need to feed in Z-buffer information for this high-speed controller to calculate the motion vector during head rotation. The end result under magnification would be all these subpixels swarming around within each frame, but hopefully it looks roughly normal to the conscious brain.

    It would be very hard to do, and it may not be possible for a few years, but it’s the easiest experiment I can think of :)

    • MAbrash says:

      That’s an interesting idea; I’ll ask Aaron to respond, since he’s done a lot of research into the relevant areas.

      –Michael

    • Aaron Nicholls says:

      Thanks for the insightful question; I’ve got good news and bad news. The good news is that your hypothesis is correct – While 60-90Hz may be enough to hide flicker, the human visual system is sensitive to motion cues over a shorter window. The bad news is that the window is very short. For example, Happ and Pantle showed that test subjects could identify motion in a two-frame sequence with only 1-2ms between frames. Even if we could run at 1000Hz, when the eye is moving quickly (500-800 deg/sec during a saccade), scenes flashed 1ms apart still land half a degree or more apart on the retina, and you could actually cause backwards motion cues if frequencies aliased badly enough as seen in the wagon wheel effect.
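
      To put numbers on that (a quick sketch using only the saccade speeds above):

      // During a fast saccade, even 1000Hz flashes land far apart on the retina.
      #include <cstdio>

      int main()
      {
          const double frameRatesHz[]        = { 120.0, 240.0, 1000.0 };
          const double saccadeSpeedsDegSec[] = { 500.0, 800.0 };

          for (double hz : frameRatesHz)
              for (double speed : saccadeSpeedsDegSec)
                  std::printf("%6.0f Hz, %4.0f deg/s saccade: successive flashes land %.2f deg apart\n",
                              hz, speed, speed / hz);
          return 0;
      }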

      Your hypothetical display is one way to look into it; If I understand correctly, you’re using the higher physical than logical resolution to allow individual pixels to move within a block of subpixels over the course of a 120Hz frame (during multiple sub-frames); is that correct? This approach has the potential to provide some low-level motion cues which would otherwise be lost, although it presents a couple of interesting challenges:
      1) Unless the sub-pixels were extremely bright, it would be a very dim display (since brightness doesn’t scale with pixel density), and leaving on the other subpixels would trade contrast for brightness. It might still be bright enough for testing, although changes in brightness impact visual integration timeframes, so you’d need to normalize for that.
      2) You’d need to either restrict a pixel to its sub-pixel grid (significantly hampering your ability to present anything other than very low-speed motion cues) or have large variations in brightness, particularly on object boundaries, as adjacent pixels flowed together or apart across the screen during the temporal sub-frames. It’s possible that more sophisticated techniques of sub-pixel placement/motion would preserve the most important motion cues (possibly trading off high vs. low-spatial frequency cues) without introducing significant artifacts.

      In that vein, we do have a useful piece of hardware that we’ve used for experiments like this – a DLP projector that can run at up to 1,440 fps at one bit per pixel over HDMI. To investigate the issue we saw with saccades, I projected the holodeck-esque pattern on a wall in a dark room, got my face nice and close for maximum FOV, and increased the framerate until the visual instability disappeared. In that particular case, it disappeared at around 400fps, although that number should vary with brightness, contrast, content, and attention, among other variables. We’ve used the projector to investigate a number of theories about visual motion integration (such as inserting temporal sub-frames as you’ve suggested; the jury’s still out on that one). It’s no HMD, but it does allow us to slice off particular aspects and investigate them in isolation.

      Your idea is a novel approach. If you’ve got other ideas for how we can better dig deeper into these issues, please keep them coming!

      • grant says:

        Yeah, I thought of a whole bunch of issues with my experiment since then, mainly that if you’re trying to combine the low persistence effect to reduce judder with sub-frame interpolation of any sort (without scene warping), you practically have a very narrow window to flash the subframes in the middle of a display cycle, otherwise it turns back into a high-persistence situation.

        The more I think about the different conflicting goals, the more I think you’re stuck with needing accurate eye tracking to cater for different movement situations. If you didn’t need to cater for the fixated/counter-rotating eyes with a rotating head you could just add in some motion blur to the whole frame to simulate high persistence in a low-persistence display (or use variable persistence as mentioned further up). With good enough eye tracking you could perhaps variably blur the scene where there’s movement relative to the eye. I guess you could hard-code that into a choreographed experiment to test if variable blurring helps the visual instability at normal frame rates on a low-persistence display.

        • Aaron Nicholls says:

          You’re right – It is pretty complicated no matter how you slice it. A few months ago, we tried another experiment similar to what you suggested. Knowing that motion integration can occur on such short timeframes, we rendered a burst of two or more subframes in rapid succession for each frame (for instance two frames 1ms apart followed by 15.67ms before the next pair of frames for a total framerate of two subframes per 60Hz “frame”). We were hoping that by activating motion detectors that wouldn’t otherwise be stimulated, it might help reduce the visibility of motion strobing. Unfortunately, in our experience motion looked better with evenly spaced frames (e.g. 120Hz looks better than back-to-back pairs of subframes at 60Hz). Admittedly, it was a hail-Mary pass, but it was worth trying.

          To your earlier point about what the eye can see and the brain can interpret, it’s an important distinction. For example, if you tune a CRT to the lowest framerate for which a group of users can’t see flicker, they can still experience eyestrain and headaches until you raise the refresh rate to a higher, less easily measured level. This cuts the other way around as well – There are a number of defects in modern displays that fall well within human perceptual thresholds and are technically visible if you are looking for them (sample-and-hold artifacts in LCDs, for instance), but many people simply don’t notice without an A-B comparison. That’s not to say these aren’t important or even a potential differentiating factor, but there’s a gap between the requirements for a great experience and those for a theoretically perfect stimulus, and it’s non-trivial to judge how “perfect” things need to be.

          If anyone is interested in the high-speed projector mentioned earlier, it’s a DLP LightCrafter from TI. It costs a few hundred dollars, and in addition to 1440Hz (one bit per pixel) over HDMI, it can also do bursts of stored image sequences at 4000Hz. Framerates that high allow you to simulate a range of display properties (persistence, framerate, etc.) and eye movements (e.g. “what does the foreground/background look like during smooth pursuit on a hypothetical display with these properties?”) without needing to build the display or perfectly replicate the eye motions you want to simulate. They have a much faster version as well, but the Lightcrafter has been suitable for our needs so far.

        • grant says:

          By the way, I was thinking of how hard eye tracking and all the required feedback is on a consumer-grade HMD, and thought it might be possible to do it in reverse, as in “passive eye tracking”.

          This setup is probably completely impractical, and of questionable usefulness, but I thought I’d share it anyway:

          - I’ve never used an Oculus before, but I imagine it’s typically dim enough that the user’s pupils aren’t pinholes – too much brightness would probably just be fatiguing. You then have a reasonably large surface over the cornea to “map” to areas of the retina

          - So you get the user to wear a set of contact lenses (I said it was impractical) that have a dot of polarising filter in the middle of the lens (over the fovea), and a non-polarised but darkened area surrounding it.

          - In the display, you have 2 translucent OLED screens sandwiching a polarising filter

          - The front screen (low persistence) has pixels fire as soon as they’re ready, then switch off. The rear screen switches on the corresponding pixel as soon as the front one goes out and leaves it persistent until the next frame is ready.

          So the fovea of the retina gets a bright low-persistence view of wherever it’s aiming at on the display, while the periphery (responsible for high-speed motion detection) gets a smeared high-persistence view, darkened to compensate for the brightness difference.

          Once people are used to wearing contact lenses, you can move all the hardware from the HMD into an “EMD”, layering a display over the cornea ;)

          • MAbrash says:

            Clever, but, as you say, not entirely practical :)

            Once you have a contact in the eye, you can just put an induction coil in it and track the eyes that way. Researchers use that approach. You can also use the Innovega technology and do AR rather than VR, and get very wide FOVs. But of course there’s that pesky contact business.

            You couldn’t do an EMD that way, at least with current technology, because there’s no way to focus the light from something as close as the cornea. Even microlenses wouldn’t work for anything but really low resolutions due to diffraction limits.

            –Michael

  18. Johan says:

    First of all, thank you for your contributions to PC gaming (a considerable part of my life) and for sharing your professional as well as personal experiences on this blog! :)

    I know nothing about displays other than what I’ve read in your posts (and what I’ve googled to better understand them :P), so I hope I’m not wasting too much of your time!

    Is differential update on a global display possible?
    Although I suppose that might be too much of a hardware change to fit the scope of the discussion… :S

    Thanks for making cyberpunk reality! \o/

    • MAbrash says:

      Thanks for the kind words – you’re very welcome!

      Differential update is pretty far from existing hardware, but not impossible. What do you have in mind to do with it?

      –Michael

      • Johan says:

        I was wondering under what circumstances it’d be possible to trade lower pixel throughput for higher pixel depth (24b for colour, 20b for address), omitting pixels that don’t need new values. That would allow faster cycles between “firings” of the (entire) display whenever the difference between two rendered frames is less than about 50% of the pixels (since each differential pixel takes nearly twice the bandwidth), which could be aided by lossy image compression. Sort of a dynamic increase in frame rate when the content allows it.

        Of course, the rendered fps would have to be several times the display’s full-screen redraw fps, in order to continuously determine whether or not a differential update would come in cheaper than a full update.
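
        With just those bit counts (ignoring any packet framing overhead), the break-even point works out like this:

        // Break-even arithmetic: a changed pixel costs color + address bits,
        // while a full redraw costs color bits for every pixel.
        #include <cstdio>

        int main()
        {
            const double colorBits   = 24.0;
            const double addressBits = 20.0;   // enough to address ~1M pixels
            const int    width = 1920, height = 1080;

            double fullFrameBits     = colorBits * width * height;
            double bitsPerDiffPixel  = colorBits + addressBits;
            double breakEvenFraction = colorBits / bitsPerDiffPixel;   // ~0.545

            std::printf("full 1080p frame: %.1f Mbit\n", fullFrameBits / 1e6);
            std::printf("differential update wins when under %.1f%% of pixels change\n",
                        breakEvenFraction * 100.0);
            return 0;
        }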

        But your comment leads me to believe that conserving bandwidth isn’t the issue. I am guessing it’s something along the lines of the way the display driver circuit conveys the data to the end of the chain (of course I googled it, but… entropy), or its likely inability to process differential updates made to the previous frame.

        And I suppose that if differential updating was a viable option someone like you would already have thought of it! (Which you obviously have and discarded it, most certainly for good reasons.)

        Even if I’m completely wrong, I’m happy with your blog making me think of stuff I usually don’t! :)

        • MAbrash says:

          Unfortunately, that wouldn’t help the worst case, just the average case. And it wouldn’t really help the average case either, since most of the time the head is moving to at least some extent and all the pixels change. Anyway, bandwidth can matter (for tether-free operation, for example), but for that foveated rendering seems like a simpler, more reliable approach.

          –Michael

  19. Burninate says:

    A note about GPU hardware – it is the result of decades of iterative improvement, optimizing for a specialized task involving rendering frames reliably at ~60-75hz and no more, at increasingly high resolutions, poly counts, and lighting conditions. The code and pipelines that work at this level are influenced by a variety of bottlenecks other than simple ‘polygons*frames per second’, and rarely reach the 1000fps zone except in extremely old code – this is the result of things like CPU bottlenecking, render units, etc becoming an issue at those framerates for everything but Quake 1 – things whose scaling properties current software & hardware don’t target because nobody’s got a market niche for 1000fps. If I were you, I wouldn’t downplay the possibility of an extremely good-looking product that works at 1000fps, after the hardware and rendering pipelines have been optimized for those framerates.

    Getting a solid interim product, involving low-latency 1080p240x2, would seem to be a rather high-priority thing to get working *just to get the game & graphics designers to a place where high-framerate play can be contemplated as a realistic business goal*. OLED panels can do this kind of framerate. HDMI 2.0 and DisplayPort have just arrived at a place where they can do this bandwidth-wise. It’s not out of reach of most preexisting games with the right hardware, even with the current low-framerate-optimized pipeline, if some of the options are tweaked.

  20. Greetings, Michael, thanks for making your blog and sharing your ideas. It makes the community think of ideas too. Really, ideas should be free. Implementing them is what is costly — IMO of course :P

    I have an idea to get a display using higher frequency than 60-120hz. Split it up into multiple monitors that are projectors, projecting onto the same displayed image.
    Say you use two projectors. The first would display the top half of the image, and the second would display the bottom half.
    Once they are done, they start on a new frame, only displaying half a graphics card framebuffer on each projector.
    The signal from the computer changes to the next frame when only half of the screen has been scanned by the display. It’s basically like vsync off, with twice the fps of the monitor Hz. The monitor is receiving a frame ‘torn’ into two frames.

    Each display connected to its own PC output.

    All you have to do is get the timing between the two projectors right — they would need some special link to tell the next when to turn on, or you could just use a delayed turn on for each sequential projector if that is accurate and repeatable enough.

    Then you just need a display driver to duplicate both monitor/projector screens and vsync to be off. It should work just like that, with the RAMDAC scanning out different portions of the same frame to each monitor/projector, because I’m pretty sure the RAMDAC has to sync
    with the frame scanning of the monitor/projector. If not, there would need to be a custom driver made to scan out from the graphics card at the different scan positions and tell each display where to scan.
    So that could work, say with 6 projectors giving 360hz effective. LCoS has low pixel transition times of about 3ms or less.
    And if you use it with a scanning laser instead of a constant LED, each projector is not projecting onto another projector’s image.
    So 360Hz should be fine with 6 DisplayPort outputs on a graphics card.
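
    Roughly, the trigger timing would look like this (illustrative sketch; the numbers are just the 6-projector example):

    // N projectors, each refreshing at the panel rate, staggered so their
    // updates are evenly spaced in time.
    #include <cstdio>

    int main()
    {
        const int    numProjectors  = 6;
        const double perProjectorHz = 60.0;
        const double effectiveHz    = numProjectors * perProjectorHz;   // 360 Hz
        const double staggerMs      = 1000.0 / effectiveHz;             // ~2.78 ms

        std::printf("effective update rate: %.0f Hz\n", effectiveHz);
        for (int i = 0; i < numProjectors; ++i)
            std::printf("projector %d fires at +%.2f ms within each 60Hz frame\n",
                        i, i * staggerMs);
        return 0;
    }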

    So you could have this 360Hz, extremely low persistence (thanks to the scanning laser) display, and using even more screens – say 4 DisplayPort with that 3-way splitter = 12 DisplayPort – you could get 720Hz.

    But even still, it’s limited to about 1080p because of the insane bandwidth required: 360Hz * 2020 * 1100 * 24bpp = 19Gbit/s.
    Hmm, I guess it’s not that bad – just a bit more than one DP 1.2 cable can handle.
    But 1080p is low res, lol. Going into the future we’ll be wanting 4k * 2 = 8 times that bandwidth.
    That’s just for the connection which isn’t that big of a deal. The main problem is the graphics cards. 4k is about 8.3MPixels. If you have 4 Titan Graphics cards in SLI you can maybe get a solid 60fps at 4k.
    Asking for 24 times the pixel pushing power of 4 Titans — that’s going to take a while.
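
    Spelling out that bandwidth arithmetic (using the same rough 2020 * 1100 total timing per 1080p frame; the 4k figure just uses the 8x scaling above):

    #include <cstdio>

    int main()
    {
        // Total timing including blanking, 24 bits per pixel.
        const double hTotal = 2020.0, vTotal = 1100.0, bpp = 24.0;

        double gbpsAt360Hz = 360.0 * hTotal * vTotal * bpp / 1e9;
        std::printf("360Hz 1080p (with blanking): %.1f Gbit/s\n", gbpsAt360Hz);
        std::printf("rough 4k-class scaling (8x): %.0f Gbit/s\n", gbpsAt360Hz * 8.0);
        return 0;
    }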

    What if you split the screen the same way to get more hz, splitting it into 4 sections top to bottom.
    The screen resolution of each projector is one quarter of the previous setup though.
    And the projectors are aligned extremely accurately such that the center of each of their pixels forms a 2×2 square.
    So, the graphics card renders at a quarter resolution, setting the viewports’ pixel centers to aim for the center of the top left corner of this 2×2 square.
    Then next frame render top right corner, then bottom right, then bottom left going all the way around.

    So, you’re scanning at 4 times the rate. Each pixel will basically be jumping around very rapidly in a circle, and I guess it would look fairly solid if done at 240Hz, and would look quite aliased since each pixel is 4 times bigger, but moving to make an average.
    The average wouldn’t be as crisp, and I have no idea how it would look, but I guess it would look a bit blurry, though not too bad. Kinda like bilinear filtering, and maybe you could program the pixels to additively look similar to that.

    I mean, the only other option to get true, full-resolution crispness is to do 1/4-size pixels all the time, with the projectors’ pixels being 1/4 size. I have no idea how that would look, but I guess it wouldn’t be very bright when moving, and when stationary it’s 4x brighter.
    Hey, moving things are less bright anyway … could look good.

    Effectively, moving pixels are either drawn at quarter the resolution or quarter brightness.
    When they stay still, they sharpen, and either get some aliasing or get 4x brighter.
    I think either way would look good because moving objects are blurry anyway, and then when you want detail, you look at things that have stopped moving.
    For example, on your phone, when you scroll text it drops to a lower resolution. When you stop, sub-pixel antialiasing is applied.

    Of course, we’d probably scale it to 9x or 16x instead of 4x resolution/refresh rate. 1/16th brightness would be bad, but I think it could work.

    • MAbrash says:

      Interesting. It’s a possible way to get higher frame rate, but it is expensive, power-hungry, probably pretty bulky and heavy for an HMD, and difficult to build. For example, how would you avoid having seams between the adjoining displays? And then, as you point out, there’s the bandwidth issue – which in the end turns into a power issue. By the time we could pump all that data and display it, we would most likely have a high-res, high-refresh display that could do the same thing. But this would be an interesting approach for prototyping.

      –Michael

  21. Bryan says:

    Have you (or anyone you know) looked into VR sickness being related to undetected body movement/motion? There are reports that you can seriously reduce VR sickness by strapping a Razer Hydra to your chest and using it for motion detection (adjusting the camera accordingly). This might indicate a link between undetected chest/body movement and VR sickness.

    • MAbrash says:

      Can you point me to those reports? I don’t see why adjusting the camera based on the motion of the chest (which is not directly connected to motion of the visual system) would work better than adjusting the camera based on the motion of the head (which is directly connected), so I wonder if they’re doing something less obvious.

      –Michael

  22. Bryan says:

    The reports are just forum posts, youtube videos, and comments, nothing official. I think the idea is that movements of the chest affect the orientation and position of the head, and that by taking into account both chest and head orientation VR sickness might be reduced. This could just be inaccurate speculation though. It’s just something that people have commented about on forums.

    Thanks for the quick reply, and thank you for all your contributions to the industry over the years. I used to work at a bookstore that sold your Graphics Programming Black Book when it came out. We put a wall of them on display by stacking them like 10 high and 10 or 20 long. It was a really great book for the time. Before buying it, I used to pick one up and read it whenever I had a break.

    • MAbrash says:

      Ah, I see, with the Rift you might get an improvement from constraining the estimated head position according to the chest position. Of course, you could just put a Hydra on the head and know the actual head position :)

      Thanks for the kind words! The Black Book didn’t sell all that many copies (my frustration with publishers ranks up there with my frustration with chroniclers of projects I’ve worked on), so you might have been responsible for a significant percentage of them!

      –Michael

  23. Lefteris Stamatogiannakis says:

    I’m not a VR specialist (my main interests are databases and physics), but I have an idea and I wonder if it has any chance of working.

    As I understand the problem (with some of my thoughts in parentheses): we need zero persistence to reduce judder for the normal eye-movement case (because it permits the eye to do the interpolation by itself?).

    But this presents problems for the saccadic motion case (because saccadic motion is very fast, and it may fall in the dark time in between the zero-persistence light points?).

    So the idea that I have is to increase the number of zero-persistence light points (beyond the GPU FPS updates) by doing spatio-temporal antialiasing.

    Assume that we have a GPU that renders at (let’s say) twice the resolution of the VR display.

    Instead of doing the antialiasing on the GPU (using normal antialiasing techniques) and then sending the reduced-resolution image to the VR display, we could send the increased-resolution image (without any GPU-side antialiasing) to the VR display and let the display do the antialiasing itself using a spatio-temporal technique.

    The spatio-temporal antialiasing idea works by introducing more frames on the output. If a pixel from the high-resolution GPU image falls in between two pixels of the low-resolution VR display, then the display will “jitter” between the two pixels at a higher FPS than the GPU works at, in proportion to the percentage of the GPU pixel that is contained in each of the two VR display pixels (see the sketch at the end of this comment).

    So if the GPU sends frames at 60 FPS, we could do spatio-temporal antialiasing on the VR display at a multiple of the base 60 FPS (using zero persistence). So instead of having this case (ASCII art warning):
    ——-
    *

    *

    With spatio-temporal antialiasing it would be like this (if the high-res GPU pixel is split 50%-50% between two of the low-res VR pixels):
    ————–
    .
    .
    .
    .

    Spatio-temporal antialiasing should not be that hard to compute with specialized hardware directly on the VR display.

    For the above, I assume that the eyes might tolerate microjittering, since they are used to jitter from atmospheric distortion of light. So the eyes may “see” the spatio-temporal antialiasing jitter as a light point between two of the VR display’s pixels.
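
    As a toy sketch of that jitter scheme (the subframe count and 50%-50% split are just the example numbers above), the display could decide which of the two low-res pixels to light on each zero-persistence subframe like this:

    // One high-res GPU sample straddles two low-res display pixels; light
    // whichever one keeps the running time average closest to the coverage.
    #include <cstdio>

    int main()
    {
        const int subframes = 4;           // e.g. a 240Hz display fed by a 60Hz GPU
        const double coverageLeft = 0.5;   // the 50%-50% example above

        double accumulator = 0.0;
        for (int s = 0; s < subframes; ++s) {
            accumulator += coverageLeft;
            bool lightLeft = accumulator >= 1.0 - 1e-9;
            if (lightLeft)
                accumulator -= 1.0;
            std::printf("subframe %d: light %s display pixel\n",
                        s, lightLeft ? "left" : "right");
        }
        return 0;
    }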

    • MAbrash says:

      Good thought! The downside is that it requires significant modification to the display, which is not easily or cheaply accomplished, and there’s no telling how well it will work ahead of time. It also requires more processing power on the display, which increases heat and cost. And you’d need to send enough information to let the display generate the intermediate frames properly, or else have the display extract stuff like motion vectors, which again increases the required processing power. Nonetheless, this is an interesting general direction to investigate.

      –Michael

  24. Lefteris Stamatogiannakis says:

    Thank you for answering.

    The idea that I’ve tried to describe doesn’t need motion vectors. It works in the same way that displays with temporally separated red, green, and blue subpixels work. They also (have to) increase the displayed FPS (x3), but they create color fringing problems.

    What I’ve tried to describe does the same thing, but instead of using the color components to do the FPS multiplication, it uses temporal antialiasing to increase the FPS. So by downscaling and doing temporal antialiasing on the higher-resolution incoming picture, it multiplies the FPS on the lower-resolution display.

    Moving this processing onto the display is indeed a problem (due to the increased heat), but on the other hand, antialiasing is a lot cheaper computationally than doing full motion interpolation.

    • MAbrash says:

      I think the problem there is that all the antialiased samples generated from a high-res frame would be at the same time, which would introduce anomalies. Unless you interpolated, but that’s when you need to know motion to get it right. So I’m not sure how you’re planning to generate additional samples that are the right data at the right time.

      –Michael

  25. Mark Rejhon says:

    As Blur Busters, I have published my low-persistence research publicly at:

    “Electronics Hacking: Creating A Strobe Backlight”
    http://www.blurbusters.com/faq/creating-strobe-backlight/

    INDEX
    (Updated September 6th)
    - The Problem With Scanning Backlights
    - Strobed Backlights is the Solution
    - Some Engineering Gotchas with Strobe Backlights
    - Easy Electronics Mods of Existing LightBoost Monitors
    - Understanding LCD Refresh Behavior Via High Speed Video
    - Hacking Existing Monitors: Strobe Backlight Mod
    - Testing A Strobe Backlight
    - Advanced Strobe-Optimized LCD Overdrive Algorithms
    - Advanced Input Lag Considerations
    - Advanced Multipass Refresh Algorithms
    - Flicker/Eye Comfort Considerations
    - Alternative Display Technologies (OLED, etc)

  26. Hi!

    I find your posts about VR very interesting, but I think you left something out.
    A bit of a silly question: have you seen Avatar in 3D? At the beginning, when a man is speaking in front of a defocused background, didn’t you try to focus on the backdrop? Many of the people I asked tried it, and they described it as a natural way of looking around to get a sense of where they were. So obviously they were fooled by the 3D image. But when they tried to focus on the defocused part, they had a feeling similar to being drunk, or dizziness.

    So my point is: up to the end of this post, you left out what you will display. And I think this is almost as important as how you will display it. You can fool the eyes using optical illusions, but you will never get the real sense of seeing without being able to focus like in real life.

    I think this is the primary cause of the failure of 3D at the consumer level, because over the last 50-60 years our visual experience of created images (movies, photography) has used depth of field as a replacement for 3D visualisation. But when the two techniques are used together, they create a more difficult problem. And without using the two effects you get a plain, flat visual representation.

    My DIY solution would be to take a VR system, add an eye tracker for each eye, and create Lytro-like display applications, but for movies. I understand that this would multiply the data you have to handle, but there are also tricks, as in storing movie files: generate depth keyframes, and between those store only the difference, or simply interpolate the missing data between the two keys when no in-focus object is present.
    (Sorry for my English, I hope you can at least get the core ideas from what I wrote.)

    br,
    Lajos

    • MAbrash says:

      Focus may matter, but it’s unclear how much it matters that HMDs are focused near infinity. After all, many people over 50 can only focus at infinity, due to loss of lens flexing, and yet they do fine looking at things up close through bifocals. Note that this is different from Avatar – in the scene you describe, there’s DOF, but in HMDs, everything is always in focus. That should not produce the dizziness effect you describe, which presumably results from the eye trying to bring the defocused area into focus, which is of course impossible. Here, the eye has no reason to try to adjust the focus; there is some conflict between vergence and focus, true, but again, people who wear bifocals don’t have significant problems with this.

      Your solution could be a good one – but it’s very processing intensive, and lightfield displays with decent resolution don’t exist, let alone at an affordable cost. So hypothetically it’s a great idea, but we may have to wait a while for it to be implementable.

      –Michael

      • Aaron Nicholls says:

        Depth of focus and other visual cues differ between the real world, an HMD, and a 3D movie as you indicate. Research suggests that incorrect or conflicting depth cues contribute to discomfort and eye strain, although other factors such as perspective distortion, lack of motion parallax, etc. come into play as well in the 3D movie case. Martin Banks and his team at UC Berkeley have done quite a bit of research in this space, and their 2011 paper The zone of comfort: Predicting visual discomfort with stereo displays goes into more detail if you’re interested.

  27. Mark Rejhon says:

    Hello Michael,

    No doubt that you’ve heard of the G-SYNC news!
    Blur Busters is rapidly expanding with a new G-SYNC section, too.
    I think G-SYNC would be hugely beneficial to VR, especially when combined with strobing:

    I’ve been working to solve the variable-rate strobe backlight problem
    for variable refresh monitors:
    http://www.blurbusters.com/faq/creating-strobe-backlight/#variablerefresh
    It eliminates flicker, since it blends to PWM-free at below 60Hz.

    In addition, you may be interested in my amateur vision research on
    strobe curve shapes:
    http://www.blurbusters.com/faq/creating-strobe-backlight/#linearvsgaussian

    Cheers,
    Mark Rejhon

    • MAbrash says:

      I’m not sure what the effect of G-SYNC on VR will be. On the one hand, having some flexibility in frame times is obviously helpful, especially when you can’t repeat frames without strobing. On the other hand, any significant variation in frame time with low persistence will cause variable problems with flicker and strobing. My current thinking is that slight (1ms or so) variation might be beneficial, but more than that will probably not produce good VR results. But that’s just a guess, and there’s only one way to find out.

  28. Mark Rejhon says:

    Right, it needs to be tested out. My thinking is a soft blend between PWM-free and strobing (copied and pasted from my research):

    10fps@10Hz — PWM-free backlight
    30fps@30Hz — PWM-free backlight
    45fps@45Hz — PWM-free backlight
    60fps@60Hz — Minor backlight brightness undulations (bright / dim / bright / dim)
    80fps@80Hz — Sharper backlight brightness undulations (very bright / very dim)
    100fps@100Hz — Starts to resemble rounded-square-wave (fullbright / fulloff)
    120fps@120Hz and up — Nearly square-wave strobing like original LightBoost

    This is just an example, it would be a continuous gradation, and it would use a trailing-average brightness measurement to automatically ensure that average brightness is always constant at all times (over human flicker fusion threshold timescales), and to avoid flicker caused by dramatic framerate transitions.

    Arduino tests have shown that a steady LED (low DC) can transition to strobing (OFF / higher voltage / OFF / higher voltage), and then back to a steady LED (low DC), while maintaining perceived brightness at all times. The problem is controlling the strobe transitions precisely. If PWM-free operation is not possible, then keep the amplitude the same at all times and instead blend (overlap) high-frequency PWM (steady light at low Hz) with low-frequency PWM (strobing). At low Hz, it would be all consistent high-frequency PWM. At middle Hz, the high-frequency PWM between strobes is blended with the strobes (low-frequency PWM); as you go up in Hz, the duty cycle of the high-frequency PWM between strobes goes more to OFF than ON, and at high Hz the high-frequency PWM between strobes disappears, so it’s pure strobing mode. It’s essentially frequency modulation, while making sure average brightness (ON-to-OFF periods) remains constant over flicker fusion thresholds. So you’re able to keep the amplitude (LED voltage) constant, while using PWM to balance between steady light (PWM-free or high-frequency PWM) and motion-blur-reduction strobes (1 strobe per refresh).
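
    A minimal numeric sketch of that blend (the target brightness and strobe duty here are arbitrary; the point is just the constant-average-brightness constraint):

    // Pick a strobe depth from the refresh rate, then choose between-strobe and
    // peak light levels so the time-averaged brightness stays constant.
    #include <algorithm>
    #include <cstdio>
    #include <initializer_list>

    int main()
    {
        const double targetAvgBrightness = 0.10;  // fraction of max steady output
        const double strobeDuty          = 0.15;  // lit fraction once fully strobing

        for (double hz : { 30.0, 45.0, 60.0, 80.0, 100.0, 120.0 }) {
            // 0 = steady backlight (<=45Hz), 1 = full strobing (>=120Hz).
            double blend = std::clamp((hz - 45.0) / (120.0 - 45.0), 0.0, 1.0);

            // The light level between strobes fades out as the blend rises...
            double floorLevel = targetAvgBrightness * (1.0 - blend);
            // ...and the strobe peak is chosen so the time average stays put:
            // avg = floor*(1-duty) + peak*duty, solved for peak.
            double peakLevel = (targetAvgBrightness - floorLevel * (1.0 - strobeDuty))
                               / strobeDuty;

            std::printf("%5.0f Hz: between-strobe level %.3f, strobe peak %.3f, average %.3f\n",
                        hz, floorLevel, peakLevel,
                        floorLevel * (1.0 - strobeDuty) + peakLevel * strobeDuty);
        }
        return 0;
    }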

    Worth some experimentation, I think.
