Nested Dielectrics

A few years ago, I wrote a post about attenuated transmission and what I called “deep attenuation” at the time: refraction and transmission through multiple mediums embedded inside of each other, a.k.a. what is usually called nested dielectrics. What I called “deep attenuation” in that post is, in its essence, just pure interface tracking using a stack. This post is meant as a revisit and update of that post; I’ll talk about the problems with the ad-hoc pure interface tracking technique I came up with in that previous post and discuss the proper priority-based nested dielectric technique [Schmidt and Budge 2002] that Takua uses today.

Figure 1: Ice cubes floating in tea inside of a glass teacup, rendered in Takua Renderer using priority-based nested dielectrics.

In my 2015 post, I included a diagram showing the overlapping boundaries required to model ice cubes in a drink in a glass, but I didn’t actually include a render of that scenario! In retrospect, the problems with the 2015 post would have become obvious to me more quickly if I had actually done a render like that diagram. Figure 1 shows an actual “ice cubes in a drink in a glass” scene, rendered correctly using Takua Renderer’s implementation of priority-based nested dielectrics. For comparison, Figure 2 shows what Takua produces using the approach in the 2015 post; there are a number of obvious bizarre problems! In Figure 2, the ice cubes don’t properly refract the tea behind and underneath them, and the ice cubes under the liquid surface aren’t visible at all. Also, where the surface of the tea interfaces with the glass teacup, there is an odd bright ring. Conversely, Figure 1 shows a correct liquid-glass interface without a bright ring, shows proper refraction through the ice cubes, and correctly shows the ice cubes under the liquid surface.

Figure 2: The same scene as in Figure 1, rendered using Takua's old interface tracking system. A number of bizarre physically inaccurate problems are present.

Problems with only Interface Tracking

So what exactly is wrong with using only interface tracking without priorities? First, let’s quickly summarize how my old interface tracking implementation worked. Note that here we refer to the side of a surface a ray is currently on as the incident side, and the other side of the surface as the transmit side. For each path, keep a stack of which Bsdfs the path has encountered:

  • When a ray enters a surface, push the encountered surface onto the stack.
  • When a ray exits a surface, scan the stack from the top down and pop the first instance of a surface in the stack matching the encountered surface.
  • When hitting the front side of a surface, the incident properties come from the top of the stack (or from the empty default if the stack is empty), and the transmit properties come from the surface being intersected.
  • When hitting the back side of a surface, the incident properties come from the surface being intersected, and the transmit properties come from the top of the stack (or from the empty default if the stack is empty).
  • Only push/pop onto the stack when a refraction/transmission event occurs.
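
Sketched out in code, the core of this interface tracking stack might look something like the following; this is just a minimal illustration of the rules above rather than Takua’s actual implementation, and Bsdf here is a hypothetical stand-in for whatever per-surface data the renderer tracks:

// A minimal sketch of the interface tracking stack described above; an
// illustration of the rules rather than Takua's actual implementation.
#include <vector>

struct Bsdf; // hypothetical stand-in for per-surface shading data

struct InterfaceStack {
    std::vector<const Bsdf*> stack;

    // Push when a refraction event enters a surface.
    void push(const Bsdf* surface) { stack.push_back(surface); }

    // Pop when a refraction event exits a surface: scan from the top down and
    // remove the first entry matching the encountered surface.
    void pop(const Bsdf* surface) {
        for (int i = int(stack.size()) - 1; i >= 0; --i) {
            if (stack[i] == surface) {
                stack.erase(stack.begin() + i);
                return;
            }
        }
    }

    // Top of the stack, or nullptr for the empty default (e.g. air).
    const Bsdf* top() const { return stack.empty() ? nullptr : stack.back(); }
};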

Next, as an example, imagine a case where which surface a ray is currently inside of is ambiguous. A common example of this case is when two surfaces are modeled as slightly overlapping, as is often done when modeling liquid inside of a glass, since modeling perfectly coincident surfaces in CG is either extremely difficult or impossible due to floating point precision problems. Even if we could model perfectly coincident surfaces, rendering them without artifacts is similarly extremely difficult or impossible, also due to floating point precision problems. Figure 3 shows a diagram of how a glass containing water and ice cubes is commonly modeled; in Figure 3, the ambiguous regions are where the water surface is inside of the glass and inside of the ice cube. When a ray enters this overlapping region, it is not clear whether we should treat the ray as being inside the water or inside of the glass (or ice)!

Figure 3: A diagram of a path through a glass containing water and ice cubes, using only interface tracking without priorities.

Using the pure interface tracking algorithm from my old blog post, below is what happens at each path vertex along the path illustrated in Figure 3. In this example, we define the empty default to be air.

  1. Enter Glass.
    • Incident/transmit IOR: Air/Glass.
    • Push Glass onto stack. Stack after event: (Glass).
  2. Enter Water.
    • Incident/transmit IOR: Glass/Water.
    • Push Water onto stack. Stack after event: (Water, Glass).
  3. Exit Glass.
    • Incident/transmit IOR: Glass/Water.
    • Remove Glass from stack. Stack after event: (Water).
  4. Enter Ice.
    • Incident/transmit IOR: Water/Ice.
    • Push Ice onto stack. Stack after event: (Ice, Water).
  5. Exit Water.
    • Incident/transmit IOR: Water/Ice.
    • Remove Water from stack. Stack after event: (Ice).
  6. Exit Ice.
    • Incident/transmit IOR: Ice/Air.
    • Remove Ice from stack. Stack after event: empty.
  7. Enter Water.
    • Incident/transmit IOR: Air/Water.
    • Push Water onto stack. Stack after event: (Water).
  8. Enter Glass.
    • Incident/transmit IOR: Water/Glass.
    • Push Glass onto stack. Stack after event: (Glass, Water).
  9. Reflect off Water.
    • Incident/transmit IOR: Water/Glass.
    • No change to stack. Stack after event: (Glass, Water).
  10. Reflect off Glass.
    • Incident/transmit IOR: Glass/Glass.
    • No change to stack. Stack after event: (Glass, Water).
  11. Exit Water.
    • Incident/transmit IOR: Water/Glass.
    • Remove Water from stack. Stack after event: (Glass).
  12. Exit Glass.
    • Incident/transmit IOR: Glass/Air.
    • Remove Glass from stack. Stack after event: empty.

Observe events 3 and 5, where the same index of refraction boundary is encountered as in the previous event. These double events are where some of the weirdness in Figure 2 comes from; specifically the bright ring at the liquid-glass surface interface and the incorrect refraction through the ice cube. These double events are not actually physically meaningful; in reality, a ray could never be both inside of a glass surface and inside of a water surface simultaneously. Figure 4 shows a simplified version of the tea cup example above, without ice cubes; even then, the double event still causes a bright ring at the liquid-glass surface interface. Also note how when following the rules from my old blog post, event 10 becomes a nonsense event where the incident and transmit IOR are the same. The fix for this case is to modify the rules so that when a ray exits a surface, the transmit properties come from the first surface on the stack that isn’t the same as the incident surface, but even with this fix, the reflection at event 10 is still physically impossible.

Figure 4: Tea inside of a glass cup, rendered using Takua Renderer's old interface tracking system. Note the bright ring at the liquid-glass surface interface, produced by a physically incorrect double-refraction event.

Really what we want is to model overlapping surfaces, but then in overlapping areas, be able to specify which surface a ray should think it is actually inside of. Essentially, this functionality would make overlapping surfaces behave like boolean operators; we would be able to specify that the ice cubes in Figure 3 “cut out” a space from the water they overlap with, and that the glass cuts out a space from the water as well. This way, the double events never occur since rays wouldn’t see the second event in each pair of double events. One solution that immediately comes to mind is to simply consider whatever surface is at the top of the interface tracking stack as being the surface we are currently inside, but this causes an even worse problem: the order of surfaces that a ray thinks it is in becomes dependent on what surfaces a ray encounters first, which depends on the direction and location of each ray! This produces an inconsistent view of the world across different rays. Instead, a better solution is provided by priority-based nested dielectrics [Schmidt and Budge 2002].

Priority-Based Nested Dielectrics

Priority-based nested dielectrics work by assigning priority values to geometry, with the priority values determining which piece of geometry “wins” when a ray is in a region of space where multiple pieces of geometry overlap. A priority value is just a single number assigned as an attribute to a piece of geometry or to a shader; the convention established by the paper is that lower numbers indicate higher priority. The basic algorithm in [Schmidt and Budge 2002] works using an interior list, which is conceptually similar to an interface tracking stack. The interior list is exactly what it sounds like: a list of all of the surfaces that a path has entered but not exited yet. Unlike the interface tracking stack though, the interior list doesn’t necessarily have to be a stack or have any particular ordering, although implementing it as a list always sorted by priority may provide some minor practical advantages. When a ray hits a surface during traversal, the following rules apply:

  • If the surface has a higher or equal priority (so a lower or equal priority number) than anything else on the interior list, the result is a true hit and an intersection has occurred. Proceed with regular shading and Bsdf evaluation.
  • If the surface has a lower priority (so a higher priority number) than the highest-priority value on the interior list, the result is a false hit and no intersection has occurred. Ignore the intersection and continue with ray traversal.
  • If the hit is a false hit OR if the hit both is a true hit and results in a refraction/transmission event:
    • Add the surface to the interior list if the ray is entering the surface.
    • Remove the surface from the interior list if the ray is exiting the surface.
  • For a true hit that produces a reflection event, don’t add the surface to the interior list.

Note that this approach only works with surfaces that are enclosed manifolds; that is, every surface defines a finite volume. When a ray exits a surface, the surface it is exiting must already be in the interior list; if not, then the interior list can become corrupted and the renderer may start thinking that paths are in surfaces that they are not actually in (or vice versa). Also note that a ray can only ever enter into a higher-priority surface through finding a true hit, and can only enter into a lower-priority surface by exiting a higher-priority surface and removing the higher-priority surface from the interior list. At each true hit, we can figure out the properties of the incident and transmit sides by examining the interior list. If hitting the front side of a surface, before we update the interior list, the surface we just hit provides the transmit properties and the highest-priority surface on the interior list provides the incident properties. If hitting the back side of a surface, before we update the interior list, the surface we just hit provides the incident properties and the second-highest-priority surface on the interior list provides the transmit properties. Alternatively, if the interior list only contains one surface, then the transmit properties come from the empty default. Importantly, if a ray hits a surface with no priority value set, that surface should always count as a true hit. This way, we can embed non-transmissive objects inside of transmissive objects and have everything work automatically.
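
Sketched out in code, the true/false hit classification and the incident/transmit lookup described above might look something like the following; this is a minimal illustration using assumed data structures (integer surface ids and priority numbers), not Takua’s actual code:

// A minimal sketch of priority-based true/false hit classification and
// incident/transmit lookup; hypothetical data structures, not Takua's code.
#include <algorithm>
#include <climits>
#include <vector>

struct InteriorEntry {
    int surfaceId;
    int priority; // lower number = higher priority
};

// Lowest priority number (so highest priority) currently on the interior list;
// INT_MAX stands in for the infinitely-low-priority empty default (air).
int highestPriorityOnList(const std::vector<InteriorEntry>& interiorList) {
    int best = INT_MAX;
    for (const InteriorEntry& entry : interiorList) {
        best = std::min(best, entry.priority);
    }
    return best;
}

// A hit is a true hit if its priority is higher than or equal to (priority
// number lower than or equal to) everything on the interior list.
bool isTrueHit(const std::vector<InteriorEntry>& interiorList, int hitPriority) {
    return hitPriority <= highestPriorityOnList(interiorList);
}

// For a true hit, before the interior list is updated: on a front-side hit the
// hit surface provides the transmit side and the highest-priority interior
// entry provides the incident side; on a back-side hit the hit surface provides
// the incident side and the next-highest-priority entry provides the transmit
// side. A return value of -1 stands in for the empty default (air).
int incidentSurface(const std::vector<InteriorEntry>& interiorList,
                    int hitSurfaceId, bool frontSide) {
    if (!frontSide) { return hitSurfaceId; }
    int best = INT_MAX, id = -1;
    for (const InteriorEntry& entry : interiorList) {
        if (entry.priority < best) { best = entry.priority; id = entry.surfaceId; }
    }
    return id;
}

int transmitSurface(const std::vector<InteriorEntry>& interiorList,
                    int hitSurfaceId, bool frontSide) {
    if (frontSide) { return hitSurfaceId; }
    int best = INT_MAX, id = -1;
    for (const InteriorEntry& entry : interiorList) {
        // Skip the surface being exited to find the next-highest-priority entry.
        if (entry.surfaceId == hitSurfaceId) { continue; }
        if (entry.priority < best) { best = entry.priority; id = entry.surfaceId; }
    }
    return id;
}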

Figure 5 shows the same scenario as in Figure 3, but now with priority values assigned to each piece of geometry. The path depicted in Figure 5 uses the priority-based interior list; dotted lines indicate parts of a surface that produce false hits due to being embedded within a higher-priority surface:

Figure 5: The same setup as in Figure 3, but now using priority values. The path is calculated using a priority-based interior list.

The empty default air surrounding everything is defined as having an infinitely high priority value, which means a lower priority than any surface in the scene. Using the priority-based interior list, here are the events that occur at each intersection along the path in Figure 5:

  1. Enter Glass.
    • Glass priority (1) is higher than ambient air (infinite), so TRUE hit.
    • Incident/transmit IOR: Air/Glass.
    • True hit, so evaluate Bsdf and produce refraction event.
    • Interior list after event: (Glass:1). Inside surface after event: Glass.
  2. Enter Water.
    • Water priority (2) is lower than highest priority in interior list (1), so FALSE hit.
    • Incident/transmit IOR: N/A.
    • False hit, so do not evaluate Bsdf and just continue straight.
    • Interior list after event: (Glass:1, Water:2). Inside surface after event: Glass.
  3. Exit Glass.
    • Glass priority (1) is equal to the highest priority in interior list (1), so TRUE hit.
    • Incident/transmit IOR: Glass/Water.
    • True hit, so evaluate Bsdf and produce refraction event. Remove Glass from interior list.
    • Interior list after event: (Water:2). Inside surface after event: Water.
  4. Enter Ice.
    • Ice priority (0) is higher than the highest priority in interior list (2), so TRUE hit.
    • Incident/transmit IOR: Water/Ice.
    • True hit, so evaluate Bsdf and produce refraction event.
    • Interior list after event: (Water:2, Ice:0). Inside surface after event: Ice.
  5. Exit Water.
    • Water priority (2) is lower than highest priority in interior list (0), so FALSE hit.
    • Incident/transmit IOR: N/A.
    • False hit, so do not evaluate Bsdf and just continue straight. Remove Water from interior list.
    • Interior list after event: (Ice:0). Inside surface after event: Ice.
  6. Exit Ice.
    • Ice is the only surface in the interior list, so TRUE hit.
    • Incident/transmit IOR: Ice/Air.
    • True hit, so evaluate Bsdf and produce refraction event. Remove Ice from interior list.
    • Interior list after event: empty. Inside surface after event: air.
  7. Enter Water.
    • Water priority (2) is higher than ambient air (infinite), so TRUE hit.
    • Incident/transmit IOR: Air/Water.
    • True hit, so evaluate Bsdf and produce refraction event.
    • Interior list after event: (Water:2). Inside surface after event: Water.
  8. Enter Glass.
    • Glass priority (1) is higher than the highest priority in interior list (2), so TRUE hit.
    • Incident/transmit IOR: Water/Glass.
    • True hit, so evaluate Bsdf and produce refraction event.
    • Interior list after event: (Water:2, Glass:1). Inside surface after event: Glass.
  9. Exit Water.
    • Water priority (2) is lower than highest priority in interior list (1), so FALSE hit.
    • Incident/transmit IOR: N/A.
    • False hit, so do not evaluate Bsdf and just continue straight.
    • Interior list after event: (Glass:1). Inside surface after event: Glass.
  10. Reflect off Glass.
    • Glass priority (1) is equal to the highest priority in interior list (1), so TRUE hit.
    • Incident/transmit IOR: Glass/Air.
    • True hit, so evaluate Bsdf and produce reflection event.
    • Interior list after event: (Glass:1). Inside surface after event: Glass.
  11. Enter Water.
    • Water priority (2) is lower than highest priority in interior list (1), so FALSE hit.
    • Incident/transmit IOR: N/A.
    • False hit, so do not evaluate Bsdf and just continue straight.
    • Interior list after event: (Glass:1, Water:2). Inside surface after event: Glass.
  12. Reflect off Glass.
    • Glass priority (1) is equal to the highest priority in interior list (1), so TRUE hit.
    • Incident/transmit IOR: Glass/Water.
    • True hit, so evaluate Bsdf and produce reflection event.
    • Interior list after event: (Glass:1, Water:2). Inside surface after event: Glass.
  13. Exit Water.
    • Water priority (2) is lower than highest priority in interior list (1), so FALSE hit.
    • Incident/transmit IOR: N/A.
    • False hit, so do not evaluate Bsdf and just continue straight.
    • Interior list after event: (Glass:1). Inside surface after event: Glass.
  14. Exit Glass.
    • Glass priority (1) is equal to the highest priority in interior list (1), so TRUE hit.
    • Incident/transmit IOR: Glass/Air.
    • True hit, so evaluate Bsdf and produce refraction event. Remove Glass from interior list.
    • Interior list after event: empty. Inside surface after event: air.

The entire above sequence of events is physically plausible, and produces no weird double-events! Using priority-based nested dielectrics, Takua generates the correct images in Figure 1 and Figure 6. Note how in Figure 6 below, the liquid appears to come right up against the glass, without any bright boundary artifacts or anything else.

For actually implementing priority-based nested dielectrics in a ray tracing renderer, I think there are two equally plausible places in the renderer where the implementation can take place. The first and most obvious location is as part of the standard light transport integration or shading system. The integrator would be in charge of checking for false hits and tracing continuation rays through false hit geometry. A second, slightly less obvious location is actually as part of ray traversal through the scene itself. Including handling of false hits in the traversal system can be more efficient than handling it in the integrator, since the false hit checks can be done in the middle of a single BVH tree traversal, whereas handling false hits by firing continuation rays requires a new BVH tree traversal for each false hit encountered. Also, handling false hits in the traversal system removes some complexity from the integrator. However, the downside to handling false hits in the traversal system is that it requires plumbing all of the interior list data and logic into the traversal system, which sets up something of a weird backwards dependency between the traversal and shading/integration systems. I wound up choosing to implement priority-based nested dielectrics in the integration system in Takua, simply to avoid having to do complex, weird plumbing back into the traversal system. Takua uses priority-based nested dielectrics in all integrators, including unidirectional path tracing, BDPT, PPM, and VCM, and also uses the nested dielectrics system to handle transmittance along bidirectional connections through attenuating mediums.

Figure 6: The same tea in a glass cup scene as in Figure 4, rendered correctly using Takua's priority-based nested dielectrics implementation.

Even though the technique has “nested dielectrics” in the title, this technique is not in principle limited to only dielectrics. In Takua, I now use this technique to handle all transmissive cases, including for both dielectric surfaces and for surfaces with diffuse transmission. Also, in addition to just determining the incident and transmit IORs, Takua uses this system to also determine things like what kind of participating medium a ray is currently inside of in order to calculate attenuation. This technique appears to be more or less the industry standard today; implementations are available for at least Renderman, Arnold, Mantra, and Maxwell Render.

As a side note, during the course of this work, I also upgraded Takua’s attenuation system to use ratio tracking [Novák et al. 2014] instead of ray marching when doing volumetric lookups. This change results in an important improvement to the attenuation system: ratio tracking provides an unbiased estimate of transmittance, whereas ray marching is inherently biased due to being a quadrature-based technique.
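
For illustration, a basic ratio tracking transmittance estimator might look something like the following; this is a minimal sketch rather than Takua’s actual implementation, and extinction() is a hypothetical lookup of the medium’s extinction coefficient, assumed to be bounded above by sigmaMajorant:

// A minimal sketch of a ratio tracking transmittance estimator; an illustration,
// not Takua's actual implementation. extinction() is a hypothetical lookup of
// the medium's extinction coefficient, bounded above by sigmaMajorant, and vec3
// follows the vector type used in the other snippets on this blog.
#include <cmath>
#include <random>

float estimateTransmittanceRatioTracking(const vec3& origin,
                                         const vec3& direction,
                                         const float tMax,
                                         const float sigmaMajorant,
                                         std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    float transmittance = 1.0f;
    float t = 0.0f;
    while (true) {
        // Sample a tentative collision distance against the homogeneous majorant.
        t -= std::log(1.0f - uniform(rng)) / sigmaMajorant;
        if (t >= tMax) {
            break;
        }
        // Weight by the probability that the tentative collision was a null collision.
        transmittance *= 1.0f - extinction(origin + t * direction) / sigmaMajorant;
    }
    return transmittance;
}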

Figures 7 and 8 show a fancier scene of liquid pouring into a glass with some ice cubes and such. This scene is the Glass of Water scene from Benedikt Bitterli’s rendering resources page [Bitterli 2016], modified with brighter lighting on a white backdrop and with red liquid. I also had to modify the scene so that the liquid overlaps the glass slightly; I made the liquid red to provide a clearer read of the liquid-glass interface. One of the neat features of this scene is the cracks modeled inside of the ice cubes; the cracks are non-manifold geometry. To render them correctly, I applied a shader with glossy refraction to the crack geometry but did not set a priority value for them; this works correctly because the cracks, being non-manifold, don’t have a concept of inside or outside anyway, so they should not participate in any interior list considerations.

Figure 7: Cranberry juice pouring into a glass with ice cubes, rendered using Takua's priority-based nested dielectrics. The scene is from Benedikt Bitterli's rendering resources page.

Figure 8: A different camera angle of the scene from Figure 7. The scene is from Benedikt Bitterli's rendering resources page.

References

Benedikt Bitterli. 2016. Rendering Resources. Retrieved from https://benedikt-bitterli.me/resources/.

Jan Novák, Andrew Selle, and Wojciech Jarosz. 2014. Residual Ratio Tracking for Estimating Attenuation in Participating Media. ACM Transactions on Graphics. 33, 6 (2014), 179:1-179:11.

Charles M. Schmidt and Brian Budge. 2002. Simple Nested Dielectrics in Ray Traced Images. Journal of Graphics Tools. 7, 2 (2002), 1–8.

Some Blog Update Notes

For the past few years, my blog posts covering personal work have trended towards ginormous epic articles tackling huge subjects published only once or twice a year, such as with the bidirectional mipmapping post and its promised but still unfinished part 2. Unfortunately, I’m not the fastest writer when working on huge posts, since writing those posts often involves significant learning and multiple iterations of implementation and testing on my part. Over the next few months, I’m aiming to write more posts similar to this one, covering some relatively smaller topics, so that I can get posts coming out a bit more frequently while I continue to work on several upcoming, ginormous posts on long-promised topics. Or at least, that’s the plan… we’ll see!

Ralph Breaks the Internet

The Walt Disney Animation Studios film for 2018 is Ralph Breaks the Internet, which is the sequel to 2012’s Wreck-It Ralph. Over the past two years, I’ve been fortunate enough to work on a number of improvements to Disney’s Hyperion Renderer for Ralph Breaks the Internet; collectively, these improvements make up perhaps the biggest jump in rendering capabilities that Hyperion has seen since the original deployment of Hyperion on Big Hero 6. I got my third Disney Animation credit on Ralph Breaks the Internet!

Over the past two years, the Hyperion team has publicly presented a number of major development efforts and research advancements. Many of these advancements were put into experimental use on Olaf’s Frozen Adventure last year, but Ralph Breaks the Internet is the first time we’ve put all of these new capabilities and features into full-scale production together. I was fortunate enough to be fairly deeply involved in several of these efforts (specifically, traversal improvements and volume rendering). One of my favorite things about working at Disney Animation is how production and technology partner together to make our films; we truly would not have been able to pull off any of Hyperion’s new advancements without production’s constant support and willingness to try new things in the name of advancing the artistry of our films.

Ralph Breaks the Internet is our first feature film to use Hyperion’s new spectral and decomposition tracking [Kutz et al. 2017] based null-collision volume rendering system exclusively. Originally we had planned to use the new volume rendering system side-by-side with Hyperion’s previous residual ratio tracking [Novák et al. 2014] based volume rendering system [Fong et al. 2017], but the results from the new system were so compelling that the show decided to switch over to the new volume rendering system exclusively, which in turn allowed us to deprecate and remove the old volume rendering system ahead of schedule. This new volume rendering system is the culmination of two years of work from Ralf Habel, Peter Kutz, Patrick Kelly, and myself. We had the enormous privilege of working with a large number of FX and lighting artists to develop, test, and refine this new system; specifically, I want to call out Jesse Erickson, Henrik Falt, and Alex Nijmeh for really championing the new volume rendering system and encouraging and supporting its development. We also owe an enormous amount to the rest of the Hyperion development team, which gave us the time and resources to spend two years building a new volume rendering system essentially from scratch. Finally, I want to underscore that the research that underpins our new volume rendering system was conducted jointly between us and Disney Research Zürich, and that this could not have happened without our colleagues at Disney Research Zürich (specifically, Jan Novák and Marios Papas); I think this entire project has been a huge shining example of the value and importance of having a dedicated blue-sky research division. Every explosion and cloud and dust plume and every bit of fog and atmospherics you see in Ralph Breaks the Internet was rendered using the new volume rendering system! Interestingly, we actually found that while the new volume rendering system is much faster and much more efficient at rendering dense volumes (and especially volumes with lots of high-order scattering) compared to the old system, the new system actually has some difficulty rendering thin volumes such as mist and atmospheric fog. This isn’t surprising, since thin volumes benefit more from good transmittance sampling than from good distance sampling, and null-collision volume rendering is really optimized for distance sampling. We were able to work with production to come up with workarounds for this problem on Ralph Breaks the Internet, but this area is definitely a good topic for future research.

Ralph Breaks the Internet is also our first feature film to move to exclusively using brute force path-traced subsurface scattering [Chiang et al. 2016] for all characters, as a replacement for Hyperion’s previous normalized diffusion based subsurface scattering [Burley 2015]. This feature was tested on Olaf’s Frozen Adventure in a limited capacity, but Ralph Breaks the Internet is the first time we’ve switched path-traced subsurface to being the default subsurface mode in the renderer. Matt Chiang, Peter Kutz, and Brent Burley put a lot of effort into developing new sampling techniques to reduce color noise in subsurface scattering, and also into developing a new parameterization that closely matched Hyperion’s normalized diffusion parameterization, which allowed artists to basically just flip a switch between normalized diffusion and path-traced subsurface and get a predictable, similar result. Many more details on Hyperion’s path-traced subsurface implementation are in our recent system architecture paper [Burley et al. 2018]. In addition to making characters we already know, such as Ralph and Vanellope, look better and more detailed, path-traced subsurface scattering also proved critical to hitting the required looks for new characters, such as the slug-like Double Dan character.

When Ralph and Vanellope first enter the world of the internet, there are several establishing shots showing vast vistas of the enormous infinite metropolis that the film depicts the internet as. Early in production, some render tests of the internet metropolis proved to be extremely challenging due to the sheer amount of geometry in the scene. Although instancing was used extensively, the way the scenes had to be built in our production pipeline meant that Hyperion wasn’t able to leverage the instancing in the scene as efficiently as we would have liked. Additionally, the way the instance groups were structured made traversal in Hyperion less ideal than it could have been. After encountering smaller-scale versions of the same problems on Moana, Peter Kutz and I had arrived at an idea that we called “multiple entry points”, which basically lets Hyperion blur the lines between top and bottom level BVHs in a two-level BVH structure. By inserting mid-level nodes from bottom level BVHs into the top-level BVH, Hyperion can produce a much more efficient top-level BVH, dramatically accelerating rendering of large instance groups and other difficult-to-split pieces of large geometry, such as groundplanes. This idea is very similar to BVH rebraiding [Benthin et al. 2017], but we arrived at our approach independently before the publication of BVH rebraiding. After initial testing on Olaf’s Frozen Adventure proved promising, we enabled multiple entry points by default for the entirety of Ralph Breaks the Internet. Additionally, Dan Teece developed a powerful automatic geometry de-duplication system, which allows Hyperion to reclaim large amounts of memory in cases where multiple instance groups are authored with separate copies of the same master geometry. Greg Nichols and I also developed a new multithreading strategy for handling Hyperion’s ultra-wide batched ray traversal, which significantly improved Hyperion’s multithreaded scalability during traversal to near-linear scaling with number of cores. All of these geometry and traversal improvements collectively meant that by the main production push for the show, render times for the large internet vista shots had dropped from being by far the highest in the show to being indistinguishable from any other normal shot. These improvements also proved to be timely, since the internet set was just the beginning of massive-scale geometry and instancing on Ralph Breaks the Internet; solving the render efficiency problems for the internet set also made other large-scale instancing sequences, such as the Ralphzilla battle [Byun et al. 2019] at the end of the film and the massive crowds [Richards et al. 2019] in the internet, easier to render.

Another major advancement we made on Ralph Breaks the Internet, in collaboration with Disney Research Zürich and our sister studio Pixar Animation Studios, is a new machine-learning based denoiser. To the best of my knowledge, Disney Animation was one of the first studios with a successful widescale deployment of a production denoiser on Big Hero 6. The Hyperion denoiser used from Big Hero 6 through Olaf’s Frozen Adventure is a hand-tuned denoiser based on and influenced by [Li et al. 2012] and [Rousselle et al. 2013], and has since been adopted by the Renderman team as the production denoiser that ships with Renderman today. Midway through production on Ralph Breaks the Internet, David Adler from the Hyperion team, in collaboration with Fabrice Rousselle, Jan Novák, Gerhard Röthlin, and others from Disney Research Zürich, was able to deploy a new, next-generation machine-learning based denoiser [Vogels et al. 2018]. Developed primarily by Disney Research Zürich, the new machine-learning denoiser allowed us to cut render times by up to 75% in some cases. This example is yet another case of basic scientific research at Disney Research leading to unexpected but enormous benefits to production in all of the wider Walt Disney Company’s various animation studios!

In addition to everything above, many more smaller improvements were made in all areas of Hyperion for Ralph Breaks the Internet. Dan Teece developed a really cool “edge” shader module, which was used to create all of the silhouette edge glows in the internet world, and Dan also worked closely with FX artists to develop render-side support for various fracture and destruction workflows [Harrower et al. 2018]. Brent Burley developed several improvements to Hyperion’s depth of field support, including a realistic cat’s eye bokeh effect. Finally, as always, the production of Ralph Breaks the Internet has inspired many more future improvements to Hyperion that I can’t write about yet, since they haven’t been published yet.

The original Wreck-It Ralph is one of my favorite modern Disney movies, and I think Ralph Breaks the Internet more than lives up to the original. The film is smart and hilarious while maintaining the depth that made the first Wreck-It Ralph so good. Ralph and Vanellope are just as lovable as before and grow further as characters, and all of the new characters are really awesome (Shank and Yesss and the film’s take on the Disney princesses are particular favorites of mine). More importantly for a rendering blog though, the film is also just gorgeous to look at. With every film, the whole studio takes pride in pushing the envelope even further in terms of artistry, craftsmanship, and sheer visual beauty. The number of environments and settings in Ralph Breaks the Internet is enormous and highly varied; the internet is depicted as a massive city that pushed the limits on how much visual complexity we can render (and from our previous three feature films, we can already render an unbelievable amount!), old locations from the first Wreck-It Ralph are revisited with exponentially more visual detail and richness than before, and there’s even a full on musical number with theatrical lighting somewhere in there!

Below are some stills from the movie, in no particular order, 100% rendered using Hyperion. If you want to see more, or if you just want to see a really great movie, go see Ralph Breaks the Internet on the biggest screen you can find! There are a TON of easter eggs in the film to look out for, and I highly recommend sticking around after the credits for this one.

Here is the part of the credits with Disney Animation’s rendering team! Also, Ralph Breaks the Internet was my wife Harmony Li’s first credit at Disney Animation (she previously was at Pixar)! This frame is kindly provided by Disney. Every person you see in the credits worked really hard to make Ralph Breaks the Internet an amazing film!

All images in this post are courtesy of and the property of Walt Disney Animation Studios.

References

Carsten Benthin, Sven Woop, Ingo Wald, and Attila T. Áfra. 2017. Improved Two-Level BVHs using Partial Re-Braiding. In HPG ‘17 (Proceedings of High Performance Graphics). 7:1-7:8.

Brent Burley. 2015. Extending the Disney BRDF to a BSDF with Integrated Subsurface Scattering. In ACM SIGGRAPH 2015 Course Notes: Physically Based Shading in Theory and Practice.

Brent Burley, David Adler, Matt Jen-Yuan Chiang, Hank Driskill, Ralf Habel, Patrick Kelly, Peter Kutz, Yining Karl Li, and Daniel Teece. 2018. The Design and Evolution of Disney’s Hyperion Renderer. ACM Transactions on Graphics. 37, 3 (2018), 33:1-33:22.

Dong Joo Byun, Alberto Luceño Ros, Alexander Moaveni, Marc Bryant, Joyce Le Tong, and Moe El-Ali. 2019. Creating Ralphzilla: Moshpit, Skeleton Library and Automation Framework. In ACM SIGGRAPH 2019 Talks. 66:1-66:2.

Matt Jen-Yuan Chiang, Peter Kutz, and Brent Burley. 2016. Practical and Controllable Subsurface Scattering for Production Path Tracing. In ACM SIGGRAPH 2016 Talks. 49:1-49:2.

Julian Fong, Magnus Wrenninge, Christopher Kulla, and Ralf Habel. 2017. Production Volume Rendering. In ACM SIGGRAPH 2017 Courses.

Will Harrower, Pete Kyme, Ferdi Scheepers, Michael Rice, Marie Tollec, and Alex Moaveni. 2018. SimpleBullet: Collaborating on a Modular Destruction Toolkit. In ACM SIGGRAPH 2018 Talks. 79:1-79:2.

Peter Kutz, Ralf Habel, Yining Karl Li, and Jan Novák. 2017. Spectral and Decomposition Tracking for Rendering Heterogeneous Volumes. ACM Transactions on Graphics. 36, 4 (2017), 111:1-111:16.

Tzu-Mao Li, Yu-Ting Wu, and Yung-Yu Chuang. 2012. SURE-based Optimization for Adaptive Sampling and Reconstruction. ACM Transactions on Graphics. 31, 6 (2012), 194:1-194:9.

Jan Novák, Andrew Selle, and Wojciech Jarosz. 2014. Residual Ratio Tracking for Estimating Attenuation in Participating Media. ACM Transactions on Graphics. 33, 6 (2014), 179:1-179:11.

Josh Richards, Joyce Le Tong, Moe El-Ali, and Tuan Nguyen. 2019. Optimizing Large Scale Crowds in Ralph Breaks the Internet. In ACM SIGGRAPH 2019 Talks. 65:1-65:2.

Fabrice Rousselle, Marco Manzi, and Matthias Zwicker. 2013. Robust Denoising using Feature and Color Information. Computer Graphics Forum. 32, 7 (2013), 121-130.

Thijs Vogels, Fabrice Rousselle, Brian McWilliams, Gerhard Röthlin, Alex Harvill, David Adler, Mark Meyer, and Jan Novák. 2018. Denoising with Kernel Prediction and Asymmetric Loss Functions. ACM Transactions on Graphics. 37, 4 (2018), 124:1-124:15.

Mipmapping with Bidirectional Techniques

One major feature that differentiates production-capable renderers from hobby or research renderers is a texture caching system. A well-implemented texture caching system is what allows a production renderer to render scenes with potentially many TBs of textures, but in a reasonable footprint that fits in a few dozen or a hundred-ish GB of RAM. Pretty much every production renderer today has a robust texture caching system; Arnold famously derives a significant amount of performance from an extremely efficient texture cache implementation, and Vray/Corona/Renderman/Hyperion/etc. all have their own, similarly efficient systems.

In this post and the next few posts, I’ll write about how I implemented a tiled, mipmapped texture caching system in my hobby renderer, Takua Renderer. I’ll also discuss some of the interesting challenges I ran into along the way. This post will focus on the mipmapping part of the system. Building a tiled mipmapping system that works well with bidirectional path tracing techniques was particularly difficult, for reasons I’ll discuss later in this post. I’ll also review the academic literature on ray differentials and mipmapping with path tracing, and I’ll take a look at what several different production renderers do. The scene I’ll use as an example in this post is a custom recreation of a forest scene from Evermotion’s Archmodels 182, rendered entirely using Takua Renderer (of course):

Figure 1: A forest scene in the morning, rendered using Takua Renderer. 6 GB of textures on disk accessed using a 1 GB in-memory texture cache.

Intro: Texture Caches and Mipmaps

Texture caching is typically coupled with some form of a tiled, mipmapped [Williams 1983] texture system; the texture cache holds specific tiles of an image that were accessed, as opposed to an entire texture. These tiles are typically lazy-loaded on demand into a cache [Peachey 1990], which means the renderer only needs to pay the memory storage cost for the parts of a texture that it actually accesses.

The remainder of this section and the next section of this post are a recap of what mipmaps are, mipmap level selection, and ray differentials for the less experienced reader. I also discuss a bit about what techniques various production renderers are known to use today. If you already know all of this stuff, I’d suggest skipping down a bit to the section titled “Ray Differentials and Bidirectional Techniques”.

Mipmapping works by creating multiple resolutions of a texture and, for a given surface, loading only the resolution level whose frequency detail most closely matches the texture sampling rate when viewed from the camera, without exceeding the Nyquist limit. Since textures are often much higher resolution than the final framebuffer resolution, mipmapping means the renderer can achieve huge memory savings, since for objects further away from the camera, most loaded mip levels will be significantly lower resolution than the original texture. Mipmaps start with the original full resolution texture as “level 0”, and then each level going up from level 0 is half the resolution of the previous level. The highest level is the level at which the texture can no longer be halved in resolution again.
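
As a small illustration, building the chain of mip level resolutions might look something like the following; this is just a generic sketch (with odd dimensions rounded up when halving), not tied to any particular renderer:

// A minimal sketch of computing the resolution of each mipmap level; odd
// dimensions are rounded up when halving.
#include <algorithm>
#include <vector>

struct MipResolution {
    int width;
    int height;
};

std::vector<MipResolution> buildMipChainResolutions(int width, int height) {
    std::vector<MipResolution> levels;
    levels.push_back({ width, height }); // level 0 is the original resolution
    while (width > 1 || height > 1) {
        width = std::max(1, (width + 1) / 2);
        height = std::max(1, (height + 1) / 2);
        levels.push_back({ width, height });
    }
    return levels;
}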

Below is an example of a mipmapped texture. The texture below is the diffuse albedo texture for the fallen log that is in the front of the scene in Figure 1, blocking off the path into the woods. On the left side of Figure 2 is level 1 of this texture (I have omitted level 0 both for image size reasons and because the original texture is from a commercial source, which I don’t have the right to redistribute in full resolution). On the right side, going from the top on down, are levels 2 through 11 of the mipmap. I’ll talk about the “tiled” part in a later post.

Figure 2: A mipmapped texture. Level 1 of the mipmap is shown on the left, levels 2 through 11 are shown on the right. Level 0 is not shown here. A bit of terminology that is often confusing: the lowest mipmap level is the highest resolution level, while the highest mipmap level is the lowest resolution level.

Before diving into details, I need to make a major note: I’m not going to write too much about texture filtering for now, mainly because I haven’t done much with texture filtering in Takua at all. Mipmapping was originally invented as an elegant solution to the problem of expensive texture filtering in rasterized rendering; when a texture had detail that was more high frequency than the distance between neighboring pixels in the framebuffer, aliasing would occur when the texture was sampled. Mipmaps are typically generated with pre-computed filtering for mip levels above the original resolution, allowing for single texture samples to appear antialiased. For a comprehensive discussion of texture filtering, how it relates to mipmaps, and more advanced techniques, see section 10.4.3 in Physically Based Rendering 3rd Edition [Pharr et al. 2016].

For now, Takua just uses a point sampler for all texture filtering; my interest in mipmaps is mostly for memory efficiency and texture caching instead of filtering. My thinking is that in a path tracer that is going to generate hundreds or even thousands of paths for each framebuffer pixel, the need for single-sample antialiasing becomes somewhat lessened, since we’re already basically supersampling. Good texture filtering is still ideal of course, but being lazy and just relying on supersampling to get rid of texture aliasing in primary visibility is… not necessarily the worst short-term solution in the world. Furthermore, relying on just point sampling means each texture sample only requires two texture lookups: one from the integer mip level and one from the integer mip level below the continuous float mip level at a sample point (see the next section for more on this). Using only two texture lookups per texture sample is highly efficient due to minimized memory access and minimized branching in the code. Interestingly, the Moonray team at Dreamworks Animation arrived at more or less the same conclusion [Lee et al. 2017]; they point out in their paper that geometric complexity, for all intents and purposes, has an infinite frequency, whereas pre-filtered mipmapped textures are already band limited. As a result, the number of samples required to resolve geometric aliasing should be more than enough to also resolve any texture aliasing. The Moonray team found that this approach works well enough to be their default mode in production.

Mipmap Level Selection and Ray Differentials

The trickiest part of using mipmapped textures is figuring out what mipmap level to sample at any given point. Since the goal is to find a mipmap level with a frequency detail as close to the texture sampling rate as possible, we need to have a sense of what the texture sampling rate at a given point in space relative to the camera will be. More precisely, we want the differential of the surface parameterization (a.k.a. how uv space is changing) with respect to the image plane. Since the image plane is two-dimensional, we will end up with a differential for each uv axis with respect to each axis of the image plane; we call these differentials dudx/dvdx and dudy/dvdy, where u/v are uv coordinates and x/y are image plane pixel coordinates. Calculating these differentials is easy enough in a rasterizer: for each image plane pixel, take the texture coordinate of the fragment and subtract the texture coordinates of the neighboring fragments to get the gradient of the texture coordinates with respect to the image plane (a.k.a. screen space), and then scale by the texture resolution. Once we have dudx/dvdx and dudy/dvdy, for a non-fancy box filter all we have to do to get the mipmap level is take the longest of these gradients and calculate its logarithm base 2. A code snippet might look something like this:

float mipLevelFromDifferentialSurface(const float dudx,
                                      const float dvdx,
                                      const float dudy,
                                      const float dvdy,
                                      const int maxMipLevel) {
    // Use absolute values, since the uv gradients can be negative.
    float width = max(max(abs(dudx), abs(dvdx)), max(abs(dudy), abs(dvdy)));
    // Guard against log2(0) for degenerate footprints.
    float level = float(maxMipLevel) + log2(max(width, 1e-8f));
    return level;
}

Notice that the level value is a continuous float. Usually, instead of rounding level to an integer, a better approach is to sample both of the integer mipmap levels above and below the continuous level and blend between the two values using the fractional part of level. Doing this blending helps immensely with smoothing transitions between mipmap levels, which can become very important when rendering an animated sequence with camera movement.
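
A sketch of that blending might look something like the following; sampleMipLevel() and Texture here are hypothetical stand-ins for a point-sampled single-level lookup and for whatever texture representation the renderer uses:

// A minimal sketch of blending between the two integer mip levels bracketing a
// continuous level; sampleMipLevel() and Texture are hypothetical stand-ins.
vec3 sampleTextureBlended(const Texture& texture, const vec2& uv, const float level) {
    int levelLow = clamp(int(floor(level)), 0, texture.maxMipLevel());
    int levelHigh = min(levelLow + 1, texture.maxMipLevel());
    float t = level - float(levelLow); // fractional part of the continuous level
    vec3 low = sampleMipLevel(texture, uv, levelLow);
    vec3 high = sampleMipLevel(texture, uv, levelHigh);
    return mix(low, high, t); // linearly blend using the fractional part
}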

In a ray tracer, however, figuring out dudx/dvdx and dudy/dvdy is not as easy as in a rasterizer. If we are only considering primary rays, we can do something similar to the rasterization case: fire a ray from a given pixel and fire rays from the neighboring pixels, and calculate the gradient of the texture coordinates with respect to screen space (the screen space partial derivatives) by examining the hit points of each neighboring ray that hits the same surface as the primary ray. This approach rapidly falls apart though, for the following reasons and more:

  • If a ray hits a surface but none of its neighboring rays hit the same surface, then we can’t calculate any differentials and must fall back to point sampling the lowest mip level. This isn’t a problem in the rasterization case, since rasterization will run through all of the polygons that make up a surface, but in the ray tracing case, we only know about surfaces that we actually hit with a ray.
  • For secondary rays, we would need to trace secondary bounces not just for a given pixel’s ray, but also its neighboring rays. Doing so would be necessary since, depending on the bsdf at a given surface, the distance between the main ray and its neighbor rays can change arbitrarily. Tracing this many additional rays quickly becomes prohibitively expensive; for example, if we are considering four neighbors per pixel, we are now tracing five times as many rays as before.
  • We would also have to continue to guarantee that neighbor secondary rays continue hitting the same surface as the main secondary ray, which will become arbitrarily difficult as bxdf lobes widen or narrow.

A better solution to these problems is to use ray differentials [Igehy 1999], which is more or less just a ray along with the partial derivative of the ray with respect to screen space. Thinking of a ray differential as essentially similar to a ray with a width or a cone, similar to beam tracing [Heckbert and Hanrahan 1984], pencil tracing [Shinya et al. 1987], or cone tracing [Amanatides 1984], is not entirely incorrect, but ray differentials are a bit more nuanced than any of the above. With ray differentials, instead of tracing a bunch of independent neighbor rays with each camera ray, the idea is to reconstruct dudx/dvdx and dudy/dvdy at each hit point using simulated offset rays that are reconstructed using the ray’s partial derivative. Ray differentials are generated alongside camera rays; when a ray is traced from the camera, offset rays are generated for a single neighboring pixel vertically and a single neighboring pixel horizontally in the image plane. Instead of tracing these offset rays independently, however, we always assume they are at some angular width from the main ray. When the main ray hits a surface, we need to calculate for later use the differential of the surface at the intersection point with respect to uv space, which is called dpdu and dpdv. Different surface types will require different functions to calculate dpdu and dpdv; for a triangle in a triangle mesh, the code requires the position and uv coordinates at each vertex:

DifferentialSurface calculateDifferentialSurfaceForTriangle(const vec3& p0,
                                                            const vec3& p1,
                                                            const vec3& p2,
                                                            const vec2& uv0,
                                                            const vec2& uv1,
                                                            const vec2& uv2) {
    vec2 duv02 = uv0 - uv2;
    vec2 duv12 = uv1 - uv2;
    float determinant = duv02[0] * duv12[1] - duv02[1] * duv12[0];

    vec3 dpdu, dpdv;

    vec3 dp02 = p0 - p2;
    vec3 dp12 = p1 - p2;
    if (abs(determinant) == 0.0f) {
        vec3 ng = normalize(cross(p2 - p0, p1 - p0));
        if (abs(ng.x) > abs(ng.y)) {
            dpdu = vec3(-ng.z, 0, ng.x) / sqrt(ng.x * ng.x + ng.z * ng.z);
        } else {
            dpdu = vec3(0, ng.z, -ng.y) / sqrt(ng.y * ng.y + ng.z * ng.z);
        }
        dpdv = cross(ng, dpdu);
    } else {
        float invdet = 1.0f / determinant;
        dpdu = (duv12[1] * dp02 - duv02[1] * dp12) * invdet;
        dpdv = (-duv12[0] * dp02 + duv02[0] * dp12) * invdet;
    }
    return DifferentialSurface(dpdu, dpdv);
}

Calculating surface differentials does add a small bit of overhead to the renderer, but the cost can be minimized with some careful work. The naive approach to surface differentials is to calculate them with every intersection point and return them as part of the hit point information that is produced by ray traversal. However, this computation is wasted if the shading operation for a given hit point doesn’t actually end up doing any texture lookups. In Takua, surface differentials are calculated on demand at texture lookup time instead of at ray intersection time; this way, we don’t have to pay the computational cost for the above function unless we actually need to do texture lookups. Takua also supports multiple uv sets per mesh, so the above function is parameterized by uv set ID, and the function is called once for each uv set that a texture specifies. Surface differentials are also cached within a shading operation per hit point, so if a shader does multiple texture lookups within a single invocation, the required surface differentials don’t need to be redundantly calculated.
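
As a rough sketch of the on-demand caching idea (with hypothetical types rather than Takua’s actual code):

// A minimal sketch of computing surface differentials lazily at texture lookup
// time and caching them per hit point; hypothetical types, not Takua's code.
// vec3 follows the vector type used in the other snippets on this blog.
#include <optional>

struct DifferentialSurface {
    vec3 dpdu;
    vec3 dpdv;
};

struct PerHitShadingCache {
    std::optional<DifferentialSurface> differentialSurface;

    // computeFn is only invoked the first time a texture lookup in this shading
    // invocation actually needs surface differentials.
    template <typename ComputeFn>
    const DifferentialSurface& getOrComputeDifferentialSurface(ComputeFn computeFn) {
        if (!differentialSurface) {
            differentialSurface = computeFn();
        }
        return *differentialSurface;
    }
};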

Sony Imageworks’ variant of Arnold (we’ll refer to it as SPI Arnold to disambiguate from Solid Angle’s Arnold) does something even more advanced [Kulla et al. 2018]. Instead of the above explicit surface differential calculation, SPI Arnold implements an automatic differentiation system utilizing dual arithmetic [Piponi 2004]. SPI Arnold extensively utilizes OSL for shading; this means that they are able to trace at runtime what dependencies a particular shader execution path requires, and therefore when a shader needs any kind of derivative or differential information. The calls to the automatic differentiation system are then JITed into the shader’s execution path, meaning shader authors never have to be aware of how derivatives are computed in the renderer. The SPI Arnold team’s decision to use dual arithmetic based automatic differentiation is influenced by lessons they had previously learned with BMRT’s finite differencing system, which required lots of extraneous shading computations for incoherent ray tracing [Gritz and Hahn 1996]. At least for my purposes, though, I’ve found that the simpler approach I have taken in Takua is sufficiently negligible in both final overhead and code complexity that I’ll probably skip something like the SPI Arnold approach for now.
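
To illustrate the dual arithmetic idea, a dual number just carries a value together with a derivative, and the arithmetic operators propagate both; the following is a bare-bones forward-mode automatic differentiation sketch, not SPI Arnold’s actual implementation:

// A minimal sketch of forward-mode automatic differentiation using dual numbers;
// an illustration of dual arithmetic, not SPI Arnold's actual implementation.
struct Dual {
    float value;      // f(x)
    float derivative; // df/dx
};

Dual operator+(const Dual& a, const Dual& b) {
    return { a.value + b.value, a.derivative + b.derivative };
}

Dual operator*(const Dual& a, const Dual& b) {
    // Product rule: (fg)' = f'g + fg'
    return { a.value * b.value, a.derivative * b.value + a.value * b.derivative };
}

// Example: evaluating f(x) = x * x + x at x = 3 with the derivative slot of x
// seeded to 1 yields f.value == 12 and f.derivative == 7.
// Dual x = { 3.0f, 1.0f };
// Dual f = x * x + x;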

Once we have the surface differential, we can then approximate the local surface geometry at the intersection point with a tangent plane, and intersect the offset rays with the tangent plane. To find the corresponding uv coordinates for the offset ray tangent plane intersection points, dpdu/dpdv, the main ray intersection point, and the offset ray intersection points can be used to establish a linear system. Solving this linear system leads us directly to dudx/dudy and dvdx/dvdy; for the exact mathematical details and explanation, see section 10.1 in Physically Based Rendering 3rd Edition. The actual code might look something like this:

// This code is heavily aped from PBRT v3; consult the PBRT book for details!
vec4 calculateScreenSpaceDifferential(const vec3& p,            // Surface intersection point
                                      const vec3& n,            // Surface normal
                                      const vec3& origin,       // Main ray origin
                                      const vec3& rDirection,   // Main ray direction
                                      const vec3& xorigin,      // Offset x ray origin
                                      const vec3& rxDirection,  // Offset x ray direction
                                      const vec3& yorigin,      // Offset y ray origin
                                      const vec3& ryDirection,  // Offset y ray direction
                                      const vec3& dpdu,         // Surface differential w.r.t. u
                                      const vec3& dpdv          // Surface differential w.r.t. v
                                      ) {
    // Compute offset-ray intersection points with tangent plane
    float d = dot(n, p);
    float tx = -(dot(n, xorigin) - d) / dot(n, rxDirection);
    vec3 px = xorigin + tx * rxDirection;
    float ty = -(dot(n, yorigin) - d) / dot(n, ryDirection);
    vec3 py = yorigin + ty * ryDirection;
    vec3 dpdx = px - p;
    vec3 dpdy = py - p;

    // Compute uv offsets at offset-ray intersection points
    // Choose two dimensions to use for ray offset computations
    ivec2 dim;
    if (std::abs(n.x) > std::abs(n.y) && std::abs(n.x) > std::abs(n.z)) {
        dim = ivec2(1,2);
    } else if (std::abs(n.y) > std::abs(n.z)) {
        dim = ivec2(0,2);
    } else {
        dim = ivec2(0,1);
    }
    // Initialize A, Bx, and By matrices for offset computation
    mat2 A;
    A[0][0] = dpdu[dim[0]];
    A[0][1] = dpdv[dim[0]];
    A[1][0] = dpdu[dim[1]];
    A[1][1] = dpdv[dim[1]];
    vec2 Bx(px[dim[0]] - p[dim[0]], px[dim[1]] - p[dim[1]]);
    vec2 By(py[dim[0]] - p[dim[0]], py[dim[1]] - p[dim[1]]);

    float dudx, dvdx, dudy, dvdy;

    // Solve two linear systems to get uv offsets
    auto solveLinearSystem2x2 = [](const mat2& A, const vec2& B, float& x0, float& x1) -> bool {
        float det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
        if (abs(det) < (float)constants::EPSILON) {
            return false;
        }
        x0 = (A[1][1] * B[0] - A[0][1] * B[1]) / det;
        x1 = (A[0][0] * B[1] - A[1][0] * B[0]) / det;
        if (std::isnan(x0) || std::isnan(x1)) {
            return false;
        }
        return true;
    };
    if (!solveLinearSystem2x2(A, Bx, dudx, dvdx)) {
        dudx = dvdx = 0.0f;
    }
    if (!solveLinearSystem2x2(A, By, dudy, dvdy)) {
        dudy = dvdy = 0.0f;
    }

    return vec4(dudx, dvdx, dudy, dvdy);
}

Now that we have dudx/dudy and dvdx/dvdy, getting the proper mipmap level works just like in the rasterization case. The above approach is exactly what I have implemented in Takua Renderer for camera rays. Similar to surface differentials, Takua caches dudx/dudy and dvdx/dvdy computations per shader invocation per hit point, so that multiple textures utilizing the same uv set don’t require multiple redundant calls to the above function.

The ray differential approach to mipmap level selection is basically the standard approach in modern production rendering today for camera rays. PBRT [Pharr et al. 2016], Mitsuba [Jakob 2010], and Solid Angle’s version of Arnold [Georgiev et al. 2018] all use ray differential systems based on this approach for camera rays. Renderman [Christensen et al. 2018] uses a simplified version of ray differentials that only tracks two floats per ray, instead of the four vectors needed to represent a full ray differential. Renderman tracks a width at each ray’s origin, and a spread value representing the linear rate of change of the ray width over a unit distance. This approach does not encode as much information as the full ray differential approach, but nonetheless ends up being sufficient since in a path tracer, every pixel essentially ends up being supersampled. Hyperion [Burley et al. 2018] uses a similarly simplified scheme.
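
As a sketch of the general width/spread idea (purely illustrative, and not Renderman’s or Hyperion’s actual implementation):

// A minimal sketch of tracking a footprint as a width plus a linear spread rate;
// purely illustrative, not Renderman's or Hyperion's actual implementation.
struct RayFootprint {
    float width;  // footprint width at the ray origin
    float spread; // rate of change of the width per unit distance along the ray
};

// Footprint width after traveling distance t along the ray.
float footprintWidthAt(const RayFootprint& footprint, const float t) {
    return footprint.width + footprint.spread * t;
}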

A brief side note: being able to calculate the differential for surface normals with respect to screen space is useful for bump mapping, among other things, and the calculation is directly analogous to the pseudocode above for calculateDifferentialSurfaceForTriangle() and calculateScreenSpaceDifferential(), just with surface normals substituted in for surface positions.

Ray Differentials and Path Tracing

We now know how to calculate filter footprints using ray differentials for camera rays, which is great, but what about secondary rays? Without ray differentials for secondary rays, path tracing texture access behavior degrades severely, since secondary rays have to fall back to point sampling textures at the lowest mip level. A number of different schemes exist for calculating filter footprints and mipmap levels for secondary rays; here are a few that have been presented in literature and/or are known to be in use in modern production renderers:

Igehy [1999] demonstrates how to propagate ray differentials through perfectly specular reflection and refraction events, which boils down to some simple extensions to the basic math for optical reflection and refraction. However, we still need a means for handling glossy events (so really, non-zero surface roughness), which requires an extended version of ray differentials. Path differentials [Suykens and Willems 2001] consider more than just partial derivatives for each screen space pixel footprint; with path differentials, partial derivatives can also be taken at each scattering event along a number of dimensions. As an example, for handling an arbitrarily shaped BSDF lobe, new partial derivatives can be calculated along some parameter of the lobe that describes the shape of the lobe, which takes the form of a bunch of additional scattering directions around the main ray’s scattering direction. If we imagine looking down the main scattering direction and constructing a convex hull around the additional scattering directions, the result is a polygonal footprint describing the ray differential over the scattering event. This footprint can then be approximated by finding the major and minor axes of the polygonal footprint. While the method is general enough to handle arbitrary factors impacting ray directions, unfortunately it can be fairly complex and expensive to compute in practice, and differentials for some types of events can be very difficult to derive. For this reason, none of the major production renderers today actually use this approach. However, there is a useful observation that can be drawn from path differentials: generally, in most cases, primary rays have narrow widths and secondary rays have wider widths [Christensen et al. 2003]; this observation is the basis of the ad-hoc techniques that most production renderers utilize.
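
Going back to the perfectly specular case for a moment, here’s a sketch of propagating a ray differential through a mirror reflection, following Igehy’s formulation as presented in PBRT. The RayDifferential struct and variable names are illustrative rather than Takua’s actual data structures, the vec3 type and dot() function are assumed to be the same glm-style utilities used in the code earlier in this post, and dndx/dndy are the screen-space shading normal derivatives mentioned in the side note above:

// Sketch of propagating a ray differential through a perfectly specular
// reflection, following Igehy [1999]. wo is the unit direction pointing back
// towards the incoming ray's origin, wi is the reflected direction, n is the
// shading normal, dpdx/dpdy are the surface position differentials, and
// dndx/dndy are the shading normal differentials.
struct RayDifferential {
    vec3 origin, direction;
    vec3 rxOrigin, ryOrigin;
    vec3 rxDirection, ryDirection;
};

RayDifferential reflectRayDifferential(const RayDifferential& incoming,
                                       const vec3& p, const vec3& n,
                                       const vec3& wo, const vec3& wi,
                                       const vec3& dpdx, const vec3& dpdy,
                                       const vec3& dndx, const vec3& dndy) {
    RayDifferential r;
    r.origin = p;
    r.direction = wi;
    // The offset ray origins just shift along the surface differentials
    r.rxOrigin = p + dpdx;
    r.ryOrigin = p + dpdy;
    // Screen-space derivatives of the incoming direction
    vec3 dwodx = -incoming.rxDirection - wo;
    vec3 dwody = -incoming.ryDirection - wo;
    float dDNdx = dot(dwodx, n) + dot(wo, dndx);
    float dDNdy = dot(dwody, n) + dot(wo, dndy);
    // Differentiate the mirror reflection equation wi = -wo + 2(wo.n)n
    r.rxDirection = wi - dwodx + 2.0f * (dot(wo, n) * dndx + dDNdx * n);
    r.ryDirection = wi - dwody + 2.0f * (dot(wo, n) * dndy + dDNdy * n);
    return r;
}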

Recently, research has appeared that provides an entirely different, more principled approach to selecting filter footprints, based on covariance tracing [Belcour et al. 2013]. The high-level idea behind covariance tracing is that local light interaction effects such as transport, occlusion, roughness, etc. can all be encoded as 5D covariance matrices, which in turn can be used to determine ideal sampling rates. Covariance tracing builds an actual, implementable rendering algorithm on top of earlier, mostly theoretical work on light transport frequency analysis [Durand et al. 2005]. Belcour et al. [2017] presents an extension to covariance tracing for calculating filter footprints for arbitrary shading effects, including texture map filtering. The covariance-tracing based approach differs from path differentials in two key areas. While both approaches operate in path space, path differentials are much more expensive to compute than the covariance-tracing based technique; path differential complexity scales quadratically with path length, while covariance tracing only ever carries a single covariance matrix along a path for a given effect. Also, path differentials can only be generated starting from the camera, whereas covariance tracing works from the camera and the light; in the next section, we’ll talk about why this difference is critically important.

Covariance tracing based techniques have a lot of promise, and are the best known approach to date for selecting filter footprints along a path. The original covariance tracing paper had some difficulty with handling high geometric complexity; covariance tracing requires a voxelized version of the scene for storing local occlusion covariance information, and covariance estimates can degrade severely if the occlusion covariance grid is not high resolution enough to capture small geometric details. For huge production scale scenes, geometric complexity requirements can make covariance tracing either slow due to huge occlusion grids, or degraded in quality due to insufficiently large occlusion grids. However, the voxelization step is not as much of a barrier to practicality as it may initially seem. For covariance tracing based filtering, visibility can be neglected, so the entire scene voxelization step can be skipped; Belcour et al. [2017] demonstrates how. Since covariance tracing based filtering can be used with the same assumptions and data as ray differentials but is both superior in quality and more generalizable than ray differentials, I would not be surprised to see more renderers adopt this technique over time.

As of present, however, instead of using any of the above techniques, pretty much all production renderers today use various ad-hoc methods for tracking ray widths for secondary rays. SPI Arnold tracks accumulated roughness values encountered by a ray: if a ray either encounters a diffuse event or reaches a sufficiently high accumulated roughness value, SPI Arnold automatically goes to basically the highest available mip level [Kulla et al. 2018]. This scheme produces very aggressive texture filtering, but in turn provides excellent texture access patterns. Solid Angle Arnold similarly uses an ad-hoc microfacet-inspired heuristic for secondary rays [Georgiev et al. 2018]. Renderman handles reflection and refraction using something similar to Igehy [1999], but modified for the single-float-width ray differential representation that Renderman uses [Christensen et al. 2018]. For glossy and diffuse events, Renderman uses an empirically determined heuristic where higher ray width spreads are driven by lower scattering direction pdfs. Weta’s Manuka has a unified roughness estimation system built into the shading system, which uses a mean cosine estimate for figuring out ray differentials [Fascione et al. 2018].

Generally, roughness-driven heuristics seem to work reasonably well in production, and the heuristics don’t actually have to be too complicated! In an experimental branch of PBRT, Matt Pharr found that a simple heuristic that uses a ray differential covering roughly 1/25th of the hemisphere for diffuse events and 1/100th of the hemisphere for glossy events generally worked reasonably well [Pharr 2017].
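
As an illustration of just how simple such a heuristic can be, here’s a hedged sketch in the spirit of the heuristic Matt Pharr describes; the function signature and the choice to express the spread as a solid angle are assumptions for illustration, not PBRT’s or Takua’s actual code:

// Sketch of an ad-hoc roughness-driven spread heuristic for secondary rays:
// diffuse bounces get a differential covering roughly 1/25th of the
// hemisphere's solid angle, glossy bounces roughly 1/100th, and perfectly
// specular bounces keep the incoming spread unchanged.
float secondaryRaySpread(bool isDiffuse, bool isGlossy, float incomingSpread) {
    const float hemisphereSolidAngle = 2.0f * 3.14159265358979323846f;
    if (isDiffuse) {
        return hemisphereSolidAngle / 25.0f;    // wide footprint for diffuse
    } else if (isGlossy) {
        return hemisphereSolidAngle / 100.0f;   // narrower footprint for glossy
    }
    return incomingSpread;                      // specular: propagate unchanged
}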

Ray Differentials and Bidirectional Techniques

So far everything we’ve discussed has been for unidirectional path tracing that starts from the camera. What about ray differentials and mip level selection for paths starting from a light, and by extension, for bidirectional path tracing techniques? Unfortunately, nobody has a good, robust solution for calculating ray differentials for light paths! Calculating ray differentials for light paths is fundamentally something of an ill-defined problem: a ray differential has to be calculated with respect to a screen space pixel footprint, which works fine for camera paths since the first ray starts from the camera, but for light paths, the last ray in the path is the one that reaches the camera. With light paths, we have something of a chicken-and-egg problem; there is no way to calculate anything with respect to a screen space pixel footprint until a light path has already been fully constructed, but the shading computations required to construct the path are the computations that want differential information in the first place. Furthermore, even if we did have a good way to calculate a starting ray differential from a light, the corresponding path differential can’t become as wide as in the case of a camera path, since at any given moment the light path might scatter towards the camera and therefore needs to maintain a footprint no wider than a single screen space pixel.

Some research work has gone into this question, but more work is needed on this topic. The previously discussed covariance tracing based technique [Belcour et al. 2017] does allow for calculating an ideal texture filtering width and mip level once a light path is fully constructed, but again, the real problem is that footprints need to be available during path construction, not afterwards. With bidirectional path tracing, things get even harder. In order to keep a bidirectional path unbiased, all connections between camera and light path vertices must be consistent in what mip level they sample; however, this is difficult since ray differentials depend on the scattering events at each path vertex. Belcour et al. [2017] demonstrates how important consistent texture filtering between two vertices is.

Currently, only a handful of production renderers have extensive support for bidirectional techniques; of the ones that do, the most common solution to calculating ray differentials for bidirectional paths is… simply not to at all. Unfortunately, this means bidirectional techniques must rely on point sampling the lowest mip level, which defeats the whole point of mipmapping and destroys texture caching performance. The Manuka team alludes to using ray differentials for photon map gather widths in VCM and notes that these ray differentials are implemented as part of their manifold next event estimation system [Fascione et al. 2018], but there isn’t enough detail in their paper to be able to figure out how this actually works.

Camera-Based Mipmap Level Selection

Takua has implementations of standard bidirectional path tracing, progressive photon mapping, and VCM, and I wanted mipmapping to work with all integrator types in Takua. I’m interested in using Takua to render scenes with very high complexity levels using advanced (often bidirectional) light transport algorithms, but reaching production levels of shading complexity without a mipmapped texture cache simply is not possible without crazy amounts of memory (where crazy is defined as in the range of dozens to hundreds of GB of textures or more). However, for the reasons described above, standard ray differential based techniques for calculating mip levels weren’t going to work with Takua’s bidirectional integrators.

The lack of a ray differential solution for light paths left me stuck for some time, until late in 2017, when I got to read an early draft of what eventually became the Manuka paper [Fascione et al. 2018] in the ACM Transactions on Graphics special issue on production rendering. I highly recommend reading all five of the production renderer system papers in the ACM TOG special issue. However, if you’re already generally familiar with how a modern PBRT-style renderer works and only have time to read one paper, I would recommend the Manuka paper simply because Manuka’s architecture and the set of trade-offs and choices made by the Manuka team are so different from what every other modern PBRT-style production path tracer does. What I eventually implemented in Takua is directly inspired by Manuka, although it’s not what Manuka actually does (I think).

The key architectural feature that differentiates Manuka from Arnold/Renderman/Vray/Corona/Hyperion/etc. is its shade-before-hit architecture. I should note here that in this context, shade refers to the pattern generation part of shading, as opposed to the bsdf evaluation/sampling part of shading. The brief explanation (you really should go read the full paper) is that Manuka completely decouples pattern generation from path construction and path sampling, as opposed to what all other modern path tracers do. Most modern path tracers use shade-on-hit, which means pattern generation is lazily evaluated to produce bsdf parameters upon demand, such as when a ray hits a surface. In a shade on hit architecture, pattern generation and path sampling are interleaved, since path sampling depends on the results of pattern generation. Separating out geometry processing from path construction is fairly standard in most modern production path tracers, meaning subdivision/tessellation/displacement happens before any rays are traced, and displacement usually involves some amount of pattern generation. However, no other production path tracer separates out all of pattern generation from path sampling the way Manuka does. At render startup, Manuka runs geometry processing, which dices all input geometry into micropolygon grids, and then runs pattern generation on all of the micropolygons. The result of pattern generation is a set of bsdf parameters that are baked into the micropolygon vertices. Manuka then builds a BVH and proceeds with normal path tracing, but at each path vertex, instead of having to evaluate shading graphs and do texture lookups to calculate bsdf parameters, the bsdf parameters are looked up directly from the pre-calculated cached values baked into the micropolygon vertices. Put another way, Manuka is a path tracer with a REYES-style shader execution model [Cook et al. 1987] instead of a PBRT-style shader execution model; Manuka preserves the grid-based shading coherence from REYES while also giving more flexibility to path sampling and light transport, which no longer have to worry about pattern generation making shading slow.

So how does any of this relate to the bidirectional path tracing mip level selection problem? The answer is: in a shade-before-hit architecture, by the time the renderer is tracing light paths, there is no need for mip level selection because there are no texture lookups required anymore during path sampling. During path sampling, Manuka evaluates bsdfs at each hit point using pre-shaded parameters that are bilinearly interpolated from the nearest micropolygon vertices; all of the texture lookups were already done in the pre-shade phase of the renderer! In other words, at least in principle, a Manuka-style renderer can entirely sidestep the bidirectional path tracing mip level selection problem (although I don’t know if Manuka actually does this or not). Also, in a shade-before-hit architecture, there are no concerns with biasing bidirectional path tracing from different camera/light path vertex connections seeing different mip levels. Since all mip level selection and texture filtering decisions take place before path sampling, the view of the world presented to path sampling is always consistent.

Takua is not a shade-before-hit renderer though, and for a variety of reasons, I don’t plan on making it one. Shade-before-hit presents a number of tradeoffs which are worthwhile in Manuka’s case because of the problems and requirements the Manuka team aimed to solve and meet, but Takua is a hobby renderer aimed at something very different from Manuka. The largest drawback of shade-before-hit is the startup time associated with having to pre-shade the entire scene; this startup time can be quite large, but in exchange, the total render time can be faster as path sampling becomes more efficient. However, in a number of workflows, the time to a full render is not nearly as important as the time to a minimum sample count at which point an artistic decision can be made on a noisy image; beyond this point, full render time is less important as long as it is within a reasonable ballpark. Takua currently has a fast startup time and reaches a first set of samples quickly, and I wanted to keep this behavior. As a result, the question then became: in a shade-on-hit architecture, is there a way to emulate shade-before-hit’s consistent view of the world, where texture filtering decisions are separated from path sampling?

The approach I arrived at is to drive mip level selection based on only a world-space distance-to-camera metric, with no dependency at all on the incoming ray at a given hit point. This approach is… not even remotely novel; in a way, this approach is probably the most obvious solution of all, but it took me a long time and a circuitous path to arrive at for some reason. Here’s the high-level overview of how I implemented a camera-based mip level selection technique:

  1. At render startup time, calculate a ray differential for each pixel in the camera’s image plane. The goal is to find the narrowest differential in each screen space dimension x and y. Store this piece of information for later.
  2. At each ray-surface intersection point, calculate the differential surface.
  3. Create a ‘fake’ ray going from the camera’s origin position to the current intersection point, with a ray differential equal to the minimum differential in each direction found in step 1.
  4. Calculate dudx/dudy and dvdx/dvdy using the usual method presented above, but using the fake ray from step 3 instead of the actual ray.
  5. Calculate the mip level as usual from dudx/dudy and dvdx/dvdy.

The rationale for using the narrowest differentials in step 1 is to guarantee that texture frequency remains sub-pixel for all pixels in screen space, even if that means that we might sample some texture lookups at a higher resolution mip level than strictly necessary for whatever screen space pixel we’re accumulating radiance to. In this case, being overly conservative with our mip level selection is preferable to visible texture blurring from picking a mip level that is too low resolution.
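
Here’s a condensed sketch of what the above steps might look like in code. I’m guessing at the full parameter list of calculateScreenSpaceDifferential() based on the fragment shown earlier, selectMipLevel() refers to the sketch from earlier in this post, and the CameraDifferential struct and everything else here is made up for illustration rather than being Takua’s actual interface:

// Sketch of camera-based mip level selection at an arbitrary path vertex
// (camera path or light path). Assumes the same glm-style vec3/vec4
// utilities as the code earlier in this post. minCameraDifferential holds
// the narrowest per-pixel camera ray differential found at render startup
// (step 1), represented here as direction offsets for illustration.
struct CameraDifferential {
    vec3 rxDirectionOffset;   // narrowest x differential across the image plane
    vec3 ryDirectionOffset;   // narrowest y differential across the image plane
};

float selectMipLevelCameraBased(const vec3& cameraOrigin,
                                const CameraDifferential& minCameraDifferential,
                                const vec3& hitPoint,   // step 2: shading point
                                const vec3& n,          // shading normal
                                const vec3& dpdu,       // differential surface
                                const vec3& dpdv,
                                int width0, int height0, int numLevels) {
    // Step 3: build a 'fake' ray from the camera origin to the hit point,
    // carrying the minimum camera ray differential from step 1
    vec3 d = normalize(hitPoint - cameraOrigin);
    vec3 rxDirection = normalize(d + minCameraDifferential.rxDirectionOffset);
    vec3 ryDirection = normalize(d + minCameraDifferential.ryDirectionOffset);

    // Step 4: compute dudx/dvdx and dudy/dvdy using the usual method, but
    // with the fake ray instead of the actual incoming ray
    vec4 duv = calculateScreenSpaceDifferential(n, hitPoint,
                                                cameraOrigin, rxDirection,
                                                cameraOrigin, ryDirection,
                                                dpdu, dpdv);

    // Step 5: convert the uv derivatives into a mip level as usual
    return selectMipLevel(duv.x, duv.y, duv.z, duv.w, width0, height0, numLevels);
}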

Takua uses the above approach for all path types, including light paths in the various bidirectional integrators. Since the mip level selection is based entirely on distance-to-camera, as far as the light transport integrators are concerned, their view of the world is entirely consistent. As a result, Takua is able to sidestep the light path ray differential problem in much the same way that a shade-before-hit architecture is able to. There are some particular implementation details that are slightly complicated by Takua having support for multiple uv sets per mesh, but I’ll write about multiple uv sets in a later post. Also, there is one notable failure scenario, which I’ll discuss more in the results section.

Results

So how well does camera-based mipmap level selection work compared to a more standard approach based on path differentials or ray widths from the incident ray? Typically in a production renderer, mipmaps work in conjunction with tiled textures, where tiles are a fixed size (unless a tile is in a mipmap level with a total resolution smaller than the tile resolution). Therefore, the useful metric to compare is how many texture tiles each approach accesses throughout the course of a render; the more an approach accesses higher mipmap levels (meaning lower resolution mipmap levels), the fewer tiles in total should be accessed, since lower resolution mipmap levels have fewer tiles.

For unidirectional path tracing from the camera, we can reasonably expect the camera-based approach to perform less well than a path differential or ray width technique (which I’ll call simply ‘ray-based’). In the camera-based approach, every texture lookup has to use a footprint corresponding to approximately a single screen space pixel footprint, whereas in a more standard ray-based approach, footprints get wider with each successive bounce, leading to access to higher mipmap levels. Depending on how aggressively ray widths are widened at diffuse and glossy events, ray-based approaches can quickly reach the highest mipmap levels and essentially spend the majority of the render only accessing high mipmap levels.

For bidirectional integrators though, the camera-based technique has the major advantage of being able to provide reasonable mipmap levels for both camera and light paths, whereas the more standard ray-based approaches have to fall back to point sampling the lowest mipmap level for light paths. As a result, for bidirectional paths we can expect that a ray-based approach should perform somewhere in between how a ray-based approach performs in the unidirectional case and how point sampling only the lowest mipmap level performs in the unidirectional case.

As a baseline, I also implemented a ray-based approach with a relatively aggressive widening heuristic for glossy and diffuse events. For the forest scene from Figure 1, I got the following results at 1920x1080 resolution with 16 samples per pixel. I compared unidirectional path tracing from the camera and standard bidirectional path tracing; statistics are presented as total number of texture tiles accessed divided by total number of texture tiles across all mipmap levels. The lower the percentage, the better:

16 SPP 1920x1080 Unidirectional (PT)
    No mipmapping:                       314439/745394 tiles (42.18%)
    Ray-based level selection:           103206/745394 tiles (13.84%)
    Camera-based level selection:        104764/745394 tiles (14.05%)

16 SPP 1920x1080 Bidirectional (BDPT)
    No mipmapping:                       315452/745394 tiles (42.32%)
    Ray-based level selection:           203491/745394 tiles (27.30%)
    Camera-based level selection:        104858/745394 tiles (14.07%)

As expected, in the unidirectional case, the camera-based approach accesses slightly more tiles than the ray-based approach, and both approaches significantly outperform point sampling the lowest mipmap level. In the bidirectional case, the camera-based approach accesses slightly more tiles than in the unidirectional case, while the ray-based approach performs somewhere between the ray-based approach in unidirectional and point sampling the lowest mipmap level in unidirectional. What surprised me is how close the camera-based approach performed compared to the ray-based approach in the unidirectional case, especially since I chose a fairly aggressive widening heuristic (essentially a more aggressive version of the same heuristic that Matt Pharr uses in the texture cached branch of PBRTv3).

To help with visualizing what mipmap levels are being accessed, I implemented a new AOV in Takua that assigns colors to surfaces based on what mipmap level is accessed. With camera-based mipmap level selection, this AOV shows simply what mipmap level is accessed by all rays that hit a given point on a surface. Each mipmap level is represented by a different color, with support up to 12 mipmap levels. The following two images show accessed mipmap levels at 1080p and 2160p (4K); note how the 2160p render accesses lower mipmap levels more frequently than the 1080p render. The pixel footprints in the higher resolution render are smaller when projected into world space since more pixels have to pack into the same field of view. The key below each image shows what mipmap level each color corresponds to:

Figure 3: Mipmap levels accessed for the forest scene from Figure 1, rendered at 1920x1080 resolution.

Figure 4: Mipmap levels accessed for the forest scene from Figure 2, rendered at 3840x2160 resolution. Note how since the render is higher resolution and therefore pixel footprints are smaller for the same field of view, lower mipmap levels are accessed more frequently compared to Figure 3.

In general, everything looks as we would expect it to look in a working mipmapping system! Surface points farther away from the camera are generally accessing higher mipmap levels, and surface points closer to the camera are generally accessing lower mipmap levels. The ferns in the front of the frame near the camera access higher mipmap levels than the big fallen log in the center of the frame even though the ferns are closer to the camera because the textures for each leaf are extremely high resolution and the fern leaves are very small in screen-space. Surfaces that are viewed at highly glancing angles from the camera tend to access higher mipmap levels than surfaces that are camera-facing; this effect is easiest to see on the rocks in the bottom front of the frame. The interesting sudden shift in mipmap level on some of the tree trunks comes from the tree trunks using two different uv sets; the lower part of each tree trunk uses a different texture than the upper part, and the two textures are blended using a blending mask that lives in a different uv space from the main textures. Since the differential surface depends in part on the uv parameterization, different uv sets can result in different mipmap level selection behavior.

I also added a debug mode to Takua that tracks mipmap level access per texture sample. In this mode, for a given texture, the renderer splats into an image the lowest accessed mipmap level for each texture sample. The result is sort of a heatmap that can be overlaid on the original texture’s lowest mipmap level to see what parts of the texture are sampled at what resolution. Figure 5 shows one of these heatmaps for the texture on the fallen log in the center of the frame:

Figure 5: Mipmap level access patterns for the texture in Figure 2. Colors correspond to mipmap levels using the same key as in Figures 3 and 4. Dark grey indicates areas of the texture that were not sampled at all.

Just like in Figures 3 and 4, we can see that renders at higher resolutions will tend to access lower mipmap levels more frequently. Also, we can see that the vast majority of the texture is never sampled at all; with a tiled texture caching system where tiles are loaded on demand, this means there are a large number of texture tiles that we never bother to load at all. In cases like Figure 5, not loading unused tiles provides enormous memory savings compared to if we just loaded an entire non-mipmapped texture.

So far using a camera-based approach to mipmap level selection combined with just point sampling at each texture sample has held up very well in Takua! In fact, the Scandinavian Room scene from earlier this year was rendered using the mipmap approach described in this post as well. There is, however, a relatively simple type of scene that Takua’s camera-based approach fails badly at handling: refraction near the camera. If a lens is placed directly in front of the camera that significantly magnifies part of the scene, a purely world-space metric for filter footprints can result in choosing mipmap levels that are too high, which translates to visible texture blurring or pixelation. I don’t have anything implemented to handle this failure case right now. One possible solution I’ve thought about is to initially trace a set of rays from the camera using traditional ray differential propagation for specular objects, and cache the resultant mipmap levels in the scene. Then, during the actual renders, the renderer could compare the camera-based metric against the nearest N cached metrics to infer if a lower mipmap level is needed than what the camera-based metric produces. However, such a system would add significant cost to the mipmap level selection logic, and there are a number of implementation complications to consider. I do wonder how Manuka handles the “lens in front of a camera” case as well, since the shade-before-hit paradigm also fails on this scenario for the same reasons.

Long term, I would like to spend more time looking into (and perhaps implementing) a covariance tracing based approach. While Takua currently gets by with just point sampling, filtering becomes much more important for other effects, such as glinty microfacet materials, and covariance tracing based filtering seems to be the best currently known solution for these cases.

In an upcoming post, I’m aiming to write about how Takua’s texture caching system works in conjunction with the mipmapping system described in this post. As mentioned earlier, I’m also planning a (hopefully) short-ish post about supporting multiple uv sets, and how that impacts a mipmapping and texture caching system.

Additional Renders

Finally, since this has been a very text-heavy post, here are some bonus renders of the same forest scene under different lighting conditions. When I was setting up this scene for Takua, I tried a number of different lighting conditions and settled on the one in Figure 1 for the main render, but some of the alternatives were interesting too. In a future post, I’ll show a bunch of interesting renders of this scene from different camera angles, but for now, here is the forest at different times of day:

Figure 6: The forest early on an overcast morning.

Figure 7: The forest later on a sunnier morning.

Figure 8: The forest at noon on a sunny blue sky day.

Figure 9: The forest at sunset.

References

John Amanatides. 1984. Ray Tracing with Cones. Computer Graphics (Proceedings of SIGGRAPH) 18, 3 (1984), 129-135.

Laurent Belcour, Cyril Soler, Kartic Subr, Nicolas Holzschuch, and Frédo Durand. 2013. 5D Covariance Tracing for Efficient Defocus and Motion Blur. ACM Transactions on Graphics. 32, 3 (2013), 31:1–31:18.

Laurent Belcour, Ling-Qi Yan, Ravi Ramamoorthi, and Derek Nowrouzezahrai. 2017. Antialiasing Complex Global Illumination Effects in Path-Space. ACM Transactions on Graphics. 36, 1 (2017), 9:1–9:13.

Brent Burley, David Adler, Matt Jen-Yuan Chiang, Hank Driskill, Ralf Habel, Patrick Kelly, Peter Kutz, Yining Karl Li, and Daniel Teece. 2018. The Design and Evolution of Disney’s Hyperion Renderer. ACM Transactions on Graphics. 37, 3 (2018), 33:1-33:22.

Per Christensen, Julian Fong, Jonathan Shade, Wayne Wooten, Brenden Schubert, Andrew Kensler, Stephen Friedman, Charlie Kilpatrick, Cliff Ramshaw, Marc Bannister, Brenton Rayner, Jonathan Brouillat, and Max Liani. 2018. RenderMan: An Advanced Path-Tracing Architecture for Movie Rendering. ACM Transactions on Graphics. 37, 3 (2018), 30:1–30:21.

Per Christensen, David M. Laur, Julian Fong, Wayne Wooten, and Dana Batali. 2003. Ray Differentials and Multiresolution Geometry Caching for Distribution Ray Tracing in Complex Scenes. Computer Graphics Forum. 22, 3 (2003), 543-552.

Robert L. Cook, Loren Carpenter, and Edwin Catmull. 1987. The Reyes Image Rendering Architecture. Computer Graphics (Proceedings of SIGGRAPH) 21, 4 (1987), 95-102.

Frédo Durand, Nicolas Holzschuch, Cyril Soler, Eric Chan, and François X. Sillion. 2005. A Frequency Analysis of Light Transport. ACM Transactions on Graphics. 24, 3 (2005), 1115-1126.

Luca Fascione, Johannes Hanika, Mark Leone, Marc Droske, Jorge Schwarzhaupt, Tomáš Davidovič, Andrea Weidlich and Johannes Meng. 2018. Manuka: A Batch-Shading Architecture for Spectral Path Tracing in Movie Production. ACM Transactions on Graphics. 37, 3 (2018), 31:1–31:18.

Iliyan Georgiev, Thiago Ize, Mike Farnsworth, Ramón Montoya-Vozmediano, Alan King, Brecht van Lommel, Angel Jimenez, Oscar Anson, Shinji Ogaki, Eric Johnston, Adrien Herubel, Declan Russell, Frédéric Servant, and Marcos Fajardo. 2018. Arnold: A Brute-Force Production Path Tracer. ACM Transactions on Graphics. 37, 3 (2018), 32:1-32:12.

Larry Gritz and James K. Hahn. 1996. BMRT: A Global Illumination Implementation of the RenderMan Standard. Journal of Graphics Tools. 3, 1 (1996), 29-47.

Paul S. Heckbert and Pat Hanrahan. 1984. Beam Tracing Polygonal Objects. Computer Graphics (Proceedings of SIGGRAPH) 18, 3 (1984), 119-127.

Homan Igehy. 1999. Tracing Ray Differentials. In SIGGRAPH ‘99 (Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques). 179–186.

Wenzel Jakob. 2010. Mitsuba Renderer.

Christopher Kulla, Alejandro Conty, Clifford Stein, and Larry Gritz. 2018. Sony Pictures Imageworks Arnold. ACM Transactions on Graphics. 37, 3 (2018), 29:1-29:18.

Mark Lee, Brian Green, Feng Xie, and Eric Tabellion. 2017. Vectorized Production Path Tracing. In HPG ‘17 (Proceedings of High Performance Graphics). 10:1-10:11.

Darwyn Peachey. 1990. Texture on Demand. Technical Report 217. Pixar Animation Studios.

Matt Pharr, Wenzel Jakob, and Greg Humphreys. 2016. Physically Based Rendering: From Theory to Implementation, 3rd ed. Morgan Kaufmann.

Matt Pharr. 2017. The Implementation of a Scalable Texture Cache. Physically Based Rendering Supplemental Material.

Dan Piponi. 2004. Automatic Differentiation, C++ Templates and Photogrammetry. Journal of Graphics Tools. 9, 4 (2004), 41-55.

Mikio Shinya, Tokiichiro Takahashi, and Seiichiro Naito. 1987. Principles and Applications of Pencil Tracing. Computer Graphics (Proceedings of SIGGRAPH) 21, 4 (1987), 45-54.

Frank Suykens and Yves D. Willems. 2001. Path Differentials and Applications. In Rendering Techniques 2001 (Proceedings of the 12th Eurographics Workshop on Rendering). 257–268.

Lance Williams. 1983. Pyramidal Parametrics. Computer Graphics (Proceedings of SIGGRAPH) 17, 3 (1983), 1-11.

Transactions on Graphics Paper- The Design and Evolution of Disney's Hyperion Renderer

The August 2018 issue of ACM Transactions on Graphics (Volume 37 Issue 3) is partially a special issue on production rendering, featuring five systems papers describing notable, major production renderers in use today. I got to contribute to one of these papers as part of the Hyperion team at Walt Disney Animation Studios! Our paper, titled “The Design and Evolution of Disney’s Hyperion Renderer”, discusses exactly what the title suggests. We present a detailed look inside how Hyperion is designed today, discuss the decisions that went into its current design, and examine how Hyperion has evolved since the original EGSR 2013 “Sorted Deferred Shading for Production Path Tracing” paper that was the start of Hyperion. A number of Hyperion developers contributed to this paper as co-authors, along with Hank Driskill, who was the technical supervisor on Big Hero 6 and Moana and was one of the key supporters of Hyperion’s early development and deployment.

Image from paper Figure 1: Production frames from Big Hero 6 (upper left), Zootopia (upper right), Moana (bottom left), and Olaf’s Frozen Adventure (bottom right), all rendered using Disney’s Hyperion Renderer.

Here is the paper abstract:

Walt Disney Animation Studios has transitioned to path-traced global illumination as part of a progression of brute-force physically based rendering in the name of artist efficiency. To achieve this without compromising our geometric or shading complexity, we built our Hyperion renderer based on a novel architecture that extracts traversal and shading coherence from large, sorted ray batches. In this article, we describe our architecture and discuss our design decisions. We also explain how we are able to provide artistic control in a physically based renderer, and we demonstrate through case studies how we have benefited from having a proprietary renderer that can evolve with production needs.

The paper and related materials can be found at:

We owe a huge thanks to Matt Pharr, who came up with the idea for a TOG special issue on production rendering and coordinated the writing of all of the papers, and Kavita Bala, who as editor-in-chief of TOG supported all of the special issue papers. This issue has actually been in the works for some time; Matt Pharr contacted us over a year ago about putting together a special issue, and we began work on our paper in May 2017. Matt and Kavita generously gave all of the contributors to the special issue a significant amount of time to write, and Matt provided a lot of valuable feedback and suggestions to all five of the final papers. The end result is, in my opinion, something special indeed. The five rendering teams that contributed papers in the end were Solid Angle’s Arnold, Sony Imageworks’ Arnold, Weta Digital’s Manuka, Pixar’s Renderman, and ourselves. All five of the papers in the special issue are fascinating, well-written, highly technical rendering systems papers (as opposed to just marketing fluff), and absolutely worth a read!

Something important that I want to emphasize here is that the author lists for all five papers are somewhat deceptive. One might think that the author lists represent all of the people responsible for each renderers’ success; this idea is, of course, inaccurate. For Hyperion, the authors on this paper represent just a small fraction of all of the people responsible for Hyperion’s success. Numerous engineers not on the author list have made significant contributions to Hyperion in the past, and the project relies enormously on all of the QA engineers, managers/leaders, TDs, artists, and production partners that test, lead, deploy, and use Hyperion every day. We also owe an enormous amount to all of the researchers that we have collaborated directly with, or who we haven’t collaborated directly with but have used their work. The success of every production renderer comes not just from the core development team, but instead from the entire community of folks that surround a production renderer; this is just as true for Hyperion as it is for Renderman, Arnold, Manuka, etc. The following is often said in our field but nonetheless true: building an advanced production renderer in a reasonable timeframe really is only possible through a massive team effort.

This summer, in addition to publishing this paper, members of the Hyperion team also presented the following at SIGGRAPH 2018:

  • Peter Kutz was on the “Design and Implementation of Modern Production Renderers” panel put together by Matt Pharr to discuss the five TOG production rendering papers. Originally Brent Burley was supposed to represent the Hyperion team, but due to some outside circumstances, Brent wasn’t able to make it to SIGGRAPH this year, so Peter went in Brent’s place.
  • Matt Jen-Yuan Chiang presented a talk on rendering eyes, titled “Plausible Iris Caustics and Limbal Arc Rendering”, in the “It’s a Material World” talks session.

Disney Animation Data Sets

Today at EGSR 2018, Walt Disney Animation Studios announced the release of two large, production quality/scale data sets for rendering research purposes. The data sets are available on a new data sets page on the official Disney Animation website. The first data set is the Cloud Data Set, which contains a large and highly detailed volumetric cloud data set that we used for our “Spectral and Decomposition Tracking for Rendering Heterogeneous Volumes” SIGGRAPH 2017 paper, and the second data set is the Moana Island Scene, which is a full production scene from Moana.

Figure 1: The Moana Island Data Set, rendered using Disney's Hyperion Renderer.

Figure 2: The Cloud Data Set, rendered using Disney's Hyperion Renderer.

In this post, I’ll share some personal thoughts, observations, and notes. The release of these data sets was announced by my teammate, Ralf Habel, at EGSR today, but this release has been in the works for a very long time now, and is the product of the collective effort of an enormous number of people across the studio. A number of people deserve to be highlighted: Rasmus Tamstorf spearheaded the entire effort and was instrumental in getting the resources and legal approval needed for the Moana Island Scene. Heather Pritchett is the TD that did the actual difficult work of extracting the Moana Island Scene out of Disney Animation’s production pipeline and converting it from proprietary data formats into usable, industry-standard data formats. Sean Palmer and Jonathan Garcia also helped in resurrecting the data from Moana. Hyperion developers Ralf Habel and Peter Kutz led the effort to get the Cloud Data Set approved and released; the cloud itself was made by artists Henrik Falt and Alex Nijmeh. On the management side of things, technology manager Rajesh Sharma and Disney Animation CTO, Nick Cannon, provided crucial support and encouragement. Matt Pharr has been crucial in collaborating with us to get these data sets released. Matt was highly accommodating in helping us get the Moana Island Scene into a PBRT scene; I’ll talk a bit more about this later. Intel’s Embree team also gave significant feedback. My role was actually quite small; along with other members of the Hyperion development team, I just provided some consultation throughout the whole process.

Please note the licenses that the data sets come with. The Cloud Data Set is licensed under a Creative Commons Attribution ShareAlike 3.0 Unported License; the actual cloud is based on a photograph by Kevin Udy on his Colorado Clouds Blog, which is also licensed under the same Creative Commons license. The Moana Island Scene is licensed under a more restrictive, custom Disney Enterprises research license. This is because the Moana Island Scene is a true production scene; it was actually used to produce actual frames in the final film. As such, the data set is being released only for pure research and development purposes; it’s not meant for use in artistic projects. Please stick to and follow the licenses these data sets are released under; if people end up misusing these data sets, then it makes releasing more data sets into the community in the future much harder for us.

This entire effort was sparked two years ago at SIGGRAPH 2016, when Matt Pharr made an appeal to the industry to provide representative production-scale data sets to the research community. I don’t know how many times I’ve had conversations about how well new techniques or papers or technologies will scale to production cases, only to have further discussion stymied by the lack of any true production data sets that the research community can test against. We decided as a studio to answer Matt’s appeal, and last year at SIGGRAPH 2017, Brent Burley and Rasmus Tamstorf announced our intention to release both the Cloud and Moana Island data sets. It’s taken nearly a year from announcement to release because the process has been complex, and it was very important to the studio to make sure the release was done properly.

One of the biggest challenges was getting all of the data out of the production pipeline and our various proprietary data formats into something that the research community can actually parse and make use of. Matt Pharr was extremely helpful here; over the past year, Matt has added support for Ptex textures and implemented the Disney Bsdf in PBRT v3. Having Ptex and the Disney Bsdf available in PBRT v3 made PBRT v3 the natural target for an initial port to a renderer other than Hyperion, since internally all of Hyperion’s shading uses the Disney Bsdf, and all of our texturing is done through Ptex. Our texturing also relies heavily on procedural SeExpr expressions; all of the expression-driven texturing had to be baked down into Ptex for the final release.

Both the Cloud and Moana Island data sets are, quite frankly, enormous. The Cloud data set contains a single OpenVDB cloud that weighs in at 2.93 GB; the data set also provides versions of the VDB file scaled down to half, quarter, eighth, and sixteenth scale resolutions. The Moana Island data set comes in three parts: a base package containing raw geometry and texture data, an animation package containing animated stuff, and a PBRT package containing a PBRT scene generated from the base package. These three packages combined, uncompressed, weigh in at well over 200 GB of disk space; the uncompressed PBRT package alone weighs in at around 38 GB.

For the Moana Island Scene, the provided PBRT scene requires a minimum of around 90 GB of RAM to render. This may seem enormous for consumer machines, because it is. However, this is also what we mean by “production scale”; for Disney Animation, 90 GB is actually a fairly mid-range memory footprint for a production render. On a 24-core, dual-socket Intel Xeon Gold 6136 system, the PBRT scene took me a little over an hour and 15 minutes to render from the ‘shotCam’ camera. Hyperion renders the scene faster, but I would caution against using this data set to do performance shootouts between different renderers. I’m certain that within a short period of time, enthusiastic members of the rendering community will end up porting this scene to Renderman and Arnold and Vray and Cycles and every other production renderer out there, which will be very cool! But keep in mind, this data set was authored very specifically around Hyperion’s various capabilities and constraints, which naturally will be very different from how one might author a complex data set for other renderers. Every renderer works a bit differently, so the optimal way to author a data set for every renderer will be a bit different; this data set is no exception. So if you want to compare renderers using this data set, make sure you understand how the way this data set is structured impacts the performance of whatever renderers you are comparing.

For example, Hyperion subdivides/tessellates/displaces everything to as close to sub-poly-per-pixel as it can get while still fitting within computational resources. This means our scenes are usually very heavily subdivided and tessellated. However, the PBRT version of the scene doesn’t come with any subdivision; as a result, silhouettes in the following comparison images don’t fully match in some areas. Similarly, PBRT’s lights and lighting model differ from Hyperion’s, and Hyperion has various artistic controls that are unique to Hyperion, meaning the renders produced by PBRT versus Hyperion differ in many ways:

Figure 3a: 'shotCam' camera angle, rendered using Disney's Hyperion Renderer.

Figure 3b: 'shotCam' camera angle, rendered using PBRT v3.

Figure 4a: 'beachCam' camera angle, rendered using Disney's Hyperion Renderer.

Figure 4b: 'beachCam' camera angle, rendered using PBRT v3.

Figure 5a: 'dunesACam' camera angle, rendered using Disney's Hyperion Renderer.

Figure 5b: 'dunesACam' camera angle, rendered using PBRT v3. Some of the plants are in slightly different locations than the Hyperion render; this was just a small change that happened in data conversion to the PBRT scene.

Figure 6a: 'flowersCam' camera angle, rendered using Disney's Hyperion Renderer.

Figure 6b: 'flowersCam' camera angle, rendered using PBRT v3. Note that the silhouette of the flowers is different compared to the Hyperion render because the Hyperion render subdivides the flowers, whereas the PBRT render displays the base cage.

Figure 7a: 'grassCam' camera angle, rendered using Disney's Hyperion Renderer.

Figure 7b: 'grassCam' camera angle, rendered using PBRT v3. The sand dune in the background looks particularly different from the Hyperion render due to subdivision and displacement.

Figure 8a: 'palmsCam' camera angle, rendered using Disney's Hyperion Renderer.

Figure 8b: ‘palmsCam’ camera angle, rendered using PBRT v3. The palm leaves look especially different due to differences in artistic light shaping and in curve shading. Most notably, the look in Hyperion depends heavily on attributes that vary along the length of the curve, which is something PBRT doesn’t support yet. Some more work is needed here to get the palm leaves to look more similar between the two renders.

Figure 9a: 'rootsCam' camera angle, rendered using Disney's Hyperion Renderer.

Figure 9b: ‘rootsCam’ camera angle, rendered using PBRT v3. Again, the significant difference in appearance in the rocks is probably just due to subdivision/tessellation/displacement.

Another example of a major difference between the Hyperion renders and the PBRT renders is in the water, which Hyperion renders using photon mapping to get the caustics. The provided PBRT scenes use unidirectional pathtracing for everything including the water, hence the very different caustics. Similarly, the palm trees in the ‘palmsCam’ camera angle look very different between PBRT and Hyperion because Hyperion’s lighting controls are very different from PBRT; Hyperion’s lights include various artistic controls for custom shaping and whatnot, which aren’t necessarily fully physical. Also, the palm leaves are modeled using curves, and the shading depends on varying colors and attributes along the length and width of the curve, which PBRT doesn’t support yet (getting the palm leaves to look more similar is actually the top priority if more resources are freed up to improve the data set release). These differences between renderers don’t necessarily mean that one renderer is better than the other; they simply mean that the renderers are different. This will be true for any pair of renderers that one wants to compare.

The Cloud Data Set includes an example render from Hyperion, which implements our Spectral and Decomposition Tracking paper in its volumetric rendering system to efficiently render the cloud with thousands of bounces. This render contains no post-processing; what you see in the provided image is exactly what Hyperion outputs. The VDB file expresses the cloud as a field of heterogeneous densities. Also provided is an example Mitsuba scene, renderable using the Mitsuba-VDB plugin that can be found on Github. Please consult the README file for some modifications in Mitsuba that are necessary to render the cloud. Also, please note that the Mitsuba example will take an extremely long time to render, since Mitsuba isn’t really meant to render high-albedo heterogeneous volumes. With proper acceleration structures and algorithms, rendering the cloud only takes us a few minutes using Hyperion, and should be similarly fast in any modern production renderer.

One might wonder just why production data sets in general are so large. This is an interesting question; the short answer across the industry basically boils down to “artist time is more expensive and valuable than computer hardware”. We could get these scenes to fit into much smaller footprints if we were willing to make our artists spend a lot of time aggressively optimizing assets and scenes and whatnot so that we could fit these scenes into smaller disk, memory, and compute footprints. However, this isn’t actually always a good use of artist time; computer hardware is cheap compared to wasting artist time, which often could be better spent elsewhere making the movie better. Throwing more memory and whatnot at huge data sets is also simply more scalable than using more artist resources, relatively speaking.

Both data sets come with detailed README documents; the Moana Island Scene’s documentation in particular is quite extensive and contains a significant amount of information about how assets are authored and structured at Disney Animation, and how renders are lit, art-directed, and assembled at Disney Animation. I highly recommend reading all of the documentation carefully if you plan on working with these data sets, or just if you are generally curious about how production scenes are built at Disney Animation.

Personally, I’m very much looking forward to seeing what the rendering community (and the wider computer graphics community at large) does with these data sets! I’m especially excited to see what the realtime world will be able to do with this data; seeing the Moana Island Scene in its full glory in Unreal Engine 4 or Unity would be something indeed, and I think these data sets should provide a fantastic challenge to research into light transport and ray tracing speed as well. If you do interesting things with these data sets, please write to us at the email addresses in the provided README files!

Also, Matt Pharr has written on his blog about how the Moana Island Scene has further driven the development of PBRT v3. I highly recommend giving Matt’s blog a read!

Scandinavian Room Scene

Almost three years ago, I rendered a small room interior scene to test an indoor, interior illumination scenario. Since then, a lot has changed in Takua, so I thought I’d revisit an interior illumination test with a much more complex, difficult scene. I don’t have much time to model stuff anymore these days, so instead I bought Evermotion’s Archinteriors Volume 48 collection, which is labeled as Scandinavian interior room scenes (I don’t know what’s particularly Scandinavian about these scenes, but that’s what the label said) and ported one of the scenes to Takua’s scene format. Instead of simply porting the scene as-is, I modified and added various things in the scene to make it feel a bit more customized. See if you can spot what they are:

Figure 1: A Scandinavian room interior, rendered in Takua a0.8 using VCM.

I had a lot of fun adding all of my customizations! I brought over some props from the old complex room scene, such as the purple flowers and vase, a few books, and the Utah teapot tray, and also added a few new fun models, such as the MacBook Pro in the back and the copy of Physically Based Rendering 3rd Edition in the foreground. The black and white photos on the wall are crops of my Minecraft renders, and some of the books against the back wall have fun custom covers and titles. Even all of the elements that came with the original scene are re-shaded. The original scene came with Vray’s standard VrayMtl as the shader for everything; Takua’s base shader parameterization draws some influence from Vray, but also from the Disney Bsdf and Arnold’s AlShader, and as a result is sufficiently different that I wound up just re-shading everything instead of trying to write some conversion tool. For the most part I was able to re-use the textures that came with the scene to drive various shader parameters. The skydome is from the noncommercial version of VizPeople’s HDRi v1 collection.

Speaking of the skydome… the main source of illumination in this scene comes from the sun in the skydome, which presented a huge challenge for efficient light sampling. Takua has had domelight/environment map importance sampling using CDF inversion sampling for a long time now, which helps a lot, but the indoor nature of this scene still made sampling the sun difficult. Sampling the sun in an outdoor scene is fairly efficient since most rays will actually reach the sun, but in indoor scenes, importance sampling the sun becomes inefficient without taking occlusion into account since only rays that actually make it outdoors through windows can reach the sun. The best known method currently for handling domelight importance sampling through windows in an indoor scene is Portal Masked Environment Map Sampling (PMEMS) by Bitterli et al. I haven’t actually implemented PMEMS yet though, so the renders in this post all wound up requiring a huge number of samples per pixel to render; I intend on implementing PMEMS at some point in the near future.
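
For reference, here’s a minimal sketch of what CDF inversion sampling for an environment map can look like, using a single flattened CDF over all texels weighted by luminance times sin(theta); this is an illustration of the general idea rather than Takua’s actual domelight sampling code:

#include <algorithm>
#include <cmath>
#include <vector>

// Minimal sketch of CDF-inversion importance sampling for a lat-long
// environment map: build one flattened CDF over all texels, weighted by
// luminance times sin(theta) to account for the lat-long mapping's solid
// angle distortion, then invert the CDF to pick a texel.
struct EnvMapSampler {
    int width, height;
    std::vector<float> cdf;   // running normalized sum over all texels

    EnvMapSampler(const std::vector<float>& luminance, int w, int h)
        : width(w), height(h), cdf(size_t(w) * size_t(h)) {
        const float pi = 3.14159265358979323846f;
        float total = 0.0f;
        for (int y = 0; y < h; ++y) {
            float sinTheta = std::sin(pi * (y + 0.5f) / float(h));
            for (int x = 0; x < w; ++x) {
                total += luminance[y * w + x] * sinTheta;
                cdf[y * w + x] = total;
            }
        }
        for (float& c : cdf) { c /= total; }
    }

    // Invert the CDF with a uniform random number to get (u, v) on the map
    void sample(float xi, float& u, float& v) const {
        auto it = std::lower_bound(cdf.begin(), cdf.end(), xi);
        int index = std::min(int(it - cdf.begin()), width * height - 1);
        u = (index % width + 0.5f) / float(width);
        v = (index / width + 0.5f) / float(height);
    }
};

In practice, the sampled (u, v) would then be converted into a world-space direction, and the corresponding pdf (including the sin(theta) Jacobian) would be returned alongside it for use in multiple importance sampling.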

Apart from the skydome, this scene also contains several other practical light sources, such as the lamp’s bulb, the MacBook Pro’s screen, and the MacBook Pro’s glowing Apple logo on the back of the screen (which isn’t even visible to camera, but is still enabled since it provides a tiny amount of light against the back wall!). In addition to choosing where on a single light to sample, choosing which light to sample is also an extremely important and difficult problem. Until rendering this scene, I hadn’t really put any effort into efficiently selecting which light to sample. Most of my focus has been on the integration part of light transport, so Takua’s light selection has just been uniform random selection. Uniform random selection is terrible for scenes that contain multiple lights with highly varying emission between different lights, which is absolutely the case for this scene. Like any other importance sampling problem, the ideal solution is to send rays towards lights with a probability proportional to the amount of illumination we expect each light to contribute to each ray origin point.

I implemented a light selection strategy where the probability of selecting each light is weighted by the total emitted power of each light; essentially this boils down to estimating the total emitted power of each light according to the light’s surface texture and emission function, building a CDF across all of the lights using the total emission estimates, and then using standard CDF inversion sampling to pick lights. This strategy works significantly better than uniform random selection and made a huge difference in render speed for this scene, as seen in Figures 2 through 4. Figure 2 uses uniform random light selection with 128 spp; note how the area lit by the wall-mounted lamp is well sampled, but the image overall is really noisy. Figure 3 uses power-weighted light selection with the same spp as Figure 2; the lamp area is more noisy than in Figure 2, but the render is less noisy overall. Notably, Figure 3 also took a third of the time compared to Figure 2 for the same sample count; this is because in this scene, sending rays towards the lamp is significantly more expensive due to heavier geometry than sending rays towards the sun, even when rays towards the sun get occluded by the walls. Figure 4 uses power-weighted light selection again, but is equal-time to Figure 2 instead of equal-spp; note the significant noise reduction:

Figure 2: The same frame as Figure 1, rendered at 128 spp using uniform random light selection. Average pixel RMSE compared to Figure 1: 0.439952.

Figure 3: Power-weighted light selection, with equal spp to Figure 2. Average pixel RMSE compared to Figure 1: 0.371441.

Figure 4: Power-weighted light selection again, but this time with equal time instead of equal spp to Figure 2. Average pixel RMSE compared to Figure 1: 0.315465.

Figure 5: Zoomed crops of Figures 2 through 4. From left to right: uniform random sampling, equal-spp power-weighted sampling, and equal-time power-weighted sampling.
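To make the power-weighted light selection described above concrete, here is a minimal C++ sketch of the idea; the Light and LightSelector names are hypothetical and this is not Takua’s actual implementation. Each light reports an estimate of its total emitted power, a CDF is built across all of the lights, and picking a light is then standard CDF inversion; the returned probability is needed to keep the light-sampling estimator unbiased:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical light description: all we need here is an estimate of total
// emitted power, e.g. average emission texture luminance times emission scale
// times surface area.
struct Light {
    float estimatedPower;
};

// Power-weighted light selection via a CDF over all lights in the scene.
// Assumes the scene contains at least one light.
struct LightSelector {
    std::vector<float> cdf; // cumulative, normalized so the last entry is 1

    explicit LightSelector(const std::vector<Light>& lights) {
        cdf.reserve(lights.size());
        float total = 0.0f;
        for (const Light& l : lights) {
            total += l.estimatedPower;
            cdf.push_back(total);
        }
        for (float& c : cdf) c /= std::max(total, 1e-8f);
    }

    // Pick a light index with probability proportional to its estimated power,
    // and return that probability alongside the index.
    std::pair<std::size_t, float> sample(float u) const {
        std::size_t i = std::size_t(
            std::lower_bound(cdf.begin(), cdf.end(), u) - cdf.begin());
        i = std::min(i, cdf.size() - 1);
        float pdf = cdf[i] - (i == 0 ? 0.0f : cdf[i - 1]);
        return {i, pdf};
    }
};
```

Note that these selection probabilities are fixed for the whole scene and do not adapt to the shading point, which is exactly the limitation discussed next.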

However, power-weighted light selection is still far from optimal; this technique completely ignores occlusion and distance, which are extremely important. Unfortunately, because occlusion and distance to each light vary for each point in space, creating a light selection strategy that takes occlusion and distance into account is extremely difficult and remains a subject of ongoing research in the field. In Hyperion, we use a cache point system, which we described on page 97 of our SIGGRAPH 2017 Production Volume Rendering course notes. Other published research on the topic includes Practical Path Guiding for Efficient Light-Transport Simulation by Müller et al., On-line Learning of Parametric Mixture Models for Light Transport Simulation by Vorba et al., Product Importance Sampling for Light Transport Path Guiding by Herholz et al., Learning Light Transport the Reinforced Way by Dahm et al., and more. At some point in the future I’ll revisit this topic.

For a long time now, Takua has also had a simple interactive mode where the camera can be moved around in a non-shaded/non-lit view; I used this mode to interactively scout out some interesting and fun camera angles for some more renders. Being able to interactively scout in the same renderer used for final rendering is an extremely powerful tool; instead of guessing at depth of field settings and such, I was able to directly set and preview depth of field with immediate feedback. Unfortunately some of the renders below are noisier than I would like, due to the previously mentioned light sampling difficulties. All of the following images are rendered using Takua a0.8 with VCM:

Figure 6: A MacBook Pro running Takua Renderer to produce Figure 1.

Figure 7: Physically Based Rendering Third Edition sitting on the coffee table.

Figure 8: Closeup of the same purple flowers from the old Complex Room scene.

Figure 9: Utah Teapot tea set on the coffee table.

Figure 10: A glass globe with mirror-polished metal continents, sitting in the sunlight from the window.

Figure 11: Close-up of two glass and metal mugs filled with tea.

Beyond difficult light sampling, generally complex and difficult light transport with lots of subtle caustics also wound up presenting major challenges in this scene. For example, note the subtle caustics on the wall in the upper right hand part of Figure 10; those caustics are visibly not fully converged, even though the sample count across Figure 10 was in the thousands of spp! I intentionally did not use adaptive sampling in any of these renders; instead, I wanted to experiment with a common noise-reduction technique used in a lot of modern production renderers: in-render firefly clamping. My adaptive sampler is already capable of detecting firefly pixels and driving more samples at them in the hopes of accelerating variance reduction, but firefly clamping is a much cruder, biased, but nonetheless effective technique. The idea is, for each new sample at each pixel, to detect whether the returned sample is an outlier relative to all of the samples previously accumulated at that pixel, and to discard or clamp the sample if it is in fact an outlier. Picking what threshold to use for outlier detection is a very manual process; even Arnold provides a tunable max-value parameter for firefly clamping.

I wanted to be able to directly compare renders with and without firefly clamping, so I implemented firefly clamping on top of Takua’s AOV system. When enabled, firefly clamping mode produces two images from a single render: one output with firefly clamping enabled, and one with clamping disabled. I tried re-rendering Figure 10 using unidirectional pathtracing and a relatively low spp count to produce as many fireflies as I could, for a clearer comparison. For this test, I set the firefly threshold to flag samples that are at least 250 times brighter than the pixel value estimated from all samples accumulated up to that point.
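As a rough illustration of what this looks like in practice, here is a minimal C++ sketch of the per-pixel clamping logic under the assumptions above; the PixelAccumulator name is hypothetical and this is not Takua’s actual AOV code. The raw and clamped accumulations are kept side by side, which is how a single render can write out both outputs:

```cpp
#include <algorithm>

// Minimal sketch of per-pixel firefly clamping: a new sample is compared against
// the running pixel estimate, and samples brighter than threshold * estimate are
// clamped before accumulation. This introduces bias.
struct PixelAccumulator {
    float rawSum = 0.0f;      // unclamped accumulation
    float clampedSum = 0.0f;  // firefly-clamped accumulation
    int   sampleCount = 0;

    // 'luminance' is the luminance of the new radiance sample; 'threshold' is the
    // outlier factor (e.g. 250 in the test above).
    void addSample(float luminance, float threshold) {
        rawSum += luminance;
        float clamped = luminance;
        if (sampleCount > 0) {
            float estimate = clampedSum / sampleCount;
            // Treat samples far brighter than the running estimate as fireflies
            // and clamp them down to the allowed maximum.
            float maxAllowed = threshold * std::max(estimate, 1e-6f);
            clamped = std::min(luminance, maxAllowed);
        }
        clampedSum += clamped;
        ++sampleCount;
    }

    float rawEstimate() const { return sampleCount ? rawSum / sampleCount : 0.0f; }
    float clampedEstimate() const { return sampleCount ? clampedSum / sampleCount : 0.0f; }
};
```

A real renderer would typically apply the test per color channel or to full radiance values rather than a single luminance scalar, but the overall structure is the same.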

Figure 12: The same render as Figure 10, but rendered with a lower sample count and using unidirectional pathtracing instead of VCM to draw out more fireflies.

Figure 13: From the same run of Takua Renderer as Figure 12, but the firefly-clamped render output instead of the raw render.

Note how Figure 13 appears to be completely firefly-free compared to Figure 12, and how Figure 13 doesn’t have visible caustic noise on the walls compared to Figure 10. However, notice how Figure 13 is also missing significant illumination in some areas, such as in the corner of the walls near the floor behind the wooden step ladder, or in the deepest parts of the purple flower bunch. Finding a threshold that eliminates all fireflies without losing significant illumination in other areas is very difficult or, in some cases, impossible, since some types of light transport essentially manifest as firefly-like high-energy samples that only smooth out over time. For the final renders in Figure 1 and Figures 6 through 11, I wound up not actually using any firefly clamping. While biased noise-reduction techniques are a necessary evil in actual production, I expect that I’ll avoid relying on firefly clamping in the vast majority of what I do with Takua, since Takua is meant to be a brute-force, hobby kind of thing anyway.