Just a quick update on future plans. Starting in July, I’m going to be working full time for Walt Disney Animation Studios as a software engineer on their custom, in-house Hyperion Renderer. I couldn’t be more excited about working with everyone on the Hyperion team; ever since the Sorted Deferred Shading paper was published two years ago, I’ve thought that the Hyperion team is doing some of the most interesting work there is in the rendering field right now.

I owe an enormous thanks to everyone that’s advised and supported and encouraged me to continue exploring the rendering and graphics world. Thanks, Joe, Don, Peter, Tony, Mark, Christophe, Amy, Fran, Gabriel, Harmony, and everyone else!

Normally as a rule I only post images to this blog that I made or have a contribution to, but this time I’ll make an exception. Here’s one of my favorite stills from Big Hero 6, rendered entirely using Hyperion and lit by Angela McBride, a friend from PUPs 2011! Images like this one are an enormous source of inspiration to me, so I absolutely can’t wait to get started at Disney and help generate more gorgeous imagery like this!

A still from Big Hero 6, rendered entirely using Hyperion. Property of Walt Disney Animation Studios.

BSDF System

Takua a0.5’s BSDF system was particularly interesting to build, especially because in previous versions of Takua Render, I never really had a good BSDF system. Previously, my BSDFs were written in a pretty ad-hoc way and were somewhat hardcoded into the pathtracing integrator, which made BSDF extensibility very difficult and multi-integrator support nearly impossible without significant duplication of BSDF code. In Takua a0.5, I’ve written a new, extensible, modularized BSDF system that is inspired by Mitsuba and Renderman 19/RIS. In this post, I’ll write about how Takua a0.5’s BSDF system works and show some pretty test images generated during development with some interesting models and props.

First, here’s a still-life sort of render showcasing a number of models with a number of interesting materials, all using Takua a0.5’s BSDF system and rendered using my VCM integrator. All of the renders in this post are rendered either using my BDPT integrator or my VCM integrator.

Still-life scene with a number of interesting, complex materials created using Takua a0.5's BSDF system. The chess pieces and notebooks make use of instancing. Rendered in Takua a0.5 using VCM.

BSDFs in Takua a0.5 are designed to support bidirectional evaluation and importance sampling natively. Basically, this means that all BSDFs need to implement five basic functions. These five basic functions are:

  • Evaluate, which takes input and output directions of light and a normal, and returns the BSDF weight, cosine of the angle of the input direction, and color absorption of the scattering event. Evaluate can also optionally return the probability of the output direction given the input direction, with respect to solid angle.
  • CalculatePDFW, which takes the input and output directions of light and a normal, and returns the forward probability of the output direction given the input direction. In order to make the BSDF operate bidirectionally, this function also needs to be able to return the backwards probability if the input and output are reversed.
  • Sample, which takes in an input direction, a normal, and a random number generator and returns an output direction, the BSDF weight, the forward probability of the output direction, and the cosine of the input angle.
  • IsDelta, which returns true if the BSDF’s probability distribution function is a Dirac delta function and false otherwise. This attribute is important for allowing BDPT and VCM to handle perfectly specular BSDFs correctly, since perfectly specular BSDFs are something of a special case.
  • GetContinuationProbability, which takes in an input direction and normal and returns the probability of ending a ray path at this BSDF. This function is used for Russian Roulette early path termination.

In order to be correct and bididirectional, each of these functions should return results that agree with the other functions. For example, taking the output direction generated by Sample and calling Evaluate with the Sample output direction should produce the same color absorption and forward probability and other attributes as Sample. Sample, Evaluate, and CalculatePDFW are all very similar functions and often can share a large amount of common code, but each one is tailored to a slightly different purpose. For example, Sample is useful for figuring out a new random ray direction along a ray path, while Evaluate is used for calculating BSDF weights while importance sampling light sources.

Small note: I wrote that these five functions all take in a normal, which is technically all they need in terms of differential geometry. However, in practice, passing in a surface point and UV and other differential geometry information is very useful since that allows for various properties to be driven by 2D and 3D textures. In Takua a0.5, I pass in a normal, surface point, UV coordinate, and a geom and primitive ID for future PTEX support, and allow every BSDF attribute to be driven by a texture.

One of the test props I made is the PBRT book, since I thought rendering the Physically Based Rendering book with a physically based renderer and physically based shading would be amusing. The base diffuse color is driven by a texture map, and the interesting rippled and variation in the glossiness of the book cover comes from driving additional gloss and specular properties with more texture maps.

Physically Based Rendering book, rendered with my physically based renderer. Note the texture-driven gloss and specular properties. Rendered using BDPT.

In order to be physically correct, BSDFs should also fulfill the following three properties:

  • Positivity, meaning that the return value of the BSDF should always be positive or equal to 0.
  • Helmholtz Reciprocity, which means the return value of the BSDF should not be changed by switching the input and output directions (although switching the input and output CAN change how things are calculated internally, such as in perfectly specular refractive materials).
  • Energy Conservation, meaning the surface cannot reflect more light than arrives.

At the moment, my base BSDFs are not actually the best physically based BSDFs in the world… I just have Lambertian diffuse, normalized Blinn-Phong, and Fresnel-based perfectly specular reflection/refraction. At a later point I’m planning on adding Beckmann and Disney’s Principled BSDF, and possibly others such as GGX and Ward. However, for the time being, I can still create highly complex and interesting materials because of the modular nature of Takua a0.5’s BSDF system; one of the most powerful uses of this modular system is combining base BSDFs into more complex BSDFs. For example, I have another BSDF called FresnelPhong, which internally calls normalized Blinn-Phong BSDF but also calls the Fresnel code from my Fresnel specular BSDF to account for an output direction with the Fresnel effect with glossy surfaces. Since the Fresnel specular BSDF handles refractive materials, FresnelPhong allows for creating glossy transmissive surfaces such as frosted glass (albeit not as accurate to reality as one would get with Beckmann or GGX).

Another one of my test props is a glass chessboard, where half of the pieces and board squares are using frosted glass. Needless to say, this scene is very difficult to render using unidirectional pathtracing. I only have one model of each chess piece type, and all of the pieces on the board are instances with varying materials per instance.

Chessboard with ground glass squares and clear glass squares. Rendered using BDPT.

Chessboard with ground glass and clear glass pieces. Rendered using BDPT.

Another interesting use of modular BSDFs and embedding BSDFs inside of other BSDFs is in implementing bump mapping. Takua a0.5 implements bump mapping as a simple BSDF wrapper that calculates the bump mapped normal and passes that normal into whatever the underlying BSDF is. This approach allows for any BSDF to have a bump map, and even allows for applying multiple bump maps to the same piece of geometry. In addition to specifying bump maps as wrapper BSDFs, Takua a0.5 also allows attaching bump maps to individual geometry so that the same BSDF can be reused with a number of different bump maps attached to a number of different geometries, but under the hood this system works exactly the same as the BSDF wrapper bump map.

This notebook prop’s leathery surface detail comes entirely from a BSDF wrapper bump map:

Notebook with a leathery surface. All surface detail comes from bump mapping. Rendered using BDPT.

Finally, one of the most useful and interesting features of Takua a0.5’s BSDF system is the layered BSDF. The layered BSDF is a special BSDF that allows arbitrary combining, layering, and mixing between different BSDFs, much like Vray’s BlendMtl or Renderman 19/RIS’s LM shader system. Any BSDF can be used as a layer in a layered BSDF, including entire other layered BSDF networks. The Takua layered BSDF consists of a base substrate BSDF, and an arbitrary number of coat layers on top of the substrate. Each coat is given a texture-drive weight which determines how much of the final output BSDF is from the current coat layer versus from all of the layers and substrate below the current coat layer. Since the weight for each coat layer must be between 0 and 1, the result layered BSDF maintains physical correctness as long as all of the component BSDFs are also physically correct. Practically, the layered BSDF is implemented so that with each iteration, only one of the component BSDFs is evaluated and sampled, with the particular component BSDF per iteration chosen randomly based on each component BSDF’s weighting.

The layered BSDF system is what allows the creation of truly interesting and complex materials, since objects in reality often have complex materials consisting of a number of different scattering event types. For example, a real object may have a diffuse base with a glossy clear coat, but there may also be dust and fingerprints on top of the clear coat contributing to the final appearance. The globe model seen in my adaptive sampling post uses a complex layered BSDF; the base BSDF is ground glass, with the continents layered on top as a perfectly specular mirror BSDF, and then an additional dirt and fingerprints layer on top made up of diffuse and varying glossy BSDFs:

Glass globe using Takua's layered BSDF system. The globe has a base ground glass layer, a mirror layer for continents, and a dirt/fingerprints layer for additional detail. Rendered using VCM.

Here’s an additional close-up render of the globe that better shows off some of the complex surface detail:

Close-up of the globe. Rendered using VCM.

Going forward, I’m planning on adding a number of better BSDFs to Takua a0.5 (as mentioned before). Since the BSDF system is so modular and extensible, adding new BSDFs should be relatively simple and should require little to no additional work to integrate into the renderer. Because of how I designed BSDF wrappers, any new BSDF I add will automatically work with the bump map BSDF wrapper and the layered BSDF system. I’m also planning on adding interesting effects to the refractive/transmission BSDF, such as absorption based on Beer’s law and spectral diffraction.

After I finish work on my thesis, I also intend on adding more complex materials for subsurface scattering and volume rendering. These additions will be much more involved than just adding GGX or Beckmann, but I have a rough roadmap for how to proceed and I’ve already built a lot of supporting infrastructure into Takua a0.5. The plan for now is to implement a unified SSS/volume system based on the Unified Points, Beams, and Paths presented at SIGGRAPH 2014. UPBP can be thought of as extending VCM to combine a number of different volumetric rendering techniques. I can’t wait to get started on that over the summer!

Adaptive Sampling

Adaptive sampling is a relatively small and simple but very powerful feature, so I thought I’d write briefly about how adaptive sampling works in Takua a0.5. Before diving into the details though, I’ll start with a picture. The scene I’ll be using for comparisons in this post is a globe of the Earth, made of a polished ground glass with reflective metal insets for the landmasses and with a rough scratched metal stand. The globe is on a white backdrop and is lit by two off-camera area lights. The following render is the fully converged reference baseline for everything else in the post, rendered using VCM:

Fully converged reference baseline. Rendered in Takua a0.5 using VCM.

As mentioned before, in pathtracing based renderers, we solve the path integral through Monte Carlo sampling, which gives us an estimate of the total integral per sample thrown. As we throw more and more samples at the scene, we get a better and better estimate of the total integral, which explains why pathtracing based integrators start out producing a noisy image but eventually converge to a nice, smooth image if enough rays are traced per pixel.

In a naive renderer, the number of samples traced per pixel is usually just a fixed number, equal for all pixels. However, not all parts of the image are necessarily equally difficult to sample; for example, in the globe scene, the backdrop should require fewer samples than the ground glass globe to converge, and the ground glass globe in turn should require fewer samples than the two caustics on the ground. This observation means that a fixed sampling strategy can potentially be quite wasteful. Instead, computation can be used much more efficiently if the sampling strategy can adapt and drive more samples towards pixels that require more work to converge, while driving fewer samples towards pixels that have already converged mid-render. Such a sample can also be used to automatically stop the renderer once the sampler has detected that the entire render has converged, without needing user guesswork for how many samples to use.

The following image is the same globe scene as above, but limited to 5120 samples per pixel using bidirectional pathtracing and a fixed sampler. Note that most of the image is reasonable converged, but there is still noise visible in the caustics:

Fixed sampling, 5120 samples per pixel, BDPT.

Since it may be difficult to see the difference between this image and the baseline image on smaller screens, here is a close-up crop of the same caustic area between the two images:

500% crop. Left: converged baseline render. Right: fixed sampling, 5120 samples per pixel, BDPT.

The difficult part of implementing an adaptive sampler is, of course, figuring out a metric for convergence. The PBRT book presents a very simple adaptive sampling strategy on page 388 of the 2nd edition: for each pixel, generate some minimum number of initial samples and record the radiances returned by each initial sample. Then, take the average of the luminances of the returned radiances, and compute the contrast between each initial sample’s radiance and the average luminance. If any initial sample has a contrast from the average luminance above some threshold (say, 0.5), generate more samples for the pixel up until some maximum number of samples per pixel is reached. If all of the initial samples have contrasts below the threshold, then the sampler can mark the pixel as finished and move onto the next pixel. The idea behind this strategy is to try to eliminate fireflies, since fireflies result from statistically improbably samples that are significantly above the true value of the pixel.

The PBRT adaptive sampler works decently, but has a number of shortcomings. First, the need to draw a large number of samples per pixel simultaneously makes this approach less than ideal for progressive rendering; while well suited to a bucketed renderer, a progressive renderer prefers to draw a small number of samples per pixel per iteration, and return to each pixel to draw more samples in subsequent iterations. In theory, the PBRT adaptive sampler could be made to work with a progressive renderer if sample information was stored from each iteration until enough samples were accumulated to run an adaptive sampling check, but this approach would require storing a lot of extra information. Second, while the PBRT approach can guarantee some degree of per-pixel variance minimization, each pixel isn’t actually aware of what its neighbours look like, meaning that there still can be visual noise across the image. A better, global approach would have to take into account neighbouring pixel radiance values as a second check for whether or not a pixel is sufficiently sampled.

My first attempt at a global approach (the test scene in this post is a globe, but that pun was not intended) was to simply have the adaptive sampler check the contrast of each pixel with it’s immediate neighbours. Every N samples, the adaptive sampler would pull the accumulated radiances buffer and flag each pixel as unconverged if the pixel has a contrast greater than some threshold from at least one of its neighbours. Pixels marked unconverged are sampled for N more iterations, while pixels marked as converged are skipped for the next N iterations. After another N iterations, the adaptive sampler would go back and reflag every pixel, meaning that a pixel previously marked as converged could be reflagged as unconverged if its neighbours changed enormously. Generally N should be a rather large number (say, 128 samples per pixel), since doing convergence checks is meaningless if the image is too noisy at the time of the check.

Using this strategy, I got the following image, which was set to run for a maximum of 5120 samples per pixel but wound up averaging 4500 samples per pixel, or about a 12.1% reduction in samples needed:

Adaptive sampling per pixel, average 4500 samples per pixel, BDPT.

At an initial glance, this looks pretty good! However, as soon as I examined where the actual samples went, I realized that this strategy doesn’t work. The following image is a heatmap showing where samples were driven, with brighter areas indicating more samples per pixel:

Sampling heatmap for adaptive sampling per pixel. Brighter areas indicate more samples.

Generally, my per-pixel adaptive sampler did correctly identify the caustic areas as needing more samples, but a problem becomes apparent in the backdrop areas: the per-pixel adaptive sampler drove samples at clustered “chunks” evenly, but not evenly across different clusters. This behavior happens because while the per-pixel sampler is now taking into account variance across neighbours, it still doesn’t have any sort of global sense across the entire image! Instead, the sampler is finding localized pockets where variance seems even across pixels, but those pockets can be quite disconnected from further out areas. While the resultant render looks okay at a glance, clustered variance patterns becomes apparent if the image contrast is increased:

Adaptive sampling per pixel, with enhanced contrast. Note the local clustering artifacts.

Interestingly, these artifacts are reminiscent of the artifacts that show up in not-fully-converged Metropolis Light Transport renders. This similarity makes sense, since in both cases they arise from uneven localized convergence.

The next approach that I tried is a more global approach adapted from Dammertz et al.’s paper, “A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination”. For the sake of simplicity, I’ll refer to the approach in this paper as Dammertz for the rest of this post. Dammertz works by considering the variance across an entire block of pixels at once and flagging the entire block as converged or unconverged, allowing for much more global analysis. At the first variance check, the only block considered is the entire image as one enormous block; if the total variance eb in the entire block is below a termination threshold et, the block is flagged as converged and no longer needs to be sampled further. If eb is greater than et but still less than a splitting threshold es, then the block will be split into two non-overlapping child blocks for the next round of variance checking after N iterations have passed. At each variance check, this process is repeated for each block, meaning the image eventually becomes split into an ocean of smaller blocks. Blocks are kept inside of a simple unsorted list, require no relational information to each other, and are removed from the list once marked as converged, making the memory requirements very simple. Blocks are split along their major axis, with the exact split point chosen to keep error as equal as possible across the two sides of the split.

The actual variance metric used is also very straightforward; instead of trying to calculate an estimate of variance based on neighbouring pixels, Dammertz stores two framebuffers: one buffer I for all accumulated radiances so far, and a second buffer A for accumulated radiances from every other iteration. As the image approaches full convergence, the differences between I and A should shrink, so an estimation of variance can be found simply by comparing radiance values between I and A. The specific details and formulations can be found in section 2.1 of the paper.

I made a single modification to the paper’s algorithm: I added a lower bound to the block size. Instead of allowing blocks to split all the way to a single pixel, I stop splitting after a block reaches 64 pixels in a 8x8 square. I found that splitting down to single pixels could sometimes cause false positives in convergence flagging, leading to missed pixels similar to in the PBRT approach. Forcing blocks to stop splitting at 64 pixels means there is a chance of false negatives for convergence, but a small amount of unnecessary oversampling is preferable to undersampling.

Using this per-block adaptive sampler, I got the following image, which again is superficially extremely similar to the fixed sampler result. This render was also set to run for a maximum of 5120 samples, but wound up averaging just 2920 samples per pixel, or about a 42.9% reduction in samples needed:

Adaptive sampling per block, average 2920 samples per pixel, BDPT.

The sample heatmap looks good too! The heatmap shows that the sampler correctly identified the caustic and highlight areas as needing more samples, and doesn’t have clustering issues in areas that needed fewer samples:

Sampling heatmap for adaptive sampling per block. Brighter areas indicate more samples.

Boosting the image contrast shows that the image is free of local clustering artifacts and noise is even across the entire image, which is what we would expect:

Adaptive sampling per block, with enhanced contrast. Note the even noise spread and lack of local clustering artifacts.

Looking at the same 500% crop area as earlier, the adaptive per-block and fixed sampling renders are indistinguishable:

500% crop. Left: fixed sampling, 5120 samples per pixel, BDPT. Right: adaptive per-block sampling, average 2920 samples per pixel, BDPT.

So with that, I think Dammertz works pretty well! Also, the computational and memory overhead required for the Dammertz approach is basically negligible relative to the actual rendering process. This approach is the one that is currently in Takua a0.5.

I actually have an additional adaptive sampling trick designed specifically for targeting fireflies. This additional trick works in conjunction with the Dammertz approach. However, this post is already much longer than I originally planned, so I’ll save that discussion for a later post. I’ll also be getting back to the PPM/VCM posts in my series of integrator posts shortly; I have not had much time to write on my blog since the vast majority of my time is currently focused on my thesis, but I’ll try to get something posted soon!

Flower Vase Renders

Rendered in Takua a0.5 using BDPT. Nearly a quarter of a billion triangles.

In order to test Takua a0.5, I’ve been using my renderer on some quick little “pretty picture” projects. I recently ran across a fantastic flower vase model by artist Andrei Mikhalenko and used Andrei’s model as the basis for a shading exercise. The above and following images are rendered entirely in Takua a0.5 using bidirectional pathtracing. I textured and shaded everything using Takua a0.5’s layered material system, and also made some small modifications to the model (moved some flowers around, extended the stems to the bottom of the vase, and thickened the bottom of the vase). Additionally, I further subdivided the flower petals to gain additional detail and smoothness, meaning the final rendered model weighs in at nearly a quarter of a billion triangles. Obviously using such heavy models is not practical for a single prop in real world production, but I wanted to push the amount of geometry my renderer can handle. Overall, total memory usage for each of these renders hovered around 10.5 GB. All images were rendered at 1920x1080 resolution; click on each image to see the full resolution renders.

For the flowers, I split all of the flowers into five randomly distributed groups and assigned each group a different flower material. Each material is a two-sided material with a different BSDF assigned to each side, with side determined by the surface normal direction. For each flower, the outside BSDF has a slightly darker reflectance than the inner BSDF, which efficiently approximates the subsurface scattering effect real flowers have, but without actually having to use subsurface scattering. In this case, using a two-sided material to fake the effect of subsurface scattering is desirable since the model is so complex and heavy. Also, the stems and branches are all bump mapped.

Rendered in Takua a0.5 using BDPT. Note the complex caustics from the vase and water.

This set of renders was a good test for bidirectional pathtracing because of the complex nature of the caustics in the vase and water; note that the branches inside of the vase and water cannot be efficiently rendered by unidirectional pathtracing since they are in glass and therefore cannot directly sample the light sources. The scene is lit by a pair of rectlights, one warmer and one cooler in temperature. This lighting setup, combined with the thick glass and water volume at the bottom of the vase, produces some interesting caustic on the ground beneath the vase.

The combination of the complex caustics and the complex geometry in the bouquet itself meant that a fairly deep maximum ray path length was required (16 bounces). Using BDPT helped immensely with resolving the complex bounce lighting inside of the bouquet, but the caustics proved to be difficult for BDPT; in all of these renders, everything except for the caustics converged within about 30 minutes on a quad-core Intel Core i7 machine, but the caustics took a few hours to converge in the top image, and a day to converge for the second image. I’ll discuss caustic performance in BDPT compared to PPM and VCM in some upcoming posts.

Rendered in Takua a0.5 using BDPT. Depth of field and circular bokeh entirely in-camera.

All depth of field is completely in-camera and in-renderer as well. No post processed depth of field whatsoever! For the time being, Takua a0.5 only supports circular apertures and therefore only circular bokeh, but I plan on adding custom aperture shapes after I finish my thesis work. In general, I think that testing my own renderer with plausibly real-world production quality scenes is very important. After all, having just a toy renderer with pictures of spheres is not very fun… the whole point of a renderer is to generate some really pretty pictures! For my next couple of posts, I’m planning on showing some more complex material/scene tests, and then moving onto discussing the PPM and VCM integrators in Takua.

Addendum: I should comment on the memory usage a bit more, since some folks have expressed interest in what I’m doing there. By default, the geometry actually weighs in closer to 30 GB in memory usage, so I had to implement some hackery to get this scene to fit in memory on a 16 GB machine. The hack is really simple: I added an optional half-float mode for geometry storage. In practice, using half-floats for geometry is usually not advisable due to precision loss, but in this particular scene, that precision loss becomes more acceptable due to a combination of depth of field hiding most alignment issues closer to camera, and sheer visual complexity making other alignment issues hard to spot without looking too closely. Additionally, for the flowers I also threw away all of the normals and recompute them on the fly at render-time. Recomputing normals on the fly results in a small performance hit, but it vastly preferable to going out of core.

Multiple Importance Sampling

A key tool introduced by Veach as part of his bidirectional pathtracing formulation is multiple importance sampling (MIS). As discussed in my previous post, the entire purpose of rendering from a mathematical perspective is to solve the light transport equation, which in the case of all pathtracing type renderers means solving the path integral formulation of light transport. Since the path integral does not have a closed form solution in all but the simplest of scenes, we have to estimate the full integral using various sampling techniques in path space, hence unidirectional pathtracing and bidirectional pathtracing and metropolis based techniques, etc. As we saw with the light source in glass case and with SDS paths, often a single path sampling technique is not sufficient for capturing a good estimate of the path integral. Instead, a good estimate often requires a combination of a number of different path sampling techniques; MIS is a critical mechanism for combining multiple sampling techniques in a manner that reduces total variance. Without MIS, directly combining sampling techniques through averaging can often have the opposite effect and increase total variance.

The following image is a recreation of the test scene used in the Veach thesis to demonstrate MIS. The scene consists of four glossy bars going from less glossy at the top to more glossy at the bottom, and four sphere lights of increasing size. The smallest sphere light has the highest emission intensity, and the largest sphere light has the lowest emission. I modified the scene to add in a large rectangular area light off camera on each side of the scene, and I added an additional bar to the bottom of the scene with gloss driven by a texture map:

Figure 1: Recreation of the Veach MIS test scene. Rendered in Takua.

The above scene is difficult to render using any single path sampling technique because of the various combinations of surface glossiness and emitter size/intensity. For large emitter/low gloss combinations, importance sampling by the BSDF tends to result in lower variance. In the case, the reason is that a given random ray direction is more likely to hit the large light than it is to fall within a narrow BSDF lobe, so matching the sample distribution to the BSDF lobe is more efficient. However, for small emitter/high gloss combinations, the reverse is true. If we take the standard Veach scene and sample by only BSDF and then only by light source, we can see how each strategy fails in different cases. Both of these renders would eventually converge if left to render for long enough, but the rate of convergence in difficult areas would be extremely slow:

Figure 2: BSDF sampling only, 64 iterations.

Figure 3: Light sampling only, 64 iterations.

MIS allows us to combine m different sampling strategies to produce a single unbiased estimator by weighting each sampling strategy by its probability distribution function (pdf). Mathematically, this is expressed as:

\[ \langle I_{j} \rangle_{MIS} = \sum_{i=1}^{m} \frac{1}{n_{i}} \sum_{j=1}^{n_{i}} w_{i}(X_{i,j}) \frac{f(X_{i,j})}{p_{i}(X_{i,j})} \]

where Xi,j are independent random variables drawn from some distribution function pi and wi(Xi,j) is some heuristic for weighting each sampling technique with respect to pdf. The reason MIS is able to significantly lower variance is because a good MIS weighting function should dampen contributions with low pdfs. The Veach thesis presents two good weighting heuristics, the power heuristic and the balance heuristic. The power heuristic is defined as:

\[ w_{i}(x) = \frac{[n_{i}p_{i}(x)]^{\beta}}{\sum_{n}^{k=1}[n_{k}p_{k}(x)]^{\beta}}\]

The power heuristic states that the weight for a given sampling technique should be the pdf of the sampling technique raised to a power β divided by the sum of the pdfs of all considered sampling techniques, with each sampling technique also raised to β. For the power heuristic, β is typically set to 2. The balance heuristic is simply the power heuristic for β=1. In the vast majority of cases, the balance heuristic is a near optimal weighting heuristic (and the power heuristic can cover most remaining edge cases), assuming that the base sampling strategies are decent to begin with.

For the standard Veach MIS demo scene, the best result is obtained by using MIS to combine BSDF and light sampling. The following image is the Veach scene again, this time rendered using MIS with 64 iterations. Note that all highlights are now roughly equally converged and the entire image matches the reference render above, apart from noise:

Figure 4: Light and BSDF sampling combined using MIS, 64 iterations.

BDPT inherently does not necessarily have an improved convergence rate over vanilla unidirectional pathtracing; BDPT gains its significant edge in convergence rate only once MIS is applied since BDPT’s efficiency comes from being able to extract a large number of path sampling techniques out of a single bidirectional path. To demonstrate the impact of MIS on BDPT, I rendered the following images using BDPT with and without MIS. The scene is a standard Cornell Box, but I replaced the back wall with a more complex scratched, glossy surface. The first image is the fully converged ground truth render, followed by with and without MIS:

Figure 5: Cornell Box with scratched glossy back wall. Rendered using BDPT with MIS.

Figure 6: BDPT with MIS, 16 iterations.

Figure 7: BDPT without MIS, 16 iterations.

As seen above, the version of BDPT without MIS is significantly less converged. BDPT without MIS will still converge to the correct solution, but in practice can often be only as good as, or worse than unidirectional pathtracing.

Later on, we’ll discuss MIS beyond bidirectional pathtracing. In fact, MIS is the critical component to making VCM possible!

Addendum 01/12/2018: A reader noticed some brightness inconsistencies in the original versions of Figures 2 and 3, which came from bugs in Takua’s light sampling code without MIS at the time. I have replaced the original versions of Figures 1, 2, 3, and 4 with new, correct versions rendered using the current version of Takua as of the time of writing for this addendum.

Because of how much noise there is in Figures 2 and 3, seeing that they converge to the reference image might be slightly harder. To make the convergence clearer, I rendered out each sampling strategy using 1024 samples per pixel, instead of just 64:

Figure 8: BSDF sampling only, 1024 iterations.

Figure 9: Light sampling only, 1024 iterations.

Note how Figures 8 and 9 match Figure 1 exactly, aside from noise. In Figure 9, the reflection of the right-most sphere light on the top-most bar is still extremely noisy because of the extreme difficulty of finding a random light sample that happens to produce a valid bsdf response for the near-perfect specular lobe.

One last minor note: I’m leaving the main text of this post unchanged, but the updated renders use Takua’s modern shading system instead of the old one from 2015; in the new shading system, the metal bars use roughness instead of gloss, and use GGX instead of a normalized Phong variant.

Bidirectional Pathtracing Integrator

As part of Takua a0.5’s complete rewrite, I implemented the vertex connection and merging (VCM) light transport algorithm. Implementing VCM was one of the largest efforts of this rewrite, and is probably the single feature that I am most proud of. Since VCM subsumes bidirectional pathtracing and progressive photon mapping, I also implemented Veach-style bidirectional pathtracing (BDPT) with multiple importance sampling (MIS) and Toshiya Hachisuka’s stochastic progressive photon mapping (SPPM) algorithm. Since each one of these integrators is fairly complex and interesting by themselves, I’ll be writing a series of posts on my BDPT and SPPM implementations before writing about my full VCM implementation. My plan is for each integrator to start with a longer post discussing the algorithm itself and show some test images demonstrating interesting properties of the algorithm, and then follow up with some shorter posts detailing specific tricky or interesting pieces and also show some pretty real-world production-plausible examples of when each algorithm is particularly useful.

As usual, we’ll start off with an image. Of course, all images in this post are rendered entirely using Takua a0.5. The following image is a Cornell Box lit by a textured sphere light completely encased in a glass sphere, rendered using my bidirectional pathtracer integrator. For reasons I’ll discuss a bit later in this post, this scene belongs to a whole class of scenes that unidirectional pathtracing is absolutely abysmal; these scenes require a bidirectional integrator to converge in any reasonable amount of time:

Room lit with a textured sphere light enclosed in a glass sphere, converged result rendered using bidirectional pathtracing.

To understand why BDPT is a more robust integrator than unidirectional pathtracing, we need to start by examining the light transport equation and its path integral formulation. The light transport equation was introduced by Kajiya and is typically presented using the formulation from Eric Veach’s thesis:

\[ L_{\text{o}}(\mathbf{x},\, \omega_{\text{o}}) \,=\, L_e(\mathbf{x},\, \omega_{\text{o}}) \ +\, \int_{\mathcal{S}^2} L_{\text{o}}(\mathbf{x}_\mathcal{M}(\mathbf{x},\, \omega_{i}),\, -\omega_{i}) \, f_s(\mathbf{x},\, \omega_{i} \rightarrow \omega_{\text{o}}) \, d \sigma_{\mathbf{x}}^{\perp} (\omega_{i}) \]

Put into words instead of math, the light transport equation simply states that the amount of light leaving any point is equal to the amount of light emitted at that point plus the total amount of light arriving at that point from all directions, weighted by the surface reflectance and absorption at that point. Combined with later extensions to account for effects such as volume scattering and subsurface scattering and diffraction, the light transport equation serves as the basis for all of modern physically based rendering. In order to solve the light transport equation in a practical manner, Veach presents the path integral formulation of light transport:

\[ I_{j} = \int_{\Omega}^{} L_{e}(\mathbf{x}_{0})G(\mathbf{x}_{0}\leftrightarrow \mathbf{x}_{1})[\prod_{i=1}^{k-1}\rho(\mathbf{x}_{i})G(\mathbf{x}_{i}\leftrightarrow \mathbf{x}_{i+1})]W_{e}(\mathbf{x}_{k}) d\mu(\bar{\mathbf{x}}) \]

The path integral states that for a given pixel on an image, the amount of radiance arriving at that pixel is the integral of all radiance coming in through all paths in path space, where a path is the route taken by an individual photon from the light source through the scene to the camera/eye/sensor, and path space simply encompasses all possible paths. Since there is no closed form solution to the path integral, the goal of modern physically based ray-tracing renderers is to sample a representative subset of path space in order to produce a reasonably accurate estimate of the path integral per pixel; progressive renderers estimate the path integral piece by piece, producing a better and better estimate of the full integral with each new iteration.

At this point, we should take a brief detour to discuss the terms “unbiased” versus “biased” rendering. Within the graphics world, there’s a lot of confusion and preconceptions about what each of these terms mean. In actuality, an unbiased rendering algorithm is simply one where each iteration produces an exact result for the particular piece of path space being sampled. A biased rendering algorithm is one where at each iteration, an approximate result is produced for the piece of path space being sampled. However, biased algorithms are not necessarily a bad thing; a biased algorithm can be consistent, that is, converges in the limit to the same result as an unbiased algorithm. Consistency means that an estimator arrives at the accurate result in the limit; so in practice, we should care less about whether or not an algorithm is biased or unbiased so long as it is consistent. BDPT is an unbiased, consistent integrator whereas SPPM is a biased but still consistent integrator.

Going back to the path integral, we can quickly see where unidirectional pathtracing comes from once we view light transport through the path integral. The most obvious way to evaluate the path integral is to do exactly as the path integral says: trace a path starting from a light source, through the scene, and if the path eventually hits the camera, accumulate the radiance along the path. This approach is one form of unidirectional pathtracing that is typically referred to as light tracing (LT). However, since the camera is a fairly small target for paths to hit, unidirectional pathtracing is typically implemented going in reverse: start each path at the camera, and trace through the scene until each path hits a light source or goes off into empty space and is lost. This approach is called backwards pathtracing and is what people usually are referring to when they use the term pathtracing (PT).

As I discussed a few years back in a previous post, pathtracing with direct light importance sampling is pretty efficient at a wide variety of scene types. However, pathtracing with direct light importance sampling will fail for any type of path where the light source cannot be directly sampled; we can easily construct a number of plausible, common setups where this situation occurs. For example, imagine a case where a light source is completely enclosed within a glass container, such as a glowing filament within a glass light bulb. In this case, for any pair consisting of a point in space and a point on the light source, the direction vector to hit the light point from the surface point through glass is not just the light point minus the surface point normalized, but instead has to be at an angle such that the path hits the light point after refracting through glass. Without knowing the exact angle required to make this connection beforehand, the probability of a random direct light sample direction arriving at the glass interface at the correct angle is extremely small; this problem is compounded if the light source itself is very small to begin with.

Taking the sphere light in a glass sphere scene from earlier, we can compare the result of pathtracing without glass around the light versus with glass around the light. The following comparison shows 16 iterations each, and we can see that the version with glass around the light is significantly less converged:

Pathtracing, 16 iterations, with glass sphere.

Pathtracing, 16 iterations, without glass sphere.

Generally, pathtracing is terrible at resolving caustics, and the glass-in-light scenario is one where all illumination within the scene is through caustics. Conversely, light tracing is quite good at handling caustics and can be combined with direct sensor importance sampling (same idea as direct light importance sampling, just targeting the camera/eye/sensor instead of a light source). However, light tracing in turn is bad at handling certain scenarios that pathtracing can handle well, such as small distant spherical lights.

The following image again shows the sphere light in a glass sphere scene, but is now rendered for 16 iterations using light tracing. Note how the render is significantly more converged, for approximately the same computational cost. The glass sphere and sphere light render as black since in light tracing, the camera cannot be directly sampled from a specular surface.

Light tracing, 16 iterations, with glass sphere.

Since bidirectional pathtracing subsumes both pathtracing and light tracing, I implemented pathtracing and light tracing simultaneously and used each integrator as a check on the other, since correct integrators should converge to the same result. Implementing light tracing requires BSDFs and emitters to be a bit more robust than in vanilla pathtracing; emitters have to support both emission and illumination, and BSDFs have to support bidirectional evaluation. Light tracing also requires the ability to directly sample the camera and intersect the camera’s image plane to figure out what pixel to contribute a path to; as such, I implemented a rasterize function for my thin-lens and fisheye camera models. My thin-lens camera’s rasterization function supports the same depth of field and bokeh shape capabilities that the thin-lens camera’s raycast function supports.

The key insight behind bidirectional pathtracing is that since light tracing and vanilla pathtracing each have certain strengths and weaknesses, combining the two sampling techniques should result in a more robust path sampling technique. In BDPT, for each pixel per iteration, a path is traced starting from the camera and a second path is traced starting from a point on a light source. The two paths are then joined into a single path, conditional on an unoccluded line of sight from the end vertices of the two paths to each other. A BDPT path of length k with k+1 vertices can then be used to generate up to k+2 path sampling techniques by connecting each vertex on each subpath to every other vertex on the other subpath. While BDPT per iteration is much more expensive than unidirectional pathtracing, the much larger number of sampling techniques leads to a significantly higher convergence rate that typically outweighs the higher computational cost.

Below is the same scene as above rendered with 16 iterations of BDPT, and rendered with the same amount of computation time (about 5 iterations of BDPT). Note how with just 5 iterations, the BDPT result with the glass sphere has about the same level of noise as the pathtraced result for 16 iterations without the glass sphere. At 16 iterations, the BDPT result with the glass sphere is noticeably more converged than the pathtraced result for 16 iterations without the glass sphere.

BDPT, 16 iterations, with glass sphere.

BDPT, 5 iterations (same compute time as 16 iterations pathtracing), with glass sphere.

A naive implementation of BDPT would be for each pixel per iteration, trace a full light subpath, store the result, trace a full camera subpath, store the result, and then perform the connection operations between each vertex pair. However, since this approach requires storing the entirety of both subpaths for the entire iteration, there is room for some improvement. For Takua a0.5, my implementation stores only the full light subpath. At each bounce of the camera subpath, my implementation connects the current vertex to each vertex of the stored light subpath, weights and accumulates the result, and then moves onto the next bounce without having to store previous path vertices.

The following image is another example of a scene that BDPT is significantly better at sampling than any unidirectional pathtracing technique. The scene consists of a number of diffuse spheres and spherical lights inside of a glass bunny. In this scene, everything outside of the bunny is being lit using only caustics, while diffuse surfaces inside of the bunny are being lit using a combination of direct lighting, indirect diffuse bounces, and caustics from outside of the bunny reflecting/refracting back into the bunny. This last type of lighting belongs to a category of paths known as specular-diffuse-specular (SDS) paths that are especially difficult to sample unidirectionally.

Various diffuse spheres and sphere lights inside of a glass bunny, rendered using BDPT.

Here is the same scene as above, but with the glass bunny removed just so seeing what is going on with the spheres is a bit easier:

Same spheres as above, sans bunny. Rendered using BDPT.

Comparing pathtracer versus BDPT performance for 16 interations, BDPT’s vastly better performance on this scene becomes obvious:

16 iterations, rendered using pathtracing.

16 iterations, rendered using BDPT.

In the next post, I’ll write about multiple importance sampling (MIS), how it impacts BDPT, and my MIS implementation in Takua a0.5.