Importance Sampled Direct Lighting

Takua Render now has correct, fully working importance sampled direct lighting, supported for any type of light geometry! More importantly, the importance sampled direct lighting system is now fully integrated with the overall GI pathtracing integrator.

A naive, standard pathtracing implementation shoots out rays and accumulates colors until a light source is reached, upon which the total accumulated color is multiplied by the emittance of the light source and added to the framebuffer. As a result, even the simplest pathtracing integrator does account for both the indirect and direct illumination within a scene, but since sampling light sources is entirely dependent on the BRDF at each point, correctly sampling the direct illumination component in the scene is extremely inefficient. The canonical example of this inefficiency is a scene with a single very small, very intense, very far away light source. Since the probability of hitting such a small light source is so small, convergence is extremely slow.

To demonstrate/test this property, I made a simple test scene with an extremely bright sun-like object illuminating the scene from a huge distance away:

Using naive pathtracing without importance sampled direct lighting produces an image like this after 16 samples per pixel:

Mathematically, the image is correct, but is effectively useless since so few contributing ray paths have actually been found. Even after 5120 samples, the image is still pretty useless:

Instead, a much better approach is to accumulate colors just like before, but not bother waiting until a light source is hit by the ray path through pure BRDF sampling to multiply emittance. Instead, at each ray bounce, a new indirect ray is generated via the BRDF like before, AND to generate a new direct ray towards a randomly chosen light source via multiple importance sampling and multiply the accumulated color by the resultant emittance. Multiple importance sampled direct lighting works by balancing two different sampling strategies: sampling by light source and sampling by BRDF, and then weighting the two results with some sort of heuristic (such as the power heuristic described in Eric Veach’s thesis).

Sampling by light source is the trickier part of this technique. The idea is to generate a ray that we know will hit a light source, and then weight the contribution from that ray by the probability of generating that ray to remove the bias introduced by artificially choosing a ray direction. There’s a few good ways to do this: one way is to generate an evenly distributed random point on a light source as the target for the direct lighting ray, and then weight the result using the probability distribution function with respect to surface area, transformed into a PDF with respect to solid angle.

Takua Render at the moment uses a slightly different approach, for the sake of simplicity. The approach I’m using is similar to the one described in my earlier post on the topic, but with a disk instead of a sphere. The approach works like this:

  1. Figure out a bounding sphere for the light source
  2. Construct a ray from the point to be lit to the center of the bounding sphere. Let’s call the direction of this ray D.
  3. Find a great circle on the bounding sphere with a normal N, such that N is lined up exactly with D.
  4. Move the great circle along its normal towards the point to be lit by a distance of exactly the radius of the bounding sphere
  5. Treat the great circle as a disk and generate uniformly distributed random points on the disk to shoot rays towards.
  6. Weight light samples by the projected solid angle of the disk on the point being lit.

Alternatively, the weighting can simply be based on the normal solid angle instead of the projected solid angle, since the random points are chosen with a cosine weighted distribution.

The nice thing about this approach is that it allows for importance sampled direct lighting even for shapes that are difficult to sample random points on; effectively, the problem of sampling light sources is abstracted away, at the cost of a slight loss in efficiency since some percentage of rays fired at the disk have to miss the light in order for the weighting to remain unbiased.

I also started work on the surface area PDF to solid angle PDF method, so I might post about that later too. But for now, everything works! With importance sampled direct lighting, the scene from above is actually renderable in a reasonable amount of time. With just 16 samples per pixel, Takua Render now can generate this image:

…and after 5120 samples per pixel, a perfectly clean render:

The other cool thing about this scene is that most of the scene is actually being lit through pure indirect illumination. With only direct illumination and no GI, the render looks like this:

Quick Update on Future Plans

Just a super quick update on my future plans:

Next year, starting in September, I’ll be joining Dr. Don Greenberg and Dr. Joseph T. Kider and others at Cornell’s Program for Computer Graphics. I’ll be pursuing a Master of Science in Computer Graphics there, and will most likely be working on something involving rendering (which I suppose is not surprising).

Between the end of school and September, I’ll be spending the summer at Pixar Animation Studios once again, this time as part of Pixar’s Research Group.

Obviously I’m quite excited by all of this!

Now, back to working on my renderer.

Working Towards Importance Sampled Direct Lighting

I haven’t made a post in a few weeks now since I’ve been working on a number of different things all of which aren’t quite done yet. Since its been a few weeks, here’s a writeup of one of the things I’m working on and where I am with that.

One of the major features I’ve been working towards for the past few weeks is full multiple importance sampling, which will serve a couple of purposes. First, importance sampling the direct lighting contribution in the image should allow for significantly higher convergence rates for the same amount of compute, allowing for much smoother renders for the same render time. Second, MIS will serve as groundwork for future bidirectional integration schemes, such as Metropolis transport and photon mapping. I’ve been working with my friend Xing Du on understanding the math behind MIS and figuring out how exactly the math should translate into implementation.

So first off, some ground truth tests. All ground truth tests are rendered using brute force pathtracing with hundreds of thousands of iterations per pixel. Here is the test scene I’ve been using lately, with all surfaces reduced to lambert diffuse for the sake of simplification:

Ground truth global illumination render, representing 512000 samples per pixel. All lights sampled by BRDF only.

Ground truth for direct lighting contribution only, with all lights sampled by BRDF only.

Ground truth for indirect lighting contribution only.

The motivation behind importance sampling lights by directly sampling objects with emissive materials comes from the difficulty of finding useful samples from the BRDF only; for example, for the lambert diffuse case, since sampling from only the BRDF produces outgoing rays in totally random (or, slightly better, cosine weighted random) directions, the probability of any ray coming from a diffuse surface actually hitting a light is relatively low, meaning that the contribution of each sample is likely to be low as well. As a result, finding the direct lighting contribution via just BRDF sampling.

For example, here’s the direct lighting contribution only, after 64 samples per pixel with only BRDF sampling:

Direct lighting contribution only, all lights sampled by BRDF only, 64 samples per pixel.

Instead of sampling direct lighting contribution by shooting a ray off in a random direction and hoping that maybe it will hit a light, a much better strategy would be to… shoot the ray towards the light source. This way, the contribution from the sample is guaranteed to be useful. There’s one hitch though: the weighting for a sample chosen using the BRDF is relatively simple to determine. For example, in the lambert diffuse case, since the probability of any particular random sample within a hemisphere is the same as any other sample, the weighting per sample is even with all other samples. Once we selectively choose the ray direction specifically towards the light though, the weighting per sample is no longer even. Instead, we must weight each sample by the probability of a ray going in that particular direction towards the light, which we can calculate by the solid angle subtended by the light source divided by the total solid angle of the hemisphere.

So, a trivial example case would be if a point was being lit by a large area light subtending exactly half of the hemisphere visible from the point. In this case, the area light subtends Pi steradians, making its total weight Pi/(2*Pi), or one half.

The tricky part of calculating the solid angle weighting is in calculating the fractional unit-spherical surface area projection for non-uniform light sources. In other words, figuring out what solid angle a sphere subtends is easy, but figuring out what solid angle a Stanford Bunny subtends is…. less easy.

The initial approach that Xing and I arrived at was to break complex meshes down into triangles and treat each triangle as a separate light, since calculating the solid angle subtended by a triangle is once again easy. However, since treating a mesh as a cloud of triangle area lights is potentially very expensive, for each iteration the direct lighting contribution from all lights in the scene becomes potentially untenable, meaning that each iteration of the render will have to randomly select a small number of lights to directly sample.

As a result, we brainstormed some ideas for potential shortcuts. One shortcut idea we came up with was that instead of choosing an evenly distributed point on the surface of the light to serve as the target for our ray, we could instead shoot a ray at the bounding sphere for the target light and weight the resulting sample by the solid angle subtended not by the light itself, but by the bounding sphere. Our thinking was that this approach would dramatically simplify the work of calculating the solid angle weighting, while still maintaining mathematical correctness and unbiasedness since the number of rays fired at the bounding sphere that will miss the light should exactly offset the overweighting produced by using the bounding sphere’s subtended solid angle.

I went ahead and tried out this idea, and produced the following image:

Direct lighting contribution only, all lights sampled by direct sampling weighted by subtended solid angle, 64 samples per pixel.

First off, for the most part, it works! The resulting direct illumination matches the ground truth and the BRDF-sampling render, but is significantly more converged than the BRDF-sampling render for the same number of samples. BUT, there is a critical flaw: note the black circle around the light source on the ceiling. That black circle happens to fall exactly within the bounding sphere for the light source, and results from a very simple mathematical fact: calculating the solid angle subtended by the bounding sphere for a point INSIDE of the bounding sphere is undefined. In other words, this shortcut approach will fail for any points that are too close to a light source.

One possible workaround I tried was to have any points inside of a light’s bounding sphere to fall back to pure BRDF sampling, but this approach is also undesirable, as a highly visible discontinuity between the differently sampled area develops due to vastly different convergence rates.

So, while the overall solid angle weighting approach checks out, our shortcut does not. I’m now working on implementing the first approach described above, which should produce a correct result, and will post in the next few days.

Stratified versus Uniform Sampling

As part of Takua Render’s new pathtracing core, I’ve implemented a system allowing for multiple sampling methods instead of just uniform sampling. The first new sampling method I’ve added in addition to uniform sampling is stratified sampling. Basically, in stratified sampling, instead of spreading samples per iteration across the entire probability region, the probability region is first divided into a number of equal sized, non-overlapping subregions, and then for each iteration, a sample is drawn with uniform probability from within a single subregion, called a strata. The result of stratified sampling is that samples are guaranteed to be more evenly spread across the entire probability domain instead of clustered within a single area, resulting in less visible noise for the same number of samples compared to uniform sampling. At the same time, since stratified sampling still maintains a random distribution within each strata, the aliasing problems associated with a totally even sample distribution are still avoided.

Here’s a video showing a scene rendered in Takua Render with uniform and then stratified sampling. The video also shows a side-by-side comparison in its last third.

In the renders in the above video, stratified sampling is being used to choose new ray directions from diffuse surface bounces; instead of choosing a random point over the entire cosine-weighted hemisphere at an intersection point, the renderer first chooses a strata with the same steradian as all other strata, and then chooses a random sample within that solid angle. The strata is chosen sequentially for primary bounces, and then chosen randomly for all secondary bounces to maintain unbiased sampling over the whole render. As a result of the sequential strata selection for primary bounces, images rendered in Takua Render will not converged to an unbiased solution until N iterations have elapsed, where N is the number of strata the probability region is divided into. The number of strata can be set by the user as a value in the scene description which is then squared to get the total strata number. So, if a user specifies a strata level of 16, then the probability region will be divided into 256 strata and a unbiased result will not be reached until 256 or more samples per pixel have been taken.

Here’s the Lamborghini model from last post at 256 samples per pixel with stratified (256 strata) and uniform sampling, to demonstrate how much less perceptible noise there is with the stratified sampler. From a distance, the uniform sampler renders may seem slightly darker side by side due to the higher percentage of noise, but if you compare them using the lightbox, you can see that the lighting and brightness is the same.

Stratified sampling, 256 strata, 256 samples per pixel

Uniform sampling, 256 samples per pixel

…and up-close crops with 400% zoom:

Stratified sampling, 256 strata, 256 samples per pixel, 400% crop

Uniform sampling, 256 samples per pixel, 400% crop

At some point soon I will also be implementing Halton sequence sampling and [0,2]-sequence sampling, but for the time being, stratified sampling is already providing a huge visual boost over uniform! In fact, I have a small secret to confess: all of the renders in the last post were rendered with the stratified sampler!

First Progress on New Pathtracing Core

I’ve started work on a completely new pathtracing core to replace the one used in Rev 2. The purpose of totally rewriting the entire pathtracing integrator and brdf systems is to produce something much more modular and robust; as much as possible, I am now decoupling brdf and new ray direction calculation from the actual pathtracing loop.

I’m still in the earliest stages of this rewrite, but I have some test images! Each of the following images was rendered out to somewhere around 25000 samples per pixel (a lot!), at about 5/6 samples per pixel per second. I let the renders run without a hard ending point and terminated them after I walked away for a while and came back, hence the inexact but enormous samples per pixel counts. Each scene was lit with my standard studio-styled lighting setup and in addition to the showcased model, uses a smooth backdrop that consists of about 10000 triangles.

Approximately 100000 face Stanford Dragon:

Approximately 150000 face Deloreon model:

Approximately 250000 face Lamborghini Aventador model:

Short-stack KD-Tree Traversal

In my last post, I talked about implementing history flag based kd-tree traversal. While the history flag based stackless traverse worked perfectly fine in terms of traversing the tree and finding the nearest intersection, I discovered over the past week that its performance is… less than thrilling. Unfortunately, the history flag system results in a huge amount of redundant node visits, since the entire system is state based and therefore necessarily needs to visit every node in a branch of the tree both to move down and up the branch.

So instead, I decided to try out a short-stack based approach. My initial concern with short-stack based approaches was the daunting memory requirements that keeping a short stack for a few hundred threads, however, I realized that realistically, a short stack never needs to be any larger than the maximum depth of the kd-tree being traversed. Since I haven’t yet had a need to test a tree with a depth beyond 100, the memory usage required for keeping short stacks is reasonably predictable and manageable; as a precaution, however, I’ve also decided to allow for the system to fall back to a stackless traverse in the case that a tree’s depth causes short stack memory usage to become unreasonable.

The actual short-stack traverse I’m using is a fairly standard while-while traverse based on the 2008 Kun Zhou realtime kd-tree paper and the 2007 Daniel Horn GPU kd-tree paper. I’ve added one small addition though: in addition to keeping a short stack for traversing the kd-tree, I’ve also added an optional second short stack that tracks the last N intersection test objects. The reason for keeping this second short stack is that kd-trees allow for objects to be split across multiple nodes; by tracking which objects we have already encountered, we can safely detect and skip objects that have already been tested. The object tracking short stack is meant to be rather small (say, no more than 10 to 15 objects at a time), and simply loops back and overwrites the oldest values in the stack when it overflows.

The new while-while traversal is significantly faster than the history flag approach, to the order of a 10x or better performance increase in some cases.

In order to validate that the entire kd traversal system works, I did a quick and dirty port of the old Rev 2 pathtracing integrator to run on top of the new Rev 3 framework. The following test images contain about 20000 faces and objects, and clocked in at about 6 samples per pixel per second with a tree depth of 15. Each image was rendered to 1024 samples per pixel:

I also attempted to render these images without any kd-tree acceleration as a control. Without kd-tree acceleration, each sample per pixel took upwards of 5 seconds, and I wound up terminating the renders before they got even close to completion.

The use of my old Rev 2 pathtracing core is purely temporary, however. The next task I’ll be tackling is a total rewrite of the entire pathtracing system and associated lighting and brdf evaluation systems. Previously, this systems have basically been monolithic blocks of code, but with this rewrite, I want to create a more modular, robust system that can recycle as much code as possible between GPU and CPU implementations, the GL debugger, and eventually other integration methods, such as photon mapping.