Bounding Boxes for Ellipsoids

Update (2014): Tavian Barnes has written a far better / more detailed post on this topic; instead of reading my post, I suggest you go read Tavian’s post instead.

Warning: this post is going to be pretty math-heavy.

Let’s talk about spheres, or more generally, ellipsoids. Specifically, let’s talk about calculating axis aligned bounding boxes for arbitrarily transformed ellipsoids, which is a bit of an interesting problem I recently stumbled upon while working on Takua Rev 3. I’m making this post because finding a solution took a lot of searching and I didn’t find any single collected source of information on this problem, so I figured I’d post it for both my own reference and for anyone else who may find this useful.

So what’s so hard about calculating tight axis aligned bounding boxes for arbitrary ellipsoids?

Well, consider a basic, boring sphere. The easiest way to calculate a tight axis aligned bounding box (or AABB) for a mesh is to simply min/max all of the vertices in the mesh to get two points representing the min and max points of the AABB. Similarly, getting a tight AABB for a box is easy: just use the eight vertices of the box for the min/max process. A naive approach to getting a tight AABB for a sphere seems simple then: along the three axes of the sphere, have one point on each end of the axis on the surface of the sphere, and then min/max. Figure 1. shows a 2D example of this naive approach, to extend the example to 3D, simply add two more points for the Z axis (I drew the illustrations quickly in Photoshop, so apologies for only showing 2D examples):

Figure 1.

This naive approach, however, quickly fails if we rotate the sphere such that its axes are no longer lined up nicely with the world axes. In Figure 2, our sphere is rotated, resulting in a way too small AABB if we min/max points on the sphere axes:

Figure 2.

If we scale the sphere such that it becomes an ellipsoid, the same problem persists, as the sphere is just a subtype of ellipsoid. In Figures 3 and 4, the same problem found in Figures 1/2 is illustrated with an ellipsoid:

Figure 3.

Figure 4.

One possible solution is to continue using the naive min/max axes approach, but simply expand the resultant AABB by some percentage such that it encompasses the whole sphere. However, we have no way of knowing what percentage will give an exact bound, so the only feasible way to use this fix is by making the AABB always larger than a tight fit would require. As a result, this solution is almost as undesirable as the naive solution, since the whole point of this exercise is to create as tight of an AABB as possible for as efficient intersection culling as possible!

Instead of min/maxing the axes, we need to use some more advanced math to get a tight AABB for ellipsoids.

We begin by noting our transformation matrix, which we’ll call M. We’ll also need the transpose of M, which we’ll call MT. Next, we define a sphere S using a 4x4 matrix:

[ r 0 0 0 ]
[ 0 r 0 0 ]
[ 0 0 r 0 ]
[ 0 0 0 -1] 

where r is the radius of the sphere. So for a unit diameter sphere, r = .5. Once we have built S, we’ll take its inverse, which we’ll call SI.

We now calculate a new 4x4 matrix R = M*SI*MT. R should be symmetric when we’re done, such that R = transpose(R). We’ll assign R’s indices the following names:

R = [ r11 r12 r13 r14 ] 
  [ r12 r22 r23 r24 ] 
  [ r13 r23 r23 r24 ] 
  [ r14 r24 r24 r24 ] 

Using R, we can now get our bounds:

zmax = (r23 + sqrt(pow(r23,2) - (r33*r22)) ) / r33; 
  	zmin = (r23 - sqrt(pow(r23,2) - (r33*r22)) ) / r33; 
  	ymax = (r13 + sqrt(pow(r13,2) - (r33*r11)) ) / r33; 
  	ymin = (r13 - sqrt(pow(r13,2) - (r33*r11)) ) / r33; 
  	xmax = (r03 + sqrt(pow(r03,2) - (r33*r00)) ) / r33; 
  	xmin = (r03 - sqrt(pow(r03,2) - (r33*r00)) ) / r33; 

…and we’re done!

Just to prove that it works, a screenshot of a transformed ellipse inside of a tight AABB in 3D from Takua Rev 3’s GL Debug view:

I’ve totally glossed over the mathematical rationale behind this method in this post and focused just on how to quickly get a working implementation, but if you want to read more about the actual math behind how it works, these are the two sources I pulled this from:

Stack Overflow post by user fd

Article by Inigo Quilez

In other news, Takua Rev 3’s new scene system is now complete and I am working on a brand new, better, faster, stackless KD-tree implementation. More on that later!

Revision 3, Old Renders

At the beginning of the semester, I decided to re-architect Takua again, hence the lack of updates for a couple of weeks now. I’ll talk more in-depth about the details of how this new architecture works in a later post, so for now I’ll just quickly describe the motivation behind this second round of re-architecting. As I wrote about before, I’ve been keeping parallel CPU and GPU branches of my renderer so far, but the two branches have increasingly diverged. On top of that, the GPU branch of my renderer, although significantly better organized than the experimental CUDA renderer from spring 2012, still is rather suboptimal; after TAing CIS565 for a semester, I’ve developed what I think are some better ways of architecting CUDA code. Over winter break, I began to wonder if merging the CPU and GPU branches might be possible, and if such a task could be done, how I might go about doing it.

This newest re-structuring of Takua accomplishes that goal. I’m calling this new version of Takua “Revision 3”, as it is the third major rewrite.

My new architecture centers around a couple of observations. First, we can observe that the lowest common denominator (so to speak) for structured data in CUDA and C++ is… a struct. Similarly, the easiest way to recycle code between CUDA and C++ is to implement code as inlineable, C style functions that can either be embedded in a CUDA kernel at compile time, or wrapped within a C++ class for use in C++. Therefore, one possible way to unify CPU C++ and GPU CUDA codebases could be to implement core components of the renderer using structs and C-style, inlineable functions, allowing easy integration into CUDA kernels, and then write thin wrapper classes around said structs and functions to allow for nice, object oriented C++ code. This exact system is how I am building Takua Revision 3; the end result should be a unified codebase that can compile to both CPU and GPU versions, and allow for both versions to develop in near lockstep.

Again, I’ll go into a more detailed explanation once this process is complete.

I’ll leave this post with a slightly orthogonal note; whilst in the process of merging code, I found some images from Takua Revision 1 that I never posted for some reason. Here’s a particularly cool pair of images from when I was implementing depth of field. The first image depicts a glass Stanford dragon without any depth of field, and the second image depicts the same exact scene with some crazy shallow aperture (I don’t remember the exact settings). You can tell these are from the days of Takua Revision 1 by the ceiling; I often made the entire ceiling a light source to speed up renders back then, until Revision 2’s huge performance increases rendered cheats like that unnecessary.

Glass Stanford dragon without depth of field

Glass Stanford dragon with depth of field

Texture Mapping

A few weeks back I started work on another piece of super low-hanging fruit: texture mapping! Before I delve into the details, here’s a test render showing three texture mapped spheres with varying degrees of glossiness in a glossy-walled Cornell box. I was also playing with logos for Takua render and put a test logo idea on the back wall for fun:

…and the same scene with the camera tilted down just to show off the glossy floor (because I really like the blurry glossy reflections):

My texturing system can, of course, support textures of arbitrary resolution. The black and white grid and colored UV tile textures in the above render are square 1024x1024, while the Earth texture is rectangular 1024x512. Huge textures are handled just fine, as demonstrated by the following render using a giant 2048x2048, color tweaked version of Simon Page’s Space Janus wallpaper:

Of course UV transformations are supported. Here’s the same texture with a 35 degree UV rotation applied and tiling switched on:

Since memory is always at a premium, especially on the GPU, I’ve implemented textures in a fashion inspired by geometry instancing and node based material systems, such as the system for Maya. Inside of my renderer, I represent texture files as a file node containing the raw image data, streamed from disk via stb_image. I then apply transformations, UV operations, etc through a texture properties node, which maintains a pointer to the relevant texture file node, and then materials point to whatever texture properties nodes they need. This way, texture data can be read and stored once in memory and recycled as many times as needed, meaning that a well formatted scene file can altogether eliminate the need for redundant texture read/storage in memory. This system allows me to create amusing scenes like the following one, where a single striped texture is reused in a number of materials with varied properties:

Admittedly I made that stripe texture really quickly in Photoshop without too much care for straightness of lines, so it doesn’t actually tile very well. Hence why the sphere in the lower front shows a discontinuity in its texture… that’s not glitchy UVing, just a crappy texture!

I’ve also gone ahead and extended my materials system to allow any material property to be driven with a texture. In fact, the stripe room render above is using the same stripe texture to drive reflectiveness on the side walls, resulting in reflective surfaces where the texture is black and diffuse surfaces where the texture is white. Here’s another example of texture driven material properties showing emission being driven using the same color-adjusted Space Janus texture from before:

Even refractive and reflective index of refraction can be driven with textures, which can yield some weird/interesting results. Here are a pair of renders showing a refractive red cube with uniform IOR, and with IOR driven with a Perlin noise map:

Uniform refractive IOR

Refractive IOR driven with a Perlin noise texture map

The nice thing about a node-style material representation is that I should be able to easily plug in procedural functions in place of textures whenever I get around to implementing some (that way I can use procedural Perlin noise instead of using a noise texture).

Here’s an admittedly kind of ugly render using the color UV grid texture to drive refractive color:

For some properties, I’ve had to add a requirement to specify a range of valid values by the user when using a texture map, since RGB values don’t map well to said properties. An example would be glossiness, where a gloss value range of 0% to 100% leaves little room for detailed adjustment. Of course this issue can be fixed by adding support for floating point image formats such as OpenEXR, which is coming very soon! In the following render, the back wall’s glossiness is being driven using the stripe texture (texture driven IOR is also still in effect on the red refractive cube):

Of course, even with nice instancing schemes, textures potentially can take up a gargantuan amount of memory, which poses a huge problem in the GPU world where onboard memory is at a premium. I still need to think more about how I’m going to deal with memory footprints larger than on-device memory, but at the moment my plan is to let the renderer allocate and overflow into pinned host memory whenever it detects that the needed footprint is within some margin of total available device memory. This concern is also a major reason why I’ve decided to stick with CUDA for now… until OpenCL gets support for a unified address space for pinned memory, I’m not wholly sure how I’m supposed to deal with memory overflow issues in OpenCL. I haven’t reexamine OpenCL in a little while now though, so perhaps it is time to take another look.

Unfortunately, something I discovered while in the process of extending my material system to support texture driven properties is that my renderer could probably use a bit of refactoring for the sake of organization and readability. Since I now have some time over winter break and am planning on making my Github repo for Takua-RT public soon, I’ll probably undertake a bit of code refactoring over the next few weeks.

Blurred Glossy Reflections

Over the past few months I haven’t been making as much progress on my renderer as I would have liked, mainly because another major project has been occupying most of my attention: TAing/restructuring the GPU Programming course here at Penn. I’ll probably write a post at the end of the semester with detailed thoughts and comments about that later. More on that later!

I recently had a bit of extra time, which I used to tackle a piece of super low hanging fruit: blurred glossy reflections. The simplest brute force approach blurred glossy reflections is to take the reflected ray direction from specular reflection, construct a lobe around that ray, and sample across the lobe instead of only along the reflected direction. The wider the lobe, the blurrier the glossy reflection gets. The following diagram, borrowed from Wikipedia, illustrates this property:

In a normal raytracer or rasterization based renderer, blurred glossy reflections require something of a compromise between speed and visual quality (much like many other effects!), since using a large number of samples within the glossy specular lobe to achieve a high quality reflection can be prohibitively expensive. This cost-quality tradeoff is therefore similar to the tradeoffs that must be made in any distributed raytracing effect. However, in a pathtracer, we’re already using a massive number of samples, so we can fold the blurred glossy reflection work into our existing high sample count. In a GPU renderer, we have massive amounts of compute as well, making blurred glossy reflections far more tractable than in a traditional system.

The image at the top of this post shows three spheres of varying gloss amounts in a modified Cornell box with a glossy floor and reflective colored walls, rendered entirely inside of Takua-RT. Glossy to glossy light transport is an especially inefficient scenario to resolve in pathtracing, but throwing brute force GPU compute at it allows for arriving at a good solution reasonably quickly: the above image took around a minute to render at 800x800 resolution. Here is another test of blurred glossy reflections, this time in a standard diffuse Cornell box:

…and some tests showing varying degrees of gloss, within a modified Cornell box with glossy left and right walls. Needless to say, all of these images were also rendered entirely inside of Takua-RT.

Full specular reflection

Approximately 10% blurred glossy reflection

Approximately 30% blurred glossy reflection

Finally, here’s another version of the first image in this post, but with the camera in the wrong place. You can see a bit of the stand-in sky I have right now. I’m working on a sun & sky system right now, but since its not ready yet, I have a simple gradient serving as a stand-in right now. I’ll post more about sun & sky when I’m closer to finishing with it… I’m not doing anything fancy like Peter Kutz is doing (his sky renderer blog is definitely worth checking out, by the way), just standard Preetham et. al. style.