Code & Visuals

Photography Show at Disney Animation

May 11, 2025 | Tags: Art, Photography

The inside of Disney Animation’s Burbank building is basically one gigantic museum-quality art gallery that happens to have an animation studio embedded within, and one really cool thing that the studio does from time to time is to put on an internal art show with work from various Disney Animation employees. The latest show is a photography show, and I got to be a part of it and show some of my photos! The show, titled HAVE CAMERA, WILL TRAVEL, was coordinated and designed by the amazing Justin Hilden from Disney Animation’s legendary Animation Research Library, and features work from seven Disney Animation photographers: Alisha Andrews, Rehan Butt, Joel Dagang, Brian Gaugler, Ashley Lam, Madison Kennaugh, and myself. My peers in the show are all incredible photographers whose work I find really inspiring; I encourage checking out their photography work online! The show will be up inside of Disney Animation’s Burbank studio for several months.

Ever since my dad gave my brother and me a camera when I was in high school, photography has been a major hobby of mine. Today I have several cameras, a bunch of weird and fun and interesting lenses that I have collected over the years, and I take a lot of photos every year (which has only ramped up even more after I became a dad myself). However, I rarely, if ever, post or share my photos publicly; for me, my photography hobby is purely for myself and my close friends and family. Participating in a photography show was a bit of a leap of faith for me, even within the restricted domain of inside of my workplace instead of in the general public. I think I’m a passable photographer at this point, but certainly nowhere near amazing. However, one advantage of having taken tens of thousands of photos over the past 15 years is that even if only a tiny percentage of my photos are good enough to show, a tiny percentage of tens of thousands is still enough to pull together a small collection to show.

I thought I’d share the photos I have in the show here on my blog as well. There isn’t really a coherent theme; these are just photos I’ve taken that I liked from the past several years. Some are travel photos, some are of my family, and others are just interesting moments that I noticed. I won’t go into my photography and editing process and whatnot here; I’ll save that for a future post.

I color grade my photos for both SDR and HDR; if you are using a device/browser that supports HDR¹ , give the “Enable HDR” toggle below a try! If your device/browser doesn’t support HDR for this site, a warning message will be displayed below; if there’s no warning message, then that means your device/browser supports HDR for this site and the HDR toggle will work correctly for you.

I wrote a small artist’s statement for the show:

To me, a camera is actually a time machine. Taking photos gives me a way to connect back to moments and places in the past; for this reason I take a lot of photos mostly for my own memory, and every once in a rare while one of them is actually good enough to show other people!

I shoot with whatever camera I happen to have on me at the moment. Sometimes it’s a big fancy DSLR, sometimes it’s the phone in my pocket, sometimes it’s something in between. I learned a long time ago that the best camera is just whatever one is in reach at the moment.

Thanks to Harmony for her patience every time I fumbled a lens in my backpack.

Here are my photos from the show, presented in no particular order:

Enable HDR:

Los Angeles, California | Nikon Z8 | Smena Lomo T-43 40mm ƒ/4 | Display Mode: SDR

Denver, Colorado | Nikon Z8 | Zeiss Planar T* 50mm ƒ/1.4 C/Y | Display Mode: SDR

Mammoth, California | iPhone 14 Pro | Telephoto Lens 77mm ƒ/2.8 | Display Mode: SDR

Burbank, California | Nikon Z8 | Zeiss Kipronar 105mm ƒ/1.9 | Display Mode: SDR

Shanghai, China | Nikon Z8 | Nikon Nikkor Z 24-120mm ƒ/4 S | Display Mode: SDR

Burbank, California | Nikon Z8 | Asahi Pentax Super-Takumar 50mm ƒ/1.4 | Display Mode: SDR

Philadelphia, Pennsylvania | iPhone 5s | Main Lens 29mm ƒ/2.2 | Display Mode: SDR

Shanghai, China | Nikon D5100 | Nikon AF-S DX NIkkor 18-55mm ƒ/3.5-5.6 | Display Mode: SDR

Burbank, California | Nikon Z8 | Nikon Nikkor Z 24-120mm ƒ/4 S | Display Mode: SDR

Additionally, there were a few photos that I had originally picked out for the show but didn’t make the cut in the end due to limited wall space. I thought I’d include them here as well:

Hualien, Taiwan | Nikon Z8 | Nikon Nikkor Z 24-120mm ƒ/4 S | Display Mode: SDR

Burbank, California | Fujifilm X-M1 | Fujifilm Fujinon XF 27mm ƒ/2.8 | Display Mode: SDR

Los Angeles, California | iPhone 5s | Main Lens 29mm ƒ/2.2 | Display Mode: SDR

Here’s some additional commentary for each of the photos, presented in the same order that the photos are in:

The south hall of the Los Angeles Convention Center, taken while walking between sessions at a past SIGGRAPH.
My wife, Harmony Li, at Meow Wolf’s Convergence Station art installation. The lens flares were a total happy accident.
The Panorama Gondola disappearing into a quickly descending blizzard near the top of Mammoth, taken while we were getting off of the mountain as quickly as we could. It doesn’t look like it, but this is actually a color photograph.
Our then-four-month-old daughter hanging out with her grandparents in our backyard. This was the day she held a flower for the first time.
Someone taking a photo from inside of Shanghai’s Museum of Art Pudong. I wonder if I’m in his photo too.
Our half border collie / half golden retriever, Tux, in a Santa hat for a Christmas shoot. I think my wife actually took this one, but she insisted that I include it in the show.
My then-girlfriend now-wife shooting a video project when we were in university. This was in Penn’s Singh Center for Nanotechnology building.
A worker hanging a chandelier in Shanghai’s 1933 Laoyangfang complex. This place used to be a municipal slaughterhouse but now contains creative spaces.
The Los Angeles skyline, as seen from the Stough Canyon trail above Burbank. The tiny dot in the center of the frame is actually a plane on landing approach to LAX.
My friend Alex stopping to take in the waves as a storm was approaching the eastern coast of Taiwan.
Looking past the Roy O. Disney building towards the Team Disney headquarters building on Disney’s Burbank studio lot.
A past SIGGRAPH party somewhere in the fashion district in downtown Los Angeles.

Finally, here’s a few snapshots of what the show looks like, towards the end of the show’s opening. The opening had a great turnout; thanks to everyone that came by!

Justin's awesome logo for the show. | Display Mode: SDR

Crowds dying down towards the end of the show's opening. | Display Mode: SDR

The gallery hallway looking in the other direction. | Display Mode: SDR

My pieces framed and on the wall. | Display Mode: SDR

Footnotes

¹ At time of posting, this post’s HDR mode makes use of browser HDR video support to display HDR pictures as single-frame HDR videos, since no browser has HDR image support enabled by default yet. The following devices/browsers are known to support HDR videos by default:

Safari on iOS 14 or newer, running on the iPhone 12 generation or newer, and on iPhone 11 Pro.
Safari on iPadOS 14 or newer, running on the 12 inch iPad Pros with M1 or M2 chip, and on all iPad Pros with M4 chip or newer.
Safari or Chrome 87 or newer on macOS Big Sur or newer, running on the 2021 14 and 16 inch MacBook Pros or newer, or any Mac using a Pro Display XDR or other compatible HDR monitor.
Chrome 87 or newer, or Edge 87 or newer, on Windows 10 or newer, running on any PC with a compatible DisplayHDR-1000 or higher display (examples: any monitor on this list). You may also need to adjust HDR settings in Windows.
Chrome 87 or newer on Android 14 or newer, running on devices with an Android Ultra HDR compatible display (examples: Google Pixel 8 generation or newer, Samsung Galaxy S21 generation or newer, OnePlus 12 or newer, and various others).

On Apple devices without HDR-capable displays, iOS and macOS’s EDR system may still allow HDR imagery to look correct under specific circumstances. keyboard_return

New Unified Site Design

May 06, 2025 | Tags: Website, Announcements

Over the past month or so, I’ve undertaken another overhaul of my blog and website, this time to address a bunch of niggling things that have annoyed me for a long time. In terms of pure technical change, this round’s changes are not as extensive as the ones I had to make to implement a responsive layout a few years ago. Most of this round was polishing and tweaking and refining things, but enough things were touched that in the aggregate this set of change represents the largest number of visual updates to the site in a long time. Broadly things still look similar to before, but everything is a little bit tighter and more coherent and the details are sweated over just a little bit more. The biggest change this round of updates brings is that the blog and portfolio halves of my site now have a completely unified design, and both halves are now stiched together into one cohesive site instead of feeling and working like two separate sites. So, in the grand tradition of writing about making one’s website on one’s own website, here’s an overview of what’s changed and how I approached building the new unified design.

One unusual quirk of my site is that the portfolio half of the site and the blog half of the site run on completely different tech stacks. Both halves of the site are fundamentally based on static site generators, but pretty much everything under the hood is different, down to the very servers they are hosted on. The blog is built using Jekyll and served from Github Pages, fronted using Cloudflare. The portfolio, meanwhile, is built using a custom minimal static site generator called OmjiiCMS. When I say minimal, I really do mean minimal- OmjiiCMS is essentially just a fancy script that takes in hand-written HTML files containing the raw content of each page and simply glues on the sitewide header, footer, and nav menu. Calling it a CMS is a misnomer because it really doesn’t do any content management at all- the name is a holdover from back when my personal site and blog both ran on a custom PHP-based content management and publishing system that I wrote in high school. I eventually moved my blog to Wordpress briefly, which I found far too complicated for what I needed, and then landed on Blogger for a few years, and then in 2013 I moved to Ghost for approximately one week because Ghost had good Markdown support before I realized that if I wanted to write Markdown files, I should just use Jekyll. The blog has been powered by Jekyll ever since. As a bonus, moving to a static site generator made everything both way faster and way easier. Meanwhile, the portfolio part of the site has always been a completely custom thing because the portfolio site has a lot of specific custom layouts and I always found that building those layouts by hand was easier and simpler than trying to hammer some pre-existing framework into the shape I wanted. Over time I stripped away more and more of the underlying CMS until I realized I didn’t need one at all, at which point I gutted the entire CMS and made the portfolio site just a bunch of hand-written HTML files with a simple script to apply the site’s theming to every page before uploading to my web server. This dual-stack setup has stuck for a long time now because at least for me it allows me to run a blog and personal website with a minimal amount of fuss; the goal is to spend far more time actually writing posts than mucking around with the site’s underlying tech stack.

However, one unfortunate net result of these two different evolutionary paths is that while I have always aimed to make the blog and portfolio sites look similar, they’ve always looked kind of different from each other, sometimes in strange ways. The blog and portfolio have always had different header bars and navigation menus, even if the overall style of the header was similar. Both parts of the site always used the same typefaces, but in different places for different things, with completely inconsistent letter spacing, sizing, line heights, and more. Captions have always worked differently between the two parts of the site as well. Even the responsive layout system worked differently between the blog and portfolio, with layout changes happening at different window widths and different margins and paddings taking effect at the same window widths between the two. These differences have always bothered me, and about a month ago they finally bothered me enough for me to do something about it and finally undertake the effort of visually aligning and unifying both sites, down to the smallest details. Before breaking things down, here’s some before and afters:

Figure 1: Main site home page, before (left) and after (right) applying the new unified theme. For a full screen comparison, click here.

Figure 2: Blog front page, before (left) and after (right) applying the new unified theme. For a full screen comparison, click here.

The process I took to unify the designs for the two halves was to start from near scratch on new CSS files and rebuild the original look of both halves as closely as possible, while resolving differences one by one. The end result is that the blog didn’t just wholesale take on the details of the portfolio, or vice versa- instead, wherever differences arose, I thought about what I wanted the design to accomplish and decided on what to do from there. All of this was pretty easy to do because despite running on different tech stacks, both parts of the site were built using as much adherence to semantic HTML as possible, with all styling provided by two CSS files; one for each half. To me, a single CSS file containing all styling separate from the HTML is the obvious way to build web stuff and is how I learned to do CSS over a decade ago from the CSS Zen Garden, but apparently a bunch of popular alternative methods exist today such as Tailwind, which directly embeds CSS snippets in the HTML markup. I don’t know a whole lot about what the cool web kids do today, but Tailwind seems completely insane to me; if I had built my site with CSS snippets scattered throughout the HTML markup, then this unifying project would have taken absolute ages to complete instead of just a few hours spread over a weekend or two. Instead, this project was easy to do because all I had to do was make new CSS files for both parts of the site and I barely had to touch the HTML at all, aside from an extra wrapper div or two.

The general philosophy of this site’s design has always been to put content first and keep things information dense, all with a modern look and presentation. The last big revision of the site added responsive design as a major element and also pared back some unneeded flourishes with the goal of keeping the site lightweight. For the new unified design, I wanted to keep all of the above and also lean more into a lightweight site and improve general readability and clarity, all while keeping the site true to its preexisting design.

Here’s the list of what went into the new unified design:

Previously the blog’s body text was fairly dense and had very little spacing between lines, while the portfolio’s body text was slightly too large and too spaced out. The unified design now defines a single body text style with a font size somewhere in between what the two halves previously had, and with line spacing that grants the text a bit more room to breathe visually for improved readability while still maintaining relatively high density.
Page titles, section headings, and so on now use the same font size, color, letter spacing, margins, etc. between both halves.
I experimented with some different typefaces, but in the end I still like what I had before, which is Proxima Nova for easy-to-read body text and Futura for titles, section headings, etc; previously how these two typefaces were applied was inconsistent though, and the new unified design makes all of this uniform.
Code and monospaced text is now typeset in Berkeley Mono by US Graphics Company.
Image caption styles are now the same across the entire site and now do a neat trick where if they fit on a single line, they are center aligned, but as soon as the caption spills over onto more than one line, the caption becomes left aligned. While the image caption system does use some simple Javascript to set up, the line count dependent alignment trick is pure CSS. Here is a comparison:

Figure 3: Image caption, before (left) and after (right) applying the new unified theme. Before, captions were always center aligned, whereas now, captions are center aligned if they fit on one line but automatically become left aligned if they spill onto more than one line. For a full screen comparison, click here.

The blog now uses red as its accent color, to match the portfolio site. The old blue accent color was a holdover from when the blog’s theme was first derived from what is now a super old variant of Ghost’s Casper theme.
Links now are underlined on hover for better visibility.
Both sites now share an identical header and navigation bar. Previously the portfolio and blog had different wordmarks and had different navigation items; they now share the same “Code & Visuals” wordmark and same navigation items.
As part of unifying the top navigation bars, the blog’s Atom feed is no longer part of the top navigation but instead is linked to from the blog archive and is in the site’s new footer.
The site now has a footer, which I found useful for delineating the end of pages. The new footer has a minimal amount of information in it: just copyright notice, a link to the site’s colophon, and the Atom feed. The footer always stays at the bottom of the page, unless the page is smaller than the current browser window size, in which case the footer floats at the bottom of the browser window, and the neat thing is that this is implemented entirely using CSS with no Javascript.
Responsive layouts now kick in at the same window widths for both parts of the site, and the margins and various text size changes applied for responsive layouts are the same between both halves as well. As a result, the site now looks the same across both halves at all responsive layout widths across all devices.
All analytics and tracking code has been completely removed from both halves of the site.
The “About” section of the site has been reorganized with several informational slash pages. Navigation between the various subpages of the About section is integrated into the page headings.
The “Projects” section of the site used to just be one giant list of projects; this list is now reorganized into subpages for easier navigation, and navigation is also integrated into the Project section’s page headings.
Footnotes and full screen image comparison pages now include backlinks to where they were linked to from main body text.
Long posts with multiple subsections now include a table of contents at the beginning.

Two big influences on how I’ve approached building and designing my site over the past few years have been Tom Macwright’s site and Craig Mod’s site. From Tom Macwright’s site, I love the ultra-brutalist and super lightweight design, and I also like his site navigation, choice of sections, and slash pages. From Craig Mod’s site, I really admire the typography and how the site presents his various extensive writings with excellent readability and beautiful layouts. My site doesn’t really resemble those two sites at all visually (and I wouldn’t want it to; I like my own thing!), but I drew a lot of influence from both of those sites when it comes to how I thought about an overall approach to design. In addition to the two sites mentioned above, I regularly draw inspiration from a whole bunch of other sites and collections of online work; I keep an ongoing list on my about page if you’re curious.

Hee’s a brief overview of how the portfolio half of the site has changed over the years. The earliest 2011 version was just a temporary site I threw together while I was applying to the Pixar Undergraduate Program internship (and it worked!); in some ways I kind of miss the ultra-brutalist utilitarian design of this version. I actually still keep that old version around for fun. The 2013 version was the first version of the overall design that continues to this day, but was really heavy-handed with both a header and footer that hovered in place when scrolling. The 2014 version consolidated the header and footer into just a single header that still hovered in place but shrunk down when scrolling. The 2017 version added dual-column layouts to the home page and project pages, and the 2018 version cleaned up a bunch of details. The 2021 version was a complete rebuild that introduced responsive design, and the 2022 version was a minor iteration that added things like a image carousel to the home page. The latest version rounds out the evolutionary history up to now:

Meanwhile, the blog has actually seen less change overall. Unfortuantely I don’t have any screenshots or a working version of the codebase for the pre-2011 version of the blog anymore, but by the 2011 version the blog was on Blogger with a custom theme that I spent forever fighting against Blogger’s theming system to implement; that custom theme is actually the origin of my site’s entire look. The 2013 version was a wholesale port to Jekyll and as part of the port I built a new Jekyll theme that carried over much of the previous design. The 2014 version of the blog added an archive page and Atom feed, and then the blog more or less stayed untouched until the 2021 version’s responsve design overhaul. This latest version is the largest overhaul the blog has seen in a very long time:

I’m pretty happy with how the new unified design turned out; both halves of the site now feel like one integrated, cohesive whole, and the fact that the two halves of the site run different tech stacks on different webservers is no longer made obvious to visitors and readers. I named the new unified site theme Einheitsgrafik, which translates roughly to “uniform graphic” or “standard graphic”, which I think is fitting. With this iteration, there are no longer any major things that annoy me every time I visit the site to double check things; hopefully that means that the site is also a better experience for visitors and readers now. I think that this particular iteration of the site is going to last a very long time!

Moana 2

December 18, 2024 | Tags: Films

This fall marked the release of Moana 2, Walt Disney Animation’s 63rd animated feature and the 10th feature film rendered entirely using Disney’s Hyperion Renderer. Moana 2 brings us back to the beautiful world of Moana, but this time on a larger adventure with a larger canoe, a crew to join our heroine, bigger songs, and greater stakes. The first Moana was at the time of its release one of the most beautiful animated films ever made, and Moana 2 lives up to that visual legacy with frames that match or often surpass what we did in the original movie. I got to join Moana 2 about two years ago and this film proved to be an incredibly interesting project!

While we’ve now used Disney’s Hyperion Renderer to make several sequels to previous Disney Animation films, Moana 2 is the first time we’ve used Hyperion to make a sequel to a previous film that also used Hyperion. From a technical perspective, the time between the first and second Moana films is filled with almost a decade of continual advancement in our rendering technology and in our wider production pipeline. At the time that we made the first Moana, Hyperion was only a few years old and we spent a lot of time on the first Moana fleshing out various still-underdeveloped features and systems in the renderer. Going into the second Moana, Hyperion is now an extremely mature, extremely feature rich, battle-tested production renderer with which we can make essentially anything we can imagine. Almost every single feature and system in Hyperion today has seen enormous advancement and improvement over what we had on the first Moana; many of these advancements were in fact driven by hard lessons that we learned on the first Moana! Compared with the first Moana, here’s a short, very incomplete laundry list of improvements made over the past decade that we were able to leverage on Moana 2:

Moana 2 uses a completely new water rendering system that represents an enormous leap in both render-time efficiency and easier artist workflows compared with what we used on the first Moana; more on this later in this post.
After the first Moana, we completely rewrote Hyperion’s previous volume rendering subsystem [Habel 2017] from scratch; the modern system is a state-of-the-art delta-tracking system that required us to make foundational research advancements in order to implement [Kutz et al. 2017, Huang et al. 2021].
Our traversal system was completely rewritten to better handle thread scalability and to incorporate a form of rebraiding to efficiently handle gigantic world-spanning geometry; this was inspired directly by problems we had rendering the huge ocean surfaces and huge islands in the first Moana [Burley et al. 2018].
On the original Moana, ray self-intersection with things like Maui’s feathers presented a major challenge; Moana 2 is the first film using our latest ray self-intersection prevention system that notably does away with any form of ray bias values.
We introduced a limited form of photon mapping on the first Moana that only worked between the sun and water surfaces [Burley et al. 2018].; Moana 2 uses an evolved version of our photon mapper that supports all of our light types, many or our standard lighting features, and even has advanced capabilities like a form of spectral dispersion.
We’ve made a number of advancements [Burley et al. 2017, Chiang et al. 2016, Chiang at al. 2019, Zeltner et al. 2022] to various elements of the Disney BSDF shading model.
Subsurface scattering on the first Moana was done using normalized diffusion; since then we’ve moved all subsurface scattering to use a state-of-the-art brute force path tracing approach [Chiang et al. 2016].
Eyes on the first Moana used our old ad-hoc eye shader; eyes on Moana 2 use our modern physically plausible eye shader that includes state-of-the-art iris caustics calculated using manifold next event estimation [Chiang & Burley 2018].
The emissive mesh importance sampling system that we implemented on the first Moana and our overall many-lights sampling system has seen many efficiency improvements [Li et al. 2024].
Hyperion has gained many more powerful features granting artists an enormous degree of artistic control both in the renderer and post-render in compositing [Burley 2019, Burley et al. 2024].
Since the first Moana, Hyperion’s subdivision/tessellation system has gained an advanced fractured mesh system that makes many of the huge-scale effects in the first Moana movie much easier for us to create today [Burley & Rodriguez 2022].
We’ve introduced path guiding into Hyperion to handle particularly difficult light transport cases [Müller et al. 2017, Müller 2019].
The original Moana used our somewhat ad-hoc first-generation denoiser, while Moana 2 uses our best-in-industry, Academy Award winning¹ second-generation deep learning denoiser jointly developed by Disney Research Studios, Disney Animation, Pixar, and ILM [Vogels et al. 2018, Dahlberg et al. 2019].
Even Hyperion’s internal architecture has changed enormously; Hyperion originally was famous for being a batched wavefront renderer, but this has evolved significantly since then and continues to evolve.

There are many many more changes to Hyperion that there simply isn’t room to list here. To give you some sense of how far Hyperion has evolved between Moana and Moana 2: the Hyperion used on Moana was internally versioned as Hyperion 3.x; the Hyperion used on Moana 2 is internally versioned as Hyperion 16.x, with each version number in between representing major changes. In addition to improvements in Hyperion, our rendering team has also been working for the past few years on a next-generation interactive lighting system that extensively leverages hardware GPU ray tracing; Moana 2 saw the widest deployment yet of this system; I can’t say much more on this topic yet but hopefully we’ll have more to share soon.

Outside of the rendering group, literally everything else about our entire studio production pipeline has changed as well; the first Moana was made mostly on proprietary internal data formats, while Moana 2 was made using the latest iteration of our cutting-edge modern USD pipeline [Miller et al. 2022, Vo et al. 2023, Li et al. 2024]. The modern USD pipeline has granted our pipeline many amazing new capabilities and far more flexibility, to the point where it became possible to move our entire lighting workflow to a new DCC for Moana 2 without needing to blow up the entire pipeline. Our next-generation interactive lighting system is similarly made possible by our modern USD pipeline. I’m sure we’ll have much more about this at SIGGRAPH!

While I get to work on every one of our feature films and get to do fun and interesting things every time, Moana 2 is the most direct and deep I’ve worked on one of our films probably since the original Moana. There are two specific projects I worked on for Moana 2 that I am particularly proud of: a completely new water rendering system that is part of Moana 2’s overall new water FX workflow, and the volume rendering work that was done for the storm battle in the movie’s third act.

On the original Moana, we had to develop a lot of custom water simulation and rendering technology because commercial tools at the time couldn’t quite handle what the movie required. On the simulation side, the original Moana required Disney Animation to invent new techniques such as the APIC (affine particle-in-cell) fluid simulation model [Jiang et al. 2015] and the FAB (fluxed animated boundary) method for integrating procedural and simulated water dynamics [Stomakhin and Selle 2017]. Disney Animation’s general philosophy towards R&D is that we will develop and invent new methods when needed, but will then aim to publish our work with the goal of allowing anything useful we invent to find its way into the wider graphics field and industry; a great outcome is when our publications are adopted by the commercial tools and packages that we build on top of. APIC and FAB were both published and have since become a part of the stock toolset in Houdini, which in turn allowed us to build more on top of Houdini’s built-in SOPs for Moana 2’s water FX workflow.

On the rendering side, the main challenge on the original Moana for rendering water was the need to combine water surfaces from many different sources (procedural, manually animated, and simulated) into a single seamless surface that could be rendered with proper refraction, internal volumetric effects, caustics, and so on. Our solution to combine different water surfaces on the original Moana was to convert all input water elements into signed distance fields, composite all of the signed distance fields together into a single world-spanning levelset, and then mesh that levelset into a triangle mesh for ray intersection [Palmer et al. 2017]. While this process produced great visual results, running this entire world-spanning levelset compositing and meshing operation at renderer startup for each frame proved to be completely untenable due to how slow it made interaction for artists, so an extensive system for pre-caching ocean surfaces overnight to disk had to be built out. All in all, the water rendering and caching system on the first Moana required a dedicated team of over half a dozen developers and TDs to maintain, and setting up the levelset compositing system correctly proved to be challenging for artists.

For Moana 2, we decided to revisit water rendering with the goal of coming up with something much easier for artists to use, much faster to render, and much easier to maintain by a smaller group of engineers and TDs. Over the course of about half a year, we completely replaced the old levelset compositing and meshing system with a new ray-intersection-time CSG system. Our new system requires almost zero work for artists to set up, requires zero preprocessing time before renderer startup and zero on-disk caching, renders with negligible impact on ray tracing speed, and required zero dedicated TDs and only part of my time as an engineer to support once primary development was completed. In addition to all of the above, the new system also allows for both better looking and more memory efficient water because the level of detail that water meshes have to exist at is no longer constrained by the resolution of a world-size meshed levelset, even with an adaptive levelset meshing. I think this was a great example where by revisiting a world that we already knew how to make, we were given an opportunity to reevaluate what we learned on Moana in order to come up with something better by every metric for Moana 2.

We knew that returning to the world of Moana was likely going to require a heavy lift from a volume rendering perspective. With a mind towards this, we worked closely with Disney Research Studios in Zürich to implement next-generation volume path guiding techniques in Hyperion, which wound up not seeing wide deployment this time but nonetheless proved to be a fun and interesting project from which we learned a lot. We also realized that the third act’s storm battle was going to be incredibly challenging from both an FX and rendering perspective. My last few months on Moana 2 were spent helping get the storm battle sequences finished; one extremely unusual thing we wound up doing was providing custom builds of Hyperion with specific optimizations tailored to the specific requirements of the storm sequence, sometimes going as far as to provide specific builds and settings tailored on a per-shot basis. Normally this is something any production rendering team tries to avoid if possible, but one of the benefits of having our own in-house team and our own in-house renderer is that we are still able to do this when the need arises. From a personal perspective, being able to point at specific shots and say “I wrote code for that specific thing” is pretty neat!

From both a story and a technical perspective, Moana 2 is everything we loved from Moana brought back, plus a lot of fun, big, bold new stuff. Making Moana 2 both gave us new challenges to solve and allowed us to revisit and come up with better solutions to old challenges from Moana. I’m incredibly proud of the work that my teammates and I were able to do on Moana 2; I’m sure we’ll have a lot more to share at SIGGRAPH 2025, and in the meantime I strongly encourage you to see Moana 2 on the biggest screen you can find!

To give you a taste of how beautiful this film looks, here are some frames from Moana 2, 100% created using Disney’s Hyperion Renderer by our amazing artists. These are presented in no particular order:

All images in this post are courtesy of and the property of Walt Disney Animation Studios.

References

Brent Burley, David Adler, Matt Jen-Yuan Chiang, Ralf Habel, Patrick Kelly, Peter Kutz, Yining Karl Li, and Daniel Teece. 2017. Recent Advancements in Disney’s Hyperion Renderer. In ACM SIGGRAPH 2017 Course Notes: Path Tracing in Production Part 1. 26-34.

Brent Burley, David Adler, Matt Jen-Yuan Chiang, Hank Driskill, Ralf Habel, Patrick Kelly, Peter Kutz, Yining Karl Li, and Daniel Teece. 2018. The Design and Evolution of Disney’s Hyperion Renderer. ACM Transactions on Graphics. 37, 3 (2018), 33:1-33:22.

Brent Burley. 2019. On Histogram-Preserving Blending for Randomized Texture Tiling. Journal of Computer Graphics Techniques, 8, 4 (2019), 31-53.

Brent Burley and Francisco Rodriguez. 2022. Fracture-Aware Tessellation of Subdivision Surfaces. In ACM SIGGRAPH 2022 Talks, 10:1-10:2.

Brent Burley, Brian Green, and Daniel Teece. 2024. Dynamic Screen Space Textures for Coherent Stylization. In ACM SIGGRAPH 2024 Talks, 50:1-50:2.

Matt Jen-Yuan Chiang, Peter Kutz, and Brent Burley. 2016. Practical and Controllable Subsurface Scattering for Production Path Tracing. In ACM SIGGRAPH 2016 Talks. 49:1-49:2.

Matt Jen-Yuan Chiang and Brent Burley. 2018. Plausible Iris Caustics and Limbal Arc Rendering. In ACM SIGGRAPH 2018 Talks, 15:1-15:2.

Matt Jen-Yuan Chiang, Yining Karl Li, and Brent Burley. 2019. Taming the Shadow Terminator. In ACM SIGGRAPH 2019 Talks. 71:1-71:2.

Henrik Dahlberg, David Adler, and Jeremy Newlin. 2019. Machine-Learning Denoising in Feature Film Production. In ACM SIGGRAPH 2019 Talks. 21:1-21:2.

Ralf Habel. 2017. Volume Rendering in Hyperion. In ACM SIGGRAPH 2017 Course Notes: Production Volume Rendering. 91-96.

Wei-Feng Wayne Huang, Peter Kutz, Yining Karl Li, and Matt Jen-Yuan Chiang. 2021. Unbiased Emission and Scattering Importance Sampling for Heterogeneous Volumes. In ACM SIGGRAPH 2021 Talks. 3:1-3:2.

Chenfafu Jiang, Craig Schroeder, Andrew Selle, Joseph Teran, and Alexey Stomakhin. 2015. The Affine Particle-in-Cell Method. ACM Transactions on Graphics. 34, 4 (2015), 51:1-51:10.

Peter Kutz, Ralf Habel, Yining Karl Li, and Jan Novák. 2017. Spectral and Decomposition Tracking for Rendering Heterogeneous Volumes. ACM Transactions on Graphics. 36, 4 (2017), 111:1-111:16.

Harmony M. Li, George Rieckenberg, Neelima Karanam, Emily Vo, and Kelsey Hurley. 2024. Optimizing Assets for Authoring and Consumption in USD. In ACM SIGGRAPH 2024 Talks. 30:1-30:2.

Yining Karl Li, Charlotte Zhu, Gregory Nichols, Peter Kutz, Wei-Feng Wayne Huang, David Adler, Brent Burley, and Daniel Teece. 2024. Cache Points for Production-Scale Occlusion-Aware Many-Lights Sampling and Volumetric Scattering. In DigiPro 2024. 6:1-6:19.

Tad Miller, Harmony M. Li, Neelima Karanam, Nadim Sinno, and Todd Scopio. 2022. Making Encanto with USD: Rebuilding a Production Pipeline Working from Home. In ACM SIGGRAPH 2022 Talks. 12:1-12:2.

Thomas Müller. 2019. Practical Path Guiding in Production. In ACM SIGGRAPH 2019 Course Notes: Path Guiding in Production. 37-50.

Thomas Müller, Markus Gross, and Jan Novák. 2017. Practical Path Guiding for Efficient Light-Transport Simulation. Computer Graphics Forum. 36, 4 (2017), 91-100.

Sean Palmer, Jonathan Garcia, Sara Drakeley, Patrick Kelly, and Ralf Habel. 2017. The Ocean and Water Pipeline of Disney’s Moana. In ACM SIGGRAPH 2017 Talks. 29:1-29:2.

Alexey Stomakhin and Andy Selle. 2017. Fluxed Animated Boundary Method. ACM Transactions on Graphics. 36, 4 (2017), 68:1-68:8.

Emily Vo, George Rieckenberg, and Ernest Petti. 2023. Honing USD: Lessons Learned and Workflow Enhancements at Walt Disney Animation Studios. In ACM SIGGRAPH 2023 Talks. 13:1-13:2.

Thijs Vogels, Fabrice Rousselle, Brian McWilliams, Gerhard Röthlin, Alex Harvill, David Adler, Mark Meyer, and Jan Novák. 2018. Denoising with Kernel Prediction and Asymmetric Loss Functions. ACM Transactions on Graphics. 37, 4 (2018), 124:1-124:15.

Tizian Zeltner, Brent Burley, and Matt Jen-Yuan Chiang. 2022. Practical Multiple-Scattering Sheen Using Linearly Transformed Cosines. In ACM SIGGRAPH 2022 Talks. 7:1-7:2.

Footnotes

¹ Our deep learning denoiser technology is one of the 2025 Academy of Motion Picture Arts and Sciences Scientific and Engineering Award winners. keyboard_return

DigiPro 2024 Paper- Cache Points For Production-Scale Occlusion-Aware Many-Lights Sampling And Volumetric Scattering

July 30, 2024 | Tags: Hyperion, Light Transport, DigiPro

This year at DigiPro 2024, we had a conference paper that presents a deep dive into Hyperion’s unique solution to the many-light sampling problem; we call this system “cache points”. DigiPro is one of my favorite computer graphics conferences precisely because of the emphasis the conference places on sharing how ideas work in the real world of production, and with this paper we’ve tried to combine a more traditional academic theory paper with DigiPro’s production-forward mindset. Instead of presenting some new thing that we’ve recently come up with and have maybe only used on one or two productions so far, this paper presents something that we’ve now actually had in the renderer and evolved for over a decade, and along with the core technique, the paper also goes into lessons we’ve learned from over a decade of production experience.

Here is the paper abstract:

A hallmark capability that defines a renderer as a production renderer is the ability to scale to handle scenes with extreme complexity, including complex illumination cast by a vast number of light sources. In this paper, we present Cache Points, the system used by Disney’s Hyperion Renderer to perform efficient unbiased importance sampling of direct illumination in scenes containing up to millions of light sources. Our cache points system includes a number of novel features. We build a spatial data structure over points that light sampling will occur from instead of over the lights themselves. We do online learning of occlusion and factor this into our importance sampling distribution. We also accelerate sampling in difficult volume scattering cases.

Over the past decade, our cache points system has seen extensive production usage on every feature film and animated short produced by Walt Disney Animation Studios, enabling artists to design lighting environments without concern for complexity. In this paper, we will survey how the cache points system is built, works, impacts production lighting and artist workflows, and factors into the future of production rendering at Disney Animation.

The paper and related materials can be found at:

One extremely important thing that I tried to get across in the acknowledgements section of the paper and presentation and that I want to really emphasize here is: although I’m the lead author of this paper, I am not at all the lead developer or primary inventor of the cache points system. Over the past decade, many developers have since contributed to the system and the system has evolved significantly, but the core of cache points system was originally invented by Gregory Nichols and Peter Kutz, and the volume scattering extensions were primarily developed by Wei-Feng Wayne Huang. Since Greg, Peter, and Wayne are no longer at Disney Animation, Charlotte and I wound up spearheading the paper because we’re the developers who currently have the most experience working in the cache points system and therefore were in the best position to write about it.

The way this paper came about was somewhat circuitous and unplanned. This paper actually originated as a section in what was intended to have been a course at SIGGRAPH a few years ago on path guiding techniques, to have been presented by Intel’s graphics research group, Disney Research Studios, Disney Animation’s Hyperion team, WetaFX’s Manuka team, and Chaos Czech’s Corona team. However, because of scheduling and travel difficulties for several of the course presenters, the course wound up having to be withdrawn, and the material we had put together for presenting cache points got shelved. Then, as the DigiPro deadline started to approach this year, we were asked by higher ups in the studio if we had anything that could make a good DigiPro submission. After some thought, we realized that DigiPro was actually a great venue for presenting the cache points system because we could structure the paper and presentation as a combination of technical breakdown and production perspective from a decade’s worth of production usage. The final paper is a composed from three sources: a reworked version of what we had originally prepared for the abandoned course, a greatly expanded version of the material from our 2021 SIGGRAPH talk on our cache point based volume importance sampling techniques [Huang et al. 2021], and a bunch of new material consisting of production case studies and results on production scenes.

Overall I hope that the final paper is an interesting and useful read for anyone interested in light transport and production rendering, but I have to admit, I think that there are a couple of things I would have liked to rework and improve in the paper if we had more time. I think the largest missing piece from the paper is a direct head-to-head comparison with a light BVH approach [Estevez and Kulla 2018]; in the paper and presentation we discuss how our approach differs from light BVH approaches and why we chose our approach over a light BVH, but we don’t actually present any direct comparisons in the results. In the past we actually have more directly compared cache points to a light BVH implementation, but in the window we had to write this paper, we simply didn’t have enough time to resurrect that old test, bring it up to date with the latest code in the production renderer, and conduct a thorough performance comparison. Similarly, in the paper we mention that we actually implemented Vevoda et al. [2018]’s Bayesian online regression approach in Hyperion as a comparison point, but again, in the writing window for this paper, we just didn’t have time to put together a fair up-to-date performance comparison. I think that even without these comparisons our paper brings a lot of valuable information and insights (and evidently the paper referees agreed!), but I do think that the paper would be stronger had we found the time to include those direct comparisons. Hopefully at some point in the near future I can find time to do those direct comparisons as a followup and put out the results in a supplemental followup or something.

Another detail of the paper that sits in the back of my head for revisting is the fact that even though cache points provides correct unbiased results, a lot of the internal implementation details depend on essentially empirically derived properties. Nothing in cache points is totally arbitrary per se; in the paper we try to provide a strong rationale or justification for how we arrived upon each empirical property through logic and production experience. However, at least from an abstract mathematical perspective, the empirically derived stuff is nonetheless somewhat unsatisfying! On the other hand, however, in a great many ways this property is simply part of practical reality- what puts the production in production rendering.

A topic that I think would be a really interesting bit of future work is combining cache points with ReSTIR [Bitterli et al. 2020]. One of the interesting things we’ve found with ReSTIR is that in terms of absolute quality, ReSTIR generally can benefit significantly from higher quality initial input samples (as opposed to just uniform sampling), but the quality benefit is usually more than offset by the greatly increased cost of drawing better initial samples from something like a light BVH. Walking a light BVH on the GPU is a lot more computationally expensive than just drawing a uniform random number! One thought that I’ve had is that because cache points aren’t hierarchical, we could store them in a hash grid instead of a tree, allowing for fast constant-time lookups that might provide a better quality-vs-cost tradeoff that in turn might make use with ReSTIR feasible.

The presentation for this paper was an interesting challenge and a lot of fun to put together. Our paper is very much written with a core rendering audience in mind, but the presentation at the DigiPro conference had to be built for a more general audience because the audience at DigiPro includes a wide, diverse selection of people from all across computer graphics, animation, and VFX, with varying levels of technical background and varying levels of familiarity with rendering. The approach we took for the presentation was to keep things at a much higher level than the paper and try to convey the broad strokes of how cache points work and focus more on production results and lessons, while referring to the paper for the more nitty gritty details. We put a lot of work into including a lot of animations in the presentation to better illustrate how each step of cache points works; the way we used animations was directly inspired by Alexander Rath’s amazing SIGGRAPH 2023 presentation on Focal Path Guiding [Rath et al. 2023]. However, instead of building custom presentation software with a built-in 2D ray tracer like Alex did, I just made all of our animations the hard and dumb way in Keynote.

Another nice thing the presentation includes is a better visual presentation (and somewhat expanded version) of the paper’s results section. A recording of the presentation is available on both my project page for the paper and on the official Disney Animation website’s page for the paper. I am very grateful to Dayna Meltzer, Munira Tayabji, and Nick Cannon at Disney Animation for granting permission and making it possible for us to share the presentation recording publicly. The presentation is a bit on the long side (30 minutes), but hopefully is a useful and interesting watch!

References

Benedikt Bitterli, Chris Wyman, Matt Pharr, Peter Shirley, Aaron Lefohn, and Wojciech Jarosz. 2020. Spatiotemporal Reservoir Sampling for Real-Time Ray Tracing with Dynamic Direct Lighting. ACM Transactions on Graphics. 39, 4 (2020), 148:1-148:17.

Alejandro Conty Estevez and Christopher Kulla. 2018. Importance Sampling of Many Lights with Adaptive Tree Splitting. Proc. of the ACM on Computer Graphics and Interactive Techniques (Proc. of HPG). 1, 2 (2018), 25:1-25:17..

Alexander Rath, Ömercan Yazici, and Philipp Slusallek. 2023. Focus Path Guiding for Light Transport Simulation. In ACM SIGGRAPH 2023 Conference Proceedings. 30:1-30:10.

Petr Vévoda, Ivo Kondapaneni, and Jaroslav Křivánek. 2018. Bayesian Online Regression for Adaptive Direct Illumination Sampling. ACM Transactions on Graphics. 37, 4 (2018), 125:1-125:12.

Porting Takua Renderer to Windows on Arm

June 07, 2024 | Tags: Coding, Renderer

1. Introduction
2. OpenGL on arm64 Windows 11
3. Building Embree on arm64 Windows 11

4. Running x86-64 code on arm64 Windows 11
5. Conclusion
6. References

Introduction

A few years ago I ported Takua Renderer to build and run on arm64 systems. Porting to arm64 proved to be a major effort (see Parts 1, 2, 3, and 4) which wound up paying off in spades; I learned a lot, found and fixed various longstanding platform-specific bugs in the renderer, and wound up being perfectly timed for Apple transitioning the Mac to arm64-based Apple Silicon. As a result, for the past few years I have been routinely building and running Takua Renderer on arm64 Linux and macOS, in addition to building and runninng on x86-64 Linux/Mac/Windows. Even though I take somewhat of a Mac-first approach for personal projects since I daily drive macOS, I make a point of maintaining robust cross-platform support for Takua Renderer for reasons I wrote about in the first part of this series.

Up until recently though, my supported platforms list for Takua Renderer notably did not include Windows on Arm. There are two main reasons why I never ported Takua Renderer to build and run on Windows on Arm. The first reason is that Microsoft’s own support for Windows on Arm has up until recently been in a fairly nascent state. Windows RT added Arm support in 2012 but only for 32-bit processors, and Windows 10 added arm64 support in 2016 but lacked a lot of native applications and developer support; notably, Visual Studio didn’t gain native arm64 support until late in 2022. The second reason I never got around to adding Windows on Arm support is simply that I don’t have any Windows on Arm hardware sitting around and generally there just have not been many good Windows on Arm devices available in the market. However, with the advent of Qualcomm’s Oryon-based Snapdragon X SoCs and Microsoft’s push for a new generation of arm64 PCs using the Snapdragon X SoCs, all of the above finally seems to be changing. Microsoft also authorized arm64 editions of Windows 11 for use in virtual machines on Apple Silicon Macs at the beginning of this year. With Windows on Arm now clearly signaled as a major part of the future of Windows and clearly signaled as here to stay, and now that spinning up a Windows 11 on Arm VM is both formally supported and easy to do, a few weeks ago I finally got around to getting Takua Renderer up and running on native arm64 Windows 11.

Overall this process was very easy compared with my previous efforts to add support for arm64 Mac and Linux. This was not because porting architectures is easier on Windows but rather is a consequence of the fact that I had already solved all of the major architecture-related porting problems for Mac and Linux; the Windows 11 on Arm port just piggy-backed on those efforts. Because of how relatively straightforward this process was, this will be a shorter post, but there were a few interesting gotchas and details that I think are worth noting in case they’re useful to anyone else porting graphics stuff to Windows on Arm.

Note that everything in this post uses arm64 Windows 11 Pro 23H2 and Visual Studio 2022 17.10.x. Noting the specific versions used here is important since Microsoft is still actively fleshing out arm64 support in Windows 11 and Visual Studio 2022; later versions will likely see improvements to problems discussed in this post.

OpenGL on arm64 Windows 11

Takua has two user interface systems: a macOS-specific UI written using a combination of Dear Imgui, Metal, and AppKit, and a cross-platform UI written using a combination of Dear Imgui, OpenGL, and GLFW. On macOS, OpenGL is provided by the operating system itself as part of the standard system frameworks. On most desktop Linux distributions, OpenGL can be provided by several different sources: one option is entirely through the operating system’s provided Mesa graphics stack, another option is through a combination of Mesa for the graphics API and a proprietary driver for the backend hardware support, and the last option is entirely through a proprietary driver (such as with Nvidia’s official drivers). On Windows, however, the operating system does not provide modern OpenGL (“modern” meaning OpenGL 3.3 or newer), support whatsoever and the OpenGL 1.1 support that is available is a wrapper around Direct3D; modern OpenGL support on Windows has to be provided entirely by the graphics driver.

I don’t actually have any native arm64 Windows 11 hardware, so for this porting project, I ran arm64 Windows 11 as a virtual machine on two of my Apple Silicon Macs. I used the excellent UTM app (which under the hood uses QEMU) as the hypervisor. However, UTM does not provide any kind of GPU emulation/virtualization to Windows virtual machines, so the first problem I ran into was that my arm64 Windows 11 environment did not have any kind of modern OpenGL support due to the lack of a GPU driver with OpenGL. Therefore, I had no way to build and run Takua’s UI system.

Fortunately, because OpenGL is so widespread in commonly used applications and games, this is a problem that Microsoft has already anticipated and come up with a solution for. A few years ago, Microsoft developed and released an OpenGL/OpenCL Compatability Pack for Windows on Arm, and they’ve since also added Vulkan support to the compatability pack as well. The compatability pack is available for free on the Windows Store. Under the hood, the compatability pack uses a combination of Microsoft-developed client drivers and a bunch of components from Mesa to translate from OpenGL/OpenCL/Vulkan to Direct3D [Jiang 2020]. This system was originally developed to provide support for specifically Photoshop on arm64 Windows, but has since been expanded to provide general OpenGL 3.3, OpenCL 3.0, and Vulkan 1.2 support to all applications on arm64 Windows. Installing the compatability pack allowed me to get GLFW building and to get GLFW’s example demos working.

Takua’s cross-platform UI is capable of running either using OpenGL 4.5 on systems with support for the latest fanciest OpenGL API version, or using OpenGL 3.3 on systems that only have older OpenGL support (examples include macOS when not using the native Metal-based UI and include many SBC Linux devices such as Raspberry Pi). Since the arm64 Windows compatability pack only fully supports up to OpenGL 3.3, I set up Takua’s arm64 Windows build to fall back to only use the OpenGL 3.3 code path, which was enough to get things up and running. However, I immediately noticed that everything in the UI looked wrong; specifically, everything was clearly not in the correct color space.

The problem turned out to be that the Windows OpenGL/OpenCL/Vulkan compatability pack doesn’t seem to correctly implement GL_FRAMEBUFFER_SRGB; calling glEnable(GL_FRAMEBUFFER_SRGB) did not have any impact on the actual color space that the framebuffer rendered with. To work around this problem, I simply added software sRGB emulation to the output fragment shader and added some code to detect if GL_FRAMEBUFFER_SRGB was working or not and if not, fall back to the fragment shader’s implementation. Implementing the sRGB transform is extremely easy and is something that every graphics programmer inevitably ends up doing a bunch of times throughout one’s career:

float sRGB(float x) {
    if (x <= 0.00031308)
        return 12.92 * x;
    else
        return 1.055*pow(x,(1.0 / 2.4) ) - 0.055;
}

With this fix, Takua’s UI now fully works on arm64 Windows 11 and displays renders correctly:

Building Embree on arm64 Windows 11

Takua has a moderately sized dependency base, and getting all of the dependency base compiled during my ports to arm64 Linux and arm64 macOS was a very large part of the overall effort since arm64 support across the board was still in an early stage in the graphics field three years ago. However, now that libraries such as Embree and OpenEXR and even TBB have been building and running on arm64 for years now, I was expecting that getting Takua’s full dependency base brought up on Windows on Arm would be straightforward. Indeed this was the case for everything except Embree, which proved to be somewhat tricky to get working. I was surprised that Embree proved to be difficult, since Embree for a few years now has had excellent arm64 support on macOS and Linux. Thanks to a contribution from Apple’s Developer Ecosystem Engineer team, arm64 Embree now even has a neat double-pumped NEON option for emulating AVX2 instructions.

As of the time of writing this post, compiling Embree 4.3.1 for arm64 using MSVC 19.x (which ships with Visual Studio 2022) simply does not work. Initially just to get the renderer up and running in some form at all, I disabled Embree in the build. Takua has both an Embree-based traversal system and a standalone traversal system that uses my own custom BVH implementation; I keep both systems at parity with each other because Takua at the end of the day is a hobby renderer that I work on for fun, and writing BVH code is fun! However, a secondary reason for keeping both traversal systems around is because in the past having a non-Embree code path has been useful for getting the renderer bootstrapped on platforms that Embree doesn’t fully support yet, and this was another case of that.

Right off the bat, building Embree with MSVC runs into a bunch of problems with detecting the platform as being a 64-bit platform and also runs into all kinds of problems with including immintrin.h, which is where vector data types and other x86-64 intrinsics stuff is defined. After hacking my way through solving those problems, the next issue I ran into is that MSVC really does not like how Embree carries out static initialisation of NEON datatypes; this is a known problem in MSVC. Supposedly this issue was fixed in MSVC some time ago, but I haven’t been able to get it to work at all. Fixing this issue requires some extensive reworking of how Embree does static initialisation of vector datatypes, which is not a very trivial task; Anthony Roberts previously attempted to actually make these changes in support of getting Embree on Windows on Arm working for use in Blender, but eventually gave up since making these changes while also making sure Embree still passes all of its internal tests proved to be challenging.

In the end, I found a much easier solution to be to just compile Embree using Visual Studio’s version of clang instead of MSVC. This has to be done from the command line; I wasn’t able to get this to work from within Visual Studio’s regular GUI. From within a Developer PowerShell for Visual Studio session, the following worked for me:

cmake -G "Ninja" ../../ -DCMAKE_C_COMPILER="clang-cl" `
                        -DCMAKE_CXX_COMPILER="clang-cl" ` 
                        -DCMAKE_C_FLAGS_INIT="--target=arm64-pc-windows-msvc" `
                        -DCMAKE_CXX_FLAGS_INIT="--target=arm64-pc-windows-msvc" `
                        -DCMAKE_BUILD_TYPE=Release `
                        -DTBB_ROOT="[TBB LOCATION HERE]" `
                        -DCMAKE_INSTALL_PREFIX="[INSTALL PREFIX HERE]"

To do the above, of course you will need both CMake and Ninja installed; fortunately both come with pre-built arm64 Windows binaries on their respective websites. You will also need to install the “C++ Clang Compiler for Windows” component in the Visual Studio Installer application if you haven’t already.

Just building with clang is also the solution that Blender eventually settled on for Windows on Arm, although Blender’s version of this solution is a bit more complex since Blender builds Embree using its own internal clang and LLVM build instead of just using the clang that ships with Visual Studio.

An additional limitation in compiling Embree 4.3.1 for arm64 on Windows right now is that ISPC support seems to be broken. On arm64 macOS and Linux this works just fine; the ISPC project provides prebuilt arm64 binaries on both platforms, and even without a prebuilt arm64 binary, I found that running the x86-64 build of ISPC on arm64 macOS via Rosetta 2 worked without a problem when building Embree. However, on arm64 Windows 11, even though the x86-64 emulation system ran the x86-64 build of ISPC just fine standalone, trying to run it as part of the Embree build didn’t work for me despite me trying a variety of ways to get it to work. I’m not sure if this works with a native arm64 build of ISPC; building ISPC is a sufficiently involved process that I decided it was out of scope for this project.

Running x86-64 code on arm64 Windows 11

Much like how Apple provides Rosetta 2 for running x86-64 applications on arm64 macOS, Microsoft provides a translation layer for running x86 and x86-64 applications on arm64 Windows 11. In my post on porting to arm64 macOS, I included a lengthy section discussing and performance testing Rosetta 2. This time around, I haven’t looked as deeply into x86-64 emulation on arm64 Windows, but I did do some basic testing. Part of why I didn’t go as deeply into this area on Windows is because I’m running arm64 Windows 11 in a virtual machine instead of on native hardware- the comparison won’t be super fair anyway. Another part of why I didn’t go in as deeply is because x86-64 emulation is something that continues to be in an active state of development on Windows; Windows 11 24H2 is supposed to introduce a new x86-64 emulation system called Prism that Microsoft promises to be much faster than the current system in 23H2 [Mehdi 2024]. As of writing though, little to no information is available yet on how Prism works and how it improves on the current system.

The current system for emulating x86 and x86-64 on arm64 Windows is a fairly complex system that differs greatly from Rosetta 2 in a lot of ways. First, arm64 Windows 11 supports emulating both 32-bit x86 and 64-bit x86-64, whereas macOS dropped any kind of 32-bit support long ago and only needs to support 64-bit x86-64 on 64-bit arm64. Windows actually handles 32-bit x86 and 64-bit x86-64 through two basically completely different systems. 32-bit x86 is handled through an extension of the WoW64 (Windows 32-bit on Windows 64-bit) system, while 64-bit x86-64 uses a different system. The 32-bit system uses a JIT compiler called xtajit.dll [Radich et al. 2020, Beneš 2018] to translate blocks of x86 assembly to arm64 assembly and has a caching mechanism for JITed code blocks similar to Rosetta 2 to speed up execution of x86 code that has already been run through the emulation system before [Cylance Research Team 2019]. In the 32-bit system, overall support for providing system calls and whatnot are handled as part of the larger WoW64 system.

The 64-bit system relies on a newer mechanism. The core binary translation system is similar to the 32-bit system, but providing system calls and support for the rest of the surrounding operatin system doesn’t happen through WoW64 at all and instead relies on something that is in some ways similar to Rosetta 2, but is in other crucial ways radically different from Rosetta 2 or the 32-bit WoW64 approach. In Rosetta 2, arm64 code that comes from translation uses a completely different ABI from native arm64 code; the translated arm64 ABI contains a direct mapping between x86-64 and arm64 registers. Microsoft similarly uses a different ABI for translated arm64 code compared with native arm64 code; in Windows, translated arm64 code uses the arm64EC (EC for “Emulation Compatible”) ABI. Here though we find the first major difference between the macOS and Windows 11 approaches. In Rosetta 2, the translated arm64 ABI is an internal implementation detail that is not exposed to users or developers whatsoever; by default there is no way to compile source code against the translated arm64 ABI in Xcode. In the Windows 11 system though, the arm64EC ABI is directly available to developers; Visual Studio 2022 supports compiling source code against either the native arm64 or the translation-focused arm64EC ABI. Code built as arm64EC is capable of interoperating with emulated x86-64 code within the same process, the idea being that this approach allows developers to incrementally port applications to arm64 piece-by-piece while leaving other pieces as x86-64 [Sweetgall et al. 2023]. This… is actually kind of wild if you think about it!

The second major difference between the macOS and Windows 11 approaches is even bigger than the first. On macOS, application binaries can be fat binaries (Apple calls these universal binaries), which contain both full arm64 and x86-64 versions of an application and share non-code resources within a single universal binary file. The entirety of macOS’s core system and frameworks ship as universal binaries, such that at runtime Rosetta 2 can simply translate both the entirety of the user application and all system libraries that the application calls out to into arm64. Windows 11 takes a different approach- on arm64, Windows 11 extends the standard Windows portable executable format (aka .exe files) to be a hybrid binary format called arm64X (X for eXtension). The arm64X format allows for arm64 code compiled against the arm64EC ABI and emulated x86-64 code to interoperate within the same binary; x86-64 code in the binary is translated to arm64EC as needed. Pretty much every 64-bit system component of Windows 11 on Arm ships as arm64X binaries [Niehaus 2021]. Darek Mihocka has a fantastic article that goes into extensive depth about how arm64EC and arm64X work, and Koh Nakagawa has done an extensive analysis of this system as well.

One thing that Windows 11’s emulation system does not seem to be able to do is make special accomodations for TSO memory ordering. As I explored previously, Rosetta 2 gains a very significant performance boost from Apple Silicon’s hardware-level support for emulating x86-64’s strong memory ordering. However, since Microsoft cannot control and custom tailor the hardware that Windows 11 will be running on, arm64 Windows 11 can’t make any guarantees about hardware-level TSO memory ordering support. I don’t know if this situation is any different with the new Prism emulator running on the Snapdragon X Pro/Elite, but in the case of the current emulation framework, the lack of hardware TSO support is likely a huge problem for performance. In my testing of Rosetta 2, I found that Takua typically ran about 10-15% slower as x86-64 under Rosetta 2 with TSO mode enabled (the default) compared with native arm64, but ran 40-50% slower as x86-64 under Rosetta 2 with TSO mode disabled compared with native arm64.

Below are some numbers comparing running Takua on arm64 Windows 11 as a native arm64 application versus as an emulated x86-64 application. The tests used are the same as the ones I used in my Rosetta 2 tests, with the same settings as before. In this case though, because this was all running in a virtual machine (with 6 allocated cores) instead of directly on hardware, the absolute numbers are not as important as the relative difference between native and emulated modes:

	CORNELL BOX
	1024x1024, PT
Test:	Wall Time:	Core-Seconds:
Native arm64 (VM):	60.219 s	approx 361.314 s
Emulated x86-64 (VM):	202.242 s	approx 1273.45 s

	TEA CUP
	1920x1080, VCM
Test:	Wall Time:	Core-Seconds:
Native arm64 (VM):	244.37 s	approx 1466.22 s
Emulated x86-64 (VM):	681.539 s	approx 4089.24 s

	BEDROOM
	1920x1080, PT
Test:	Wall Time:	Core-Seconds:
Native arm64 (VM):	530.261 s	approx 3181.57 s
Emulated x86-64 (VM):	1578.76 s	approx 9472.57 s

	SCANDINAVIAN ROOM
	1920x1080, PT
Test:	Wall Time:	Core-Seconds:
Native arm64 (VM):	993.075 s	approx 5958.45 s
Emulated x86-64 (VM):	1745.5 s	approx 10473.0 s

The emulated results are… not great; for compute-heavy workloads like path tracing, x86-64 emulation on arm64 Windows 11 seems to to be around 1.7x to 3x slower than native arm64 code. These results are much slower compared with how Rosetta 2 performs, which generally sees only a 10-15% performance penalty over native arm64 when running Takua Renderer. However, a critical caveat has to be pointed out here: reportedly Windows 11’s x86-64 emulation works worse in a VM on Apple Silicon than it does on native hardware because Arm RCpc instructions on Apple Silicon are relatively slow. For Rosetta 2 this behavior doesn’t matter because Rosetta 2 uses TSO mode instead of RCpc instructions for emulating strong memory ordering, but since Windows on Arm does rely on RCpc for emulating strong memory ordering, this means that the results above are likely not fully representative of emulation performance on native Windows on Arm hardware. Nonetheless though, having any form of x86-64 emulation at all is an important part of making Windows on Arm viable for mainstream adoption, and I’m looking forward to see how much of an improvement the new Prism emulation system in Windows 11 24H2 brings. I’ll update these results with the Prism emulator once 24H2 is released, and I’ll also update these results to show comparisons on real Windows on Arm hardware whenever I actually get some real hardware to try out.

Conclusion

I don’t think that x86-64 is going away any time soon, but at the same time, the era of mainstream desktop arm64 adoption is here to stay. Apple’s transition to arm64-based Apple Silicon already made the viability of desktop arm64 unquestionable, and now that Windows on Arm is finally ready for the mainstream as well, I think we will now be living in a multi-architecture world in the desktop computing space for a long time. Having more competitors driving innovation ultimately is a good thing, and as new interesting Windows on Arm devices enter the market alongside Apple Silicon Macs, Takua Renderer is ready to go!

References

ARM Holdings. 2022. Load-Acquire and Store-Release instructions. Retrieved June 7, 2024.

Petr Beneš. 2018. Wow64 Internals: Re-Discovering Heaven’s Gate on ARM. Retrieved June 5, 2024.

Cylance Research Team. 2019. Teardown: Windows 10 on ARM - x86 Emulation. In BlackBerry Blog. Retrieved June 5, 2024.

Angela Jiang. 2020. Announcing the OpenCL™ and OpenGL® Compatibility Pack for Windows 10 on ARM. In DirectX Developer Blog. Retrieved June 5, 2024.

Yusuf Mehdi. 2024. Introducing Copilot+ PCs. In Official Microsoft Blog. Retrieved June 5, 2024.

Derek Mihocka. 2024. ARM64 Boot Camp. Retrieved June 5, 2024.

Koh M. Nakagawa. 2021. Discovering a new relocation entry of ARM64X in recent Windows 10 on Arm. In Project Chameleon. Retrieved June 5, 2024.

Koh M. Nakagawa. 2021. Relock 3.0: Relocation-based obfuscation revisited in Windows 11 on Arm. In Project Chameleon. Retrieved June 5, 2024.

Michael Niehaus. 2021. Running x64 on Windows 10 ARM64: How the heck does that work?. In Out of Office Hours. Retrieved June 5, 2024.

Quinn Radich, Karl Bridge, David Coulter, and Michael Satran. 2020. WOW64 Implementation Details. In Programming Guide for 64-bit Windows. Retrieved June 5, 2024.

Marc Sweetgall, Drew Batchelor, Scott Jones, and Matt Wojciakowski. 2023. Arm64EC - Build and port apps for native performance on ARM. Retrieved June 5, 2024.

Wikipedia. 2024. WoW64. Retrieved June 5, 2024.

SIGGRAPH 2023 Conference Paper- Progressive Null-tracking for Volumetric Rendering

August 13, 2023 | Tags: Hyperion, Volume Rendering, SIGGRAPH

This year at SIGGRAPH 2023, we have a conference-track technical paper in collaboration with Zackary Misso and Wojciech Jarosz from Dartmouth College! The paper is titled “Progressive Null-tracking for Volumetric Rendering” and is the result of work that Zackary did while he was a summer intern with the Hyperion development team last summer. On the Disney Animation side, Brent Burley, Dan Teece, and I oversaw Zack’s internship work, while on the the Dartmouth side, Wojciech was involved in the project as both Zack’s PhD advisor and as a consultant to Disney Animation.

Here is the paper abstract:

Null-collision approaches for estimating transmittance and sampling free-flight distances are the current state-of-the-art for unbiased rendering of general heterogeneous participating media. However, null-collision approaches have a strict requirement for specifying a tightly bounding total extinction in order to remain both robust and performant; in practice this requirement restricts the use of null-collision techniques to only participating media where the density of the medium at every possible point in space is known a-priori. In production rendering, a common case is a medium in which density is defined by a black-box procedural function for which a bounding extinction cannot be determined beforehand. Typically in this case, a bounding extinction must be approximated by using an overly loose and therefore computation- ally inefficient conservative estimate. We present an analysis of how null-collision techniques degrade when a more aggressive initial guess for a bounding extinction underestimates the true maximum density and turns out to be non-bounding. We then build upon this analysis to arrive at two new techniques: first, a practical, efficient, consistent progressive algorithm that allows us to robustly adapt null-collision techniques for use with procedural media with unknown bounding extinctions, and second, a new importance sampling technique that improves ratio-tracking based on zero-variance sampling.

The paper and related materials can be found at:

One cool thing about this project is that this project both served as a direct extension of Zack’s PhD research area and served as a direct extension of the approach we’ve been taking to volume rendering in Disney’s Hyperion Renderer over the past 6 years. Hyperion has always used unbiased transmittance estimators for volume rendering (as opposed to biased ray marching) [Fong et al. 2017], and Hyperion’s modern volume rendering system is heavily based on null-collision theory [Woodcock et al. 1965]. We’ve put significant effort into making a null-collision based volume rendering system robust and practical in production, which led to projects such as residual ratio tracking [Novák et al. 2014], spectral and decomposition tracking [Kutz et al. 2017] and approaches for unbiased emission and scattering importance sampling in heterogeneous volumes [Huang et al. 2021]. Over the past decade, many other production renderers [Christensen et al. 2018, Gamito 2018, Novák et al. 2018] have similarly made the shift to null-collision based volume rendering because of the many benefits that the null-collision framework brings, such as unbiased volume rendering and efficient handling of volumes with lots of high-order scattering due to the null-collision framework’s ability to cheaply perform distance sampling. Vanilla null-collision volume rendering does have shortcomings, such as difficulty in efficiently sampling optically thin volumes due to the fact that null-collision tracking techniques produce a binary transmittance estimate that is super noisy. A lot of progress has been made in improving null-collision volume rendering’s efficiency and robustness in these thin volumes cases [Villemin and Hery 2013, Villemin et al. 2018, Herholz et al. 2019, Miller et al. 2019]; the intro to the paper goes into much more extensive detail about these advancements.

However, one major limitation of null-collision volume rendering that remained unsolved until this paper is that the null-collision framework requires knowing the maximum density, or bounding majorant of a heterogeneous volume beforehand. This is a fundamental requirement of null-collision volume rendering that makes using procedurally defined volumes difficult, since the maximum possible density value of a procedurally defined volume cannot be known a-priori without either putting into place a hard clamp or densely evaluating the procedural function. As a result, renderers that use null-collision volume rendering typically only support procedurally defined volumes by pre-rasterizing the procedural function onto a fixed voxel grid, à la the volume pre-shading in Manuka [Fascione et al. 2018]. The need to pre-rasterize procedural volumes negates a lot of the workflow and artistic benefits of using procedural volumes; this is one of several reasons why other renderers continue to use ray-marching based integrators for volumes despite the bias and loss of efficiency at handling high-order scattering. Inspired by ongoing challenges we were facing with rendering huge volume-scapes on Strange World at the time, we gave Zack a very open-ended challenge for his internship: brainstorm and experiment with ways to lift this limitation in null-collision volume rendering.

Zack’s PhD research coming into this internship revolved around deeply investigating the math behind modern volume rendering theory, and from these investigations, Zack had previously found deep new insights into how to formulate volumetric transmittance [Georgiev et al. 2019] and cool new ways to de-bias previously biased techniques such as ray marching [Misso et al. 2022]. Zack’s solution to the procedural volumes in null-collision volume rendering problem very much follows in the same trend as his previous papers; after initially attempting to find ways to adapt de-biased ray marching to fit into a null-collision system, Zack went back to first principles and had the insight that a better solution was to find a way to de-bias the result that one gets from clamping the majorant of a procedural function. This idea really surprised me when he first proposed it; I had never thought about the problem from this perspective before. Dan, Brent, and I were highly impressed!

In addition to the acknowledgements in the paper, I wanted to acknowledge here Henrik Falt and Jesse Erickson from Disney Animation, who spoke with Zack and us early in the project to help us better understand how better procedural volumes support in Hyperion could benefit FX artist workflows. We are also very grateful to Disney Animation’s CTO, Nick Cannon, for granting us permission to include example code implemented in Mitsuba as part of the paper’s supplemental materials.

A bit of a postscript: during the Q&A session after Zack’s paper presentation at SIGGRAPH, Zack and I had a chat with Wenzel Jakob, Merlin Nimier-David, Delio Vicini, and Sébastien Speierer from EPFL’s Realistic Graphics Lab. Wenzel’s group brought up a potential use case for this paper that we hadn’t originally thought of. Neural radiance fields (NeRFs) [Mildenhall et al. 2020, Takikawa et al. 2023] are typically rendered using ray marching, but this is often inefficient. Rendering NeRFs using null tracking instead of ray marching is an interesting idea, but the neural networks that underpin NeRFs are essentially similar to procedural functions as far as null-collision tracking is concerned because there’s no way to know a tight bounding majorant for a neural network a-priori without densely evaluating the neural network. Progressive null tracking solves this problem and potentially opens the door to more efficient and interesting new ways to render NeRFs! If you happen to be interested in this problem, please feel free to reach out to Zack, Wojciech, and myself.

Getting to work with Zack and Wojciech on this project was an honor and a blast; I count myself as very lucky that working at Disney Animation continues to allow me to meet and work with rendering folks from across our field!

References

Per H. Christensen, Julian Fong, Jonathan Shade, Wayne L Wooten, Brenden Schubert, Andrew Kensler, Stephen Friedman, Charlie Kilpatrick, Cliff Ramshaw, Marc Bannister, Brenton Rayner, Jonathan Brouillat, and Max Liani. 2018. RenderMan: An Advanced Path Tracing Architecture for Movie Rendering. ACM Transactions on Graphics. 37, 3 (2018), 30:1-30:21.

Luca Fascione, Johannes Hanika, Mark Leone, Marc Droske, Jorge Schwarzhaupt, Tomáš Davidovič, Andrea Weidlich, and Johannes Meng. 2018. Manuka: A Batch-Shading Architecture for Spectral Path Tracing in Movie Production. ACM Transactions on Graphics. 37, 3 (2018), 31:1-31:18.

Julian Fong, Magnus Wrenninge, Christopher Kulla, and Ralf Habel. 2017. Production Volume Rendering. In ACM SIGGRAPH 2021 Courses. 2:1-2:97.

Manuel Gamito. 2018. Path Tracing the Framestorian Way. In SIGGRAPH 2018 Course Notes: Path Tracing in Production. 52-61.

Sebastian Herholz, Yangyang Zhao, Oskar Elek, Derek Nowrouzezahrai, Hendrik P A Lensch, and Jaroslav Křivánek. 2019. Volume Path Guiding Based on Zero-Variance Random Walk Theory. ACM Transactions on Graphics. 38, 3 (2019), 24:1-24:19.

Wei-Feng Wayne Huang, Peter Kutz, Yining Karl Li, and Matt Jen-Yuan Chiang. 2021. Unbiased Emission and Scattering Importance Sampling For Heterogeneous Volumes. In ACM SIGGRAPH 2021 Talks. 3:1-3:2.

Peter Kutz, Ralf Habel, Yining Karl Li, and Jan Novák. 2017. Spectral and Decomposition Tracking for Rendering Heterogeneous Volumes. ACM Transactions on Graphics. 36, 4 (2017), 111:1-111:16.

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV 2020: Proceedings of the 16th European Conference on Computer Vision. 405-421.

Bailey Miller, Iliyan Georgiev, and Wojciech Jarosz. 2019. A Null-Scattering Path Integral Formulation of Light Transport. ACM Transactions on Graphics. 38, 4 (@019), 44:1-44:13.

Jan Novák, Iliyan Georgiev, Johannes Hanika, and Wojciech Jarosz. 2018. Monte Carlo Methods for Volumetric Light Transport Simulation. Computer Graphics Forum. 37, 2 (2018), 551-576.

Jan Novák, Andrew Selle and Wojciech Jarosz. 2014. Residual Ratio Tracking for Estimating Attenuation in Participating Media. ACM Transactions on Graphics. 33, 6 (2014), 179:1-179:11.

Towaki Takikawa, Shunsuke Saito, James Tompkin, Vincent Sitzmann, Srinath Sridhar, Or Litany, and Alex Yu. 2023. Neural Fields for Visual Computing. In ACM SIGGRAPH 2023 Courses. 10:1-10:227.

Ryusuke Villemin and Christophe Hery. 2013. Practical Illumination from Flames. Journal of Computer Graphics Techniques. 2, 2 (2013), 142-155.

Ryusuke Villemin, Magnus Wrenninge, and Julian Fong. 2018. Efficient Unbiased Rendering of Thin Participating Media. Journal of Computer Graphics Techniques. 7, 3 (2018), 50-65.

E. R. Woodcock, T. Murphy, P. J. Hemmings, and T. C. Longworth. 1965. Techniques used in the GEM code for Monte Carlo neutronics calculations in reactors and other systems of complex geometry. In Applications of Computing Methods to Reactor Problems. Argonne National Laboratory.

Code & Visuals

By Yining Karl Li

Blog | Archive | Projects | About

Photography Show at Disney Animation

Footnotes

New Unified Site Design

Moana 2

References

Footnotes

DigiPro 2024 Paper- Cache Points For Production-Scale Occlusion-Aware Many-Lights Sampling And Volumetric Scattering

References

Porting Takua Renderer to Windows on Arm

Table of Contents

Introduction

OpenGL on arm64 Windows 11

Building Embree on arm64 Windows 11

Running x86-64 code on arm64 Windows 11

Conclusion

References

SIGGRAPH 2023 Conference Paper- Progressive Null-tracking for Volumetric Rendering

References