All nonderivative source code files published on this website should be clearly labelled with a license at the top. In most cases, that will be the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license: You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Derivative works retain their original license.
The latest and greatest Xmas Demo! Contains source code that should compile for both Windows and Linux. It runs at 60 fps on the NVidia Jetson AGX Xavier if you use the tricks mentioned in the article.
A respin of the 2017 Xmas Demo. Full source code that should compile for both Windows and Linux. We replaced the fonts, the graphics, and the music, and fixed a busload of bugs. The shaders are no longer NVidia-only; tested on a couple of AMD cards.
Generator for the video Dreamscape: The Missing Scene. The supplied video file has to be converted to raw YUV444. FFmpeg can do that.
Originally released at The Gathering 1994. I recovered the source in 2020 and fixed an old timing bug, so it should run smoothly now. On an Amiga 1200 or 4000, of course.
This work was done in 2013 but wasn't released until 2018. Old ARM code was added in 2019. I'm pretty certain that the method used is unique: Fit 8 lines into the L1 cache and transpose in place to make the horizontal pass simpler. The ARM Neon version is written as inline assembler, which wasn't a stellar idea. I spent some time deciding how to order the instructions, but it didn't seem to matter on the TX1.
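The entry doesn't name the filter, so here is only a hedged sketch of the transposition step, assuming 8-bit pixels and 8x8 tiles; the function names are hypothetical. Eight lines of a 1920-pixel-wide 8-bit image take 15 KB, which fits in a typical 32 KB L1 data cache, so the whole strip can be transposed tile by tile without leaving L1.

```c
#include <stdint.h>
#include <stddef.h>

/* Transpose one 8x8 tile of 8-bit pixels in place.
   'tile' points at the top-left pixel; 'stride' is the row pitch in bytes.
   Each off-diagonal pair is swapped exactly once. */
static void transpose8x8_inplace(uint8_t *tile, size_t stride)
{
    for (size_t r = 0; r < 8; r++) {
        for (size_t c = r + 1; c < 8; c++) {
            uint8_t t = tile[r * stride + c];
            tile[r * stride + c] = tile[c * stride + r];
            tile[c * stride + r] = t;
        }
    }
}

/* Transpose every 8x8 tile in a strip of 8 rows (width must be a multiple of 8).
   Afterwards, column k of each tile sits in row k of that tile, so work that
   used to run down the columns can walk along contiguous rows instead. */
static void transpose_strip_tiles(uint8_t *strip, size_t stride, size_t width)
{
    for (size_t x = 0; x < width; x += 8)
        transpose8x8_inplace(strip + x, stride);
}
```

The scalar swap loop is just the simplest correct form; a SIMD version would do the same job with vector shuffles.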
Generator for the video Quaternion HQ Tech Demo. It's the old Julia quaternion code ported to straight C, a couple of bugs fixed, and quality turned up to 11. Windows multi-threaded: Each thread renders 16x16 blocks using fetchadd(). Probably trivial to port to Linux. It outputs jpg files using libjpeg-turbo, so not real-time at all. It's a generator.
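The page doesn't show fetchadd(), so this sketch assumes it is an atomic fetch-and-add on a shared tile counter and uses C11 atomic_fetch_add in its place. RenderJob, shade() and render_worker() are hypothetical names, and thread creation is left out: every worker thread would simply call render_worker() on the same job.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TILE 16

typedef struct {
    atomic_int next_tile;        /* index of the next unclaimed 16x16 tile */
    int tiles_x, tiles_y;
    int width, height;
    uint32_t *pixels;            /* width * height output buffer */
} RenderJob;

/* Hypothetical per-pixel shader standing in for the quaternion Julia code. */
static uint32_t shade(int x, int y) { return (uint32_t)(x * 31 + y * 17); }

static void render_job_init(RenderJob *job, uint32_t *pixels, int width, int height)
{
    job->width = width;
    job->height = height;
    job->pixels = pixels;
    job->tiles_x = (width + TILE - 1) / TILE;    /* round up for partial tiles */
    job->tiles_y = (height + TILE - 1) / TILE;
    atomic_init(&job->next_tile, 0);
}

/* atomic_fetch_add hands each tile index to exactly one caller, so threads
   need no locks, and faster threads simply end up claiming more tiles. */
static void render_worker(RenderJob *job)
{
    const int total = job->tiles_x * job->tiles_y;
    for (;;) {
        int t = atomic_fetch_add(&job->next_tile, 1);   /* claim a tile */
        if (t >= total)
            break;                                      /* nothing left */
        int x0 = (t % job->tiles_x) * TILE;
        int y0 = (t / job->tiles_x) * TILE;
        for (int y = y0; y < y0 + TILE && y < job->height; y++)
            for (int x = x0; x < x0 + TILE && x < job->width; x++)
                job->pixels[(size_t)y * job->width + x] = shade(x, y);
    }
}
```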
Generator for the videos Superhero and Wobblerne 2. YUV dump from the VHS tape not included. It's quite large, around 30 GB. Contact me if you want a copy.
Generator for the video Dreamscape & Eclipse: The Final Cut. The supplied video files have to be converted to raw YUV444. FFmpeg can do that.
A fast AVX2 Bayer to I420 routine. It cuts a lot of corners, literally.
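Not the AVX2 code, just a scalar sketch of one corner-cutting approach, assuming an RGGB pattern and BT.601 studio-swing coefficients (both assumptions; the entry names neither). Each 2x2 Bayer cell becomes one RGB value, which maps neatly onto I420's 2x2 chroma subsampling, at the cost of halving the luma resolution. The function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* BT.601 studio-swing RGB -> YCbCr in 8-bit fixed point. The 16<<8 and 128<<8
   offsets are added before the shift so the shifted value is never negative. */
static uint8_t rgb_to_y(int r, int g, int b) { return (uint8_t)((66 * r + 129 * g + 25 * b + 128 + (16 << 8)) >> 8); }
static uint8_t rgb_to_u(int r, int g, int b) { return (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + (128 << 8)) >> 8); }
static uint8_t rgb_to_v(int r, int g, int b) { return (uint8_t)((112 * r - 94 * g - 18 * b + 128 + (128 << 8)) >> 8); }

/* Convert an RGGB Bayer frame (even width and height) to I420 planes.
   Each cell  R G / G B  yields one RGB value: all four luma samples of the
   cell get the same Y, and the cell produces exactly one U and one V. */
static void bayer_rggb_to_i420(const uint8_t *bayer, size_t stride, int width, int height,
                               uint8_t *y_plane, uint8_t *u_plane, uint8_t *v_plane)
{
    for (int cy = 0; cy < height / 2; cy++) {
        const uint8_t *row0 = bayer + (size_t)(2 * cy) * stride;
        const uint8_t *row1 = row0 + stride;
        for (int cx = 0; cx < width / 2; cx++) {
            int r = row0[2 * cx];
            int g = (row0[2 * cx + 1] + row1[2 * cx] + 1) >> 1;  /* average both greens */
            int b = row1[2 * cx + 1];
            uint8_t yv = rgb_to_y(r, g, b);
            uint8_t *yp = y_plane + (size_t)(2 * cy) * width + 2 * cx;
            yp[0] = yp[1] = yp[width] = yp[width + 1] = yv;
            u_plane[(size_t)cy * (width / 2) + cx] = rgb_to_u(r, g, b);
            v_plane[(size_t)cy * (width / 2) + cx] = rgb_to_v(r, g, b);
        }
    }
}
```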
Several dubious GLSL optimization techniques are used to make this demo run at 60 fps on the TX2's 256-core GPU. If you're looking for precise math, look elsewhere. Close enough will have to do on that GPU! Everything should fly by at 60 fps. Most of it is based on GPU code found on Shadertoy.
I looked at assorted binary fixed-point libraries available on the internet. They sucked. The TILE-Gx multicore CPU has some floating-point support, so I combine that with some classic optimizations from Doom. It uses my parallelization library from 2014, which relies on the internal network for blisteringly fast communication between the cores.
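The entry doesn't say which Doom tricks are used. The best known is Doom's 16.16 fixed-point arithmetic (FRACBITS = 16), so here is a hedged sketch of it; the helper names are hypothetical, and how it is mixed with the TILE-Gx floating-point instructions is not shown.

```c
#include <stdint.h>

/* 16.16 fixed point: 16 integer bits, 16 fraction bits, as in Doom. */
typedef int32_t fixed_t;
#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

static fixed_t fixed_from_int(int v) { return (fixed_t)(v * FRACUNIT); }

/* Widen to 64 bits so the 32.32 product can't overflow, then drop 16 fraction
   bits. Assumes >> on a negative int64_t is an arithmetic shift (GCC/Clang). */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> FRACBITS);
}

/* Scale the dividend up first so the quotient keeps 16 fraction bits.
   b must be nonzero; no overflow check, unlike Doom's FixedDiv. */
static fixed_t fixed_div(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * FRACUNIT) / b);
}
```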
I once attended a course about writing code for GPUs. The guy had no clue about how GPUs work. This crap runs fast as hell by exploiting the fact that the CPU can predict how things are gonna execute.
A fast way to calculate 2x4 atan2() values using ARM Neon. Also, the name is hilarious.
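A scalar sketch of a fast atan2 approximation, not the author's Neon code: it folds the angle into the first octant, applies the well-known approximation atan(z) ≈ (π/4)z + 0.273·z(1 − z) for 0 ≤ z ≤ 1 (max error about 0.004 rad), and unfolds the result with symmetries. The 8-wide wrapper only mirrors the 2x4 layout; the names are hypothetical.

```c
/* Approximate atan2(y, x) in radians, max error about 0.004 rad. */
static float fast_atan2f(float y, float x)
{
    const float PI = 3.14159265f, PI_2 = 1.57079633f, PI_4 = 0.78539816f;
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;
    float mx = ax > ay ? ax : ay;
    float mn = ax > ay ? ay : ax;
    if (mx == 0.0f)
        return 0.0f;                          /* atan2(0, 0): return 0 */
    float z = mn / mx;                        /* tangent of the angle folded into [0, pi/4] */
    float a = PI_4 * z + 0.273f * z * (1.0f - z);
    if (ay > ax) a = PI_2 - a;                /* above the diagonal */
    if (x < 0.0f) a = PI - a;                 /* left half-plane */
    if (y < 0.0f) a = -a;                     /* lower half-plane */
    return a;
}

/* Eight results per call, matching two 4-lane vectors. */
static void fast_atan2f_x8(const float y[8], const float x[8], float out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = fast_atan2f(y[i], x[i]);
}
```

In a SIMD version, the three if statements would become compare-and-select operations so all lanes execute the same instructions.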
Simplifies the AES CTR mode constant buffer calculations by eliminating the common terms. It runs much faster, 30 percent to be precise.
Early GPU work, superseded by newer works. The fractal routine should still be valid.
Early multi-threaded raytracer that splits the picture into lumps of lines. Wasn't a very bright idea.
I love the TILE-Gx CPU! This thing can replace the OpenSSL AES encrypt function by using the mega-awesome tblidx instructions. Also, I redid the final pass with a 64-bit merge. This could probably be done on x64-based CPUs too, or any other 64-bit CPU with such an instruction.
A fast Scharr (Sobel-type) filter for TILE-Gx. Exploits the fact that the vertical filters can be expressed as sums of horizontal filters.
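A plain C sketch of one way to read that sentence, assuming the classic 3/10/3 Scharr kernels and 8-bit input: each row contributes a horizontal derivative d and a horizontal smoothing s, and both vertical combinations are just weighted sums of those per-row results. The function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Scharr gradients with Gx = [-3 0 3; -10 0 10; -3 0 3] and Gy = Gx transposed.
     d(row)[x] = p[x+1] - p[x-1]                 horizontal derivative
     s(row)[x] = 3*p[x-1] + 10*p[x] + 3*p[x+1]   horizontal smoothing
     Gx = 3*d(y-1) + 10*d(y) + 3*d(y+1)
     Gy = s(y+1) - s(y-1)
   Only interior pixels (1..w-2, 1..h-2) are written. */
static void scharr(const uint8_t *src, int w, int h, int16_t *gx, int16_t *gy)
{
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            int d[3], s[3];
            for (int k = 0; k < 3; k++) {                 /* rows y-1, y, y+1 */
                const uint8_t *p = src + (size_t)(y - 1 + k) * w + x;
                d[k] = p[1] - p[-1];
                s[k] = 3 * p[-1] + 10 * p[0] + 3 * p[1];
            }
            gx[(size_t)y * w + x] = (int16_t)(3 * d[0] + 10 * d[1] + 3 * d[2]);
            gy[(size_t)y * w + x] = (int16_t)(s[2] - s[0]);
        }
    }
}
```

For clarity, d and s are recomputed for every pixel here; a faster version would compute them once per row and keep the last three rows around.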
Calculate the average of an area quickly using ARM Neon intrinsics. It does this by masking out the left and right parts, so there's just a single X-loop.
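A scalar sketch of the masking idea, assuming 16-byte vectors and an image whose stride covers the rounded-up right edge. Instead of separate loops for the ragged left and right ends, every row runs one loop over whole 16-pixel chunks, and the first and last chunks are ANDed with precomputed masks. area_average() is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

#define VEC 16

/* Average of the w x h area at (x0, y0). Requires x0 >= 0, w > 0, and
   readable pixels up to the rounded-up end of each row. */
static double area_average(const uint8_t *img, int stride, int x0, int y0, int w, int h)
{
    int start = x0 & ~(VEC - 1);                 /* round left edge down */
    int end = (x0 + w + VEC - 1) & ~(VEC - 1);   /* round right edge up */
    uint8_t lmask[VEC], rmask[VEC];
    for (int i = 0; i < VEC; i++) {
        lmask[i] = (start + i >= x0) ? 0xFF : 0;          /* drop pixels left of x0 */
        rmask[i] = (end - VEC + i < x0 + w) ? 0xFF : 0;   /* drop pixels past the area */
    }
    uint64_t sum = 0;
    for (int y = y0; y < y0 + h; y++) {
        const uint8_t *row = img + (size_t)y * stride;
        for (int x = start; x < end; x += VEC) {          /* the single X-loop */
            for (int i = 0; i < VEC; i++) {               /* one "vector" */
                uint8_t m = 0xFF;
                if (x == start) m &= lmask[i];
                if (x == end - VEC) m &= rmask[i];        /* both apply if one chunk */
                sum += row[x + i] & m;
            }
        }
    }
    return (double)sum / ((double)w * h);
}
```

With Neon intrinsics, each inner 16-iteration loop would map to a vld1q_u8 load, a vandq_u8 with the mask, and a widening add such as vpaddlq_u8.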
First version of the parallelization library with example code: A crap bilinear scaler.
Quake for the TILE-Gx multicore CPU. The 36-core version has no problems running one Quake instance per core.
MultiDoom for a TILE-Gx host like the SX80. The 36-core version has no problems running 1 Doom per core.
Early parallel floating point raytracer for the TILE-Gx.
A fast RGB to YUV conversion routine for the TILE-Gx.
AWK script to convert GCC TILE-Gx assembler output to readable code.
MultiDoom for the TILE-Gx PCI Express card.
A fast YUV to RGB conversion routine for SSE2. Uses more multipliers for fewer stalls. At least, that's the idea.
A fast YUV to RGB conversion routine for TILE-Gx.
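Both YUV to RGB entries are about the same conversion. Here is a scalar sketch of the arithmetic, assuming BT.601 studio-swing input and the common 8-bit fixed-point coefficients (298, 409, 100, 208, 516); the SSE2 and TILE-Gx versions aren't shown on this page, and the function names are hypothetical. Clamping before the shift keeps the code free of negative right shifts.

```c
#include <stdint.h>

/* Clamp a value carrying 8 fraction bits into 0..255.
   Negative values are caught before the shift. */
static uint8_t clamp_shift8(int v)
{
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* BT.601 studio-swing YCbCr -> RGB with coefficients scaled by 256. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v, uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y - 16, d = u - 128, e = v - 128;
    *r = clamp_shift8(298 * c + 409 * e + 128);
    *g = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
    *b = clamp_shift8(298 * c + 516 * d + 128);
}
```

The three output channels don't depend on each other, so their multiplies can all be issued side by side, which is what a SIMD version exploits.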
Halide's 3x3 box filter example code is broken, and their test systems are too old. Here's my take on it.
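The entry doesn't say what is broken in Halide's example, so this is only a plain C reference sketch of a 3x3 box blur, done as two separable 3-tap passes with a 16-bit scratch buffer. The function name and the rounding, (sum + 4) / 9, are assumptions rather than Halide's or the author's choices.

```c
#include <stdint.h>
#include <stddef.h>

/* 3x3 box blur of an 8-bit w x h image: horizontal 3-tap sums into tmp,
   then vertical 3-tap sums of those, rounded and divided by 9.
   Border pixels of dst are left untouched. */
static void box3x3(const uint8_t *src, uint8_t *dst, uint16_t *tmp, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 1; x < w - 1; x++) {
            const uint8_t *p = src + (size_t)y * w + x;
            tmp[(size_t)y * w + x] = (uint16_t)(p[-1] + p[0] + p[1]);
        }
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            size_t i = (size_t)y * w + x;
            unsigned sum = tmp[i - w] + tmp[i] + tmp[i + w];
            dst[i] = (uint8_t)((sum + 4) / 9);
        }
}
```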
Superseded by later TILE-Gx AES releases. Doesn't have the nifty solution to the last round problem.
Superseded by later SRTP AES releases, but it was pretty neat in 2010.
Early MultiDoom for the TILE64 PCI Express card.
Unfortunately, I haven't been able to recover much from this period. Pictures of some fun stuff we did for TANDBERG can be found here: TANDBERG Secrets.
Assorted stuff I had checked out on my last day there. I suggest reading the article for more details.
It's a functional compiler that generates 68000 code. Written in C++ version 2 in 1996. I compiled and tried it in 2020. It still works.
Amiga 64K intro from The Gathering 1995. Features a raytracer written by a friend of mine.
I didn't write any of this! The code was done by Warp and Smeagol.