
Source Code Archive

Important Licensing Information

All nonderivative source code files published on this website should be clearly labelled with a license at the top. In most cases, that will be the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license: You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Derivative works retain their original license.


xmasdemo_v2_2020.zip - Article: The Xmas Demo 2020 Edition - C, GLSL

A respin of the 2017 Xmas Demo. Full source code that should compile for both Windows and Linux. We replaced the fonts, the graphics, the music, and fixed a busload of bugs. Shaders no longer NVidia only. Tested on a couple of AMD cards.

recycler_2020.zip - Article: Dreamscape & Eclipse: The Final Cut - C

Generator for the video Dreamscape: The Missing Scene. The supplied video file has to be converted to raw YUV444. FFmpeg can do that.

Invention_TG94_2020.zip - 68020 Assembler

Originally released at The Gathering 1994. I recovered the source in 2020 and fixed an old timing bug, so it should run smoothly now. On an Amiga 1200 or 4000, of course.


imagetrans_2019.zip - Article: A Different Method for Image Transformations - C, SSE2 intrinsics, ARM Neon assembler

This work was done in 2013, but wasn't released until 2018. Old ARM code added in 2019. I'm pretty certain that the method used is unique: Fit 8 lines in L1 cache and transpose in place to make the horizontal pass simpler. The ARM Neon assembler version is written as inline assembler. It wasn't a stellar idea. I spent some time deciding how to stack the instructions. Didn't seem to matter on the TX1.

quat_2019.zip - C++

Undocumented: Generator for the video Quaternion HQ Tech Demo. It's the old Julia quaternion code ported to straight C, a couple of bugs fixed, and quality turned up to 11. Windows multi-threaded: Each thread renders 16x16 blocks using fetchadd(). Probably trivial to port to Linux. It outputs jpg files using libjpeg-turbo, so not realtime at all. It's a generator.
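
A minimal sketch of the block hand-out idea described above, not the code from the zip: a shared counter is bumped with an atomic fetch-and-add, and each thread turns the index it gets back into a 16x16 block position. It uses C11 `atomic_fetch_add` as a portable stand-in for `fetchadd()`; the names `block_queue`, `queue_init` and `queue_claim` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define BLOCK 16  /* block edge in pixels, as in the description above */

/* Hypothetical work queue: one shared counter hands out block indices. */
typedef struct {
    atomic_int next;  /* index of the next unclaimed block */
    int blocks_x;     /* number of blocks per block row */
    int blocks_y;     /* number of block rows */
} block_queue;

static void queue_init(block_queue *q, int width, int height)
{
    assert(width > 0 && height > 0);
    atomic_init(&q->next, 0);
    q->blocks_x = (width + BLOCK - 1) / BLOCK;   /* round up partial blocks */
    q->blocks_y = (height + BLOCK - 1) / BLOCK;
}

/* Claim the next block. Returns 1 and writes the block's top-left pixel,
   or returns 0 when every block has already been handed out. */
static int queue_claim(block_queue *q, int *x0, int *y0)
{
    int total = q->blocks_x * q->blocks_y;
    int i = atomic_fetch_add(&q->next, 1);       /* each caller gets a unique index */
    if (i >= total)
        return 0;
    *x0 = (i % q->blocks_x) * BLOCK;
    *y0 = (i / q->blocks_x) * BLOCK;
    return 1;
}
```

Each worker thread would loop on `queue_claim()` and render the block it receives until the call returns 0. No locks are needed, and slow blocks do not hold up the other threads.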

cliffy_2019.zip - Section: Triumph - C

Generator for the videos Superhero and Wobblerne 2. YUV dump from the VHS tape not included. It's quite large, around 30 GB. Contact me if you want a copy.

remix2_2019.zip - Article: Dreamscape & Eclipse: The Final Cut - C

Generator for the video Dreamscape & Eclipse: The Final Cut. The supplied video files have to be converted to raw YUV444. FFmpeg can do that.


bayer_2018.c - C, AVX2 intrinsics

Undocumented: A fast AVX2 Bayer to I420 routine. It cuts a lot of corners, literally.


xmasdemo_2017.zip - Article: Nvidia Jetson TX2 Xmas Demo 2017 - C, ARM Neon intrinsics, GLSL

Several dubious GLSL optimization techniques are used to make this demo run at 60 fps on the TX2 256-core GPU. If you're looking for precise math, look elsewhere. Close enough will have to do on that GPU! Everything should fly by at 60 fps. Most of it is based on GPU code found on Shadertoy.

tilegx_int_raytracer_2017.zip - Article: Integer Raytracing on Mellanox TILE-Gx - C, TILE-Gx intrinsics

I looked at assorted binary fixed point libraries available on the internet. They sucked. The TILE-Gx multicore CPU has some floating-point support, so I combine that with some classic optimizations from Doom. I use the parallelization library from 2014, which uses the internal network for blisteringly fast communication between the cores.

gpuray_2017.zip - Article: GPURay - GPU+CPU Raytracer for NVidia Tegra X1 - C, GLSL

I once attended a course about writing code for GPUs. The guy had no clue about how GPUs work. This crap runs fast as hell, by exploiting the fact that the CPU can predict how things are gonna execute.

satan2_2017.cpp - C, ARM Neon intrinsics

Undocumented: A fast way to calculate 2x4 atan2() values using ARM Neon. Also, the name is hilarious.


srtp_aes_revisited_2016.zip - Article: SRTP AES Optimization Revisited - C

Simplifies the AES CTR mode constant buffer calculations by eliminating the common terms. It runs much faster, 30 percent to be precise.

gpu_hacks_2016.zip - Article: GPU Hacks: Fractals, a Raytracer and Raytraced Quaternion Julia Sets - GLSL

Early GPU work, superseded by newer works. The fractal routine should still be valid.

raytracer_2016.zip - News: 27 June 2016 - C

Early multi-threaded raytracer that splits the picture into lumps of lines. Wasn't a very bright idea.


tilegx_aes_openssl_2015.zip - Article: OpenSSL aes_core.c Replacement for EZchip TILE-Gx - C, TILE-Gx intrinsics

I love the TILE-Gx CPU! This thing can replace the OpenSSL AES encrypt function by using the mega-awesome tblidx instructions. Also, I redid the final pass with a 64-bit merge. This could probably be done on x64-based CPUs too, or any other 64-bit CPU with such an instruction.


scharr_tilegx_2014.c - C, TILE-Gx intrinsics

Undocumented: A fast Scharr (Sobel-type) filter for TILE-Gx. Exploits the fact that the vertical filters can be expressed as sums of horizontal filters.
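
One way to read that sentence, sketched in plain C rather than TILE-Gx intrinsics: the 3x3 Scharr kernels are separable, so both gradients can be built from two horizontal 3-tap results per row (a difference and a 3-10-3 smoothing) that are then summed across three rows. The function names and the sign convention (right minus left, down minus up) are assumptions, and this is not the code from the file.

```c
#include <assert.h>
#include <stdint.h>

/* Horizontal 3-tap difference [-1 0 1] at column x of one row. */
static int row_diff(const uint8_t *row, int x)
{
    return row[x + 1] - row[x - 1];
}

/* Horizontal 3-tap smoothing [3 10 3] at column x of one row. */
static int row_smooth(const uint8_t *row, int x)
{
    return 3 * row[x - 1] + 10 * row[x] + 3 * row[x + 1];
}

/* Scharr gradients at an interior pixel (x, y), built only from the
   horizontal results of the rows above, at, and below the pixel. */
static void scharr_at(const uint8_t *img, int stride, int width, int height,
                      int x, int y, int *gx, int *gy)
{
    assert(x >= 1 && x < width - 1 && y >= 1 && y < height - 1);
    const uint8_t *up   = img + (y - 1) * stride;
    const uint8_t *mid  = img + y * stride;
    const uint8_t *down = img + (y + 1) * stride;

    /* Gx: vertical [3 10 3] weighting of the horizontal differences. */
    *gx = 3 * row_diff(up, x) + 10 * row_diff(mid, x) + 3 * row_diff(down, x);
    /* Gy: vertical [-1 0 1] difference of the horizontal smoothings. */
    *gy = row_smooth(down, x) - row_smooth(up, x);
}
```

A real implementation would presumably compute the two horizontal results once per row and keep them for the next two rows instead of recomputing them per pixel as this sketch does.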

average_neon_2014.c - C, ARM Neon intrinsics

Undocumented: Calculate the average of an area quickly using ARM Neon intrinsics. It does this by masking out the left and right parts, so there's just a single X-loop.
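
The masking trick can be illustrated in portable C, with an inner 8-lane loop standing in for a single vector operation. This is a sketch under assumptions, not the Neon code from the file: the row stride is assumed to be a multiple of 8 so whole chunks can be read, the area is the half-open rectangle [x0, x1) x [y0, y1), and the name `area_average` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* emulated vector width in bytes */

/* Truncated integer average of the pixels in [x0, x1) x [y0, y1).
   Assumes stride is a multiple of LANES so full chunks can be read. */
static unsigned area_average(const uint8_t *img, int stride,
                             int x0, int y0, int x1, int y1)
{
    assert(x0 >= 0 && y0 >= 0 && x0 < x1 && y0 < y1);
    assert(x1 <= stride && stride % LANES == 0);

    int first = x0 / LANES;        /* chunk holding the left edge */
    int last  = (x1 - 1) / LANES;  /* chunk holding the right edge */

    /* Lane masks that zero everything left of x0 and right of x1 - 1. */
    uint8_t lmask[LANES], rmask[LANES];
    for (int i = 0; i < LANES; i++) {
        lmask[i] = (first * LANES + i >= x0) ? 0xFF : 0x00;
        rmask[i] = (last * LANES + i < x1) ? 0xFF : 0x00;
    }

    uint64_t sum = 0;
    for (int y = y0; y < y1; y++) {
        const uint8_t *row = img + (size_t)y * stride;
        for (int c = first; c <= last; c++) {   /* the single X-loop */
            for (int i = 0; i < LANES; i++) {   /* stands in for one vector op */
                uint8_t m = 0xFF;
                if (c == first) m &= lmask[i];
                if (c == last)  m &= rmask[i];
                sum += row[c * LANES + i] & m;
            }
        }
    }
    uint64_t count = (uint64_t)(x1 - x0) * (uint64_t)(y1 - y0);
    return (unsigned)(sum / count);
}
```

Because the edge chunks are masked rather than handled separately, there is no scalar head or tail loop. An area that starts and ends inside the same chunk simply gets both masks applied at once.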

tilegx_bilinear_par_2014.zip - Article: Bilinear Picture Scaling on Tilera TILE-Gx - C, TILE-Gx intrinsics

First version of the parallelization library with example code: A crap bilinear scaler.

multiquake_2014.zip - Article: MultiQuake - Quake for Mellanox TILE-Gx CPUs - C, TILE-Gx intrinsics

Quake for the TILE-Gx multicore CPU. The 36-core version has no problems running 1 Quake per core.

multidoom_2014.zip - MultiDoom - Doom for Mellanox TILE-Gx CPUs - C, TILE-Gx intrinsics

My first foray into programming the TILE-Gx multicore CPU: The initial version was released in late 2009. The 36-core version has no problems running 1 Doom per core. I also had a version running on the TILE64 PCIe development board before that.

tilegx_raytracer_2014.zip - Raytracing on Tilera TILE-Gx - C, TILE-Gx intrinsics

Early parallel floating point raytracer for the TILE-Gx.


rgb2yuv_tilegx_2013.cpp - Article: RGB to YUV conversion on Tilera TILE-Gx - C, TILE-Gx intrinsics

A fast RGB to YUV conversion routine for the TILE-Gx.

fix_2013.awk - Article: RGB to YUV conversion on Tilera TILE-Gx - AWK

AWK script to convert GCC TILE-Gx assembler output to readable code.


yuv2rgb_sse2_2012.cpp - Article: YUV to RGB Conversion using SSE2 - C, SSE2 intrinsics

A fast YUV to RGB conversion routine for SSE2. Uses more multipliers for fewer stalls. Or that's the idea.

yuv2rgb_tilegx_2012.c - Article: YUV to RGB Conversion on Tilera TILE-Gx - C, TILE-Gx intrinsics

A fast YUV to RGB conversion routine for TILE-Gx.

box_sse2_2012.cpp - Article: A look at Halide's SSE2 3x3 Box Filter - C, SSE2 intrinsics

Halide's 3x3 box filter example code is broken, and their test systems are too old. Here's my take on it.


aes_tilegx_2011.c - Article: AES Optimization on Tilera TILE-Gx - C, TILE-Gx intrinsics

Superseded by later TILE-Gx AES releases. Doesn't have the nifty solution to the last round problem.


srtp_aes_2010.c - Article: SRTP AES Optimization - C

Superseded by later SRTP AES releases, but it was pretty neat in 2010.


Unfortunately, no code is available for this period since I worked for the greatest company in the world. Pictures of some fun stuff we did at the factory can be found here: TANDBERG Secrets.

The 90s

homepilotsrc_1999.zip - Article: Source Code and Schematics for PCTVNet HomePilot Set Top Box - C, x86 assembler

Assorted stuff I had checked out on my last day there. I suggest reading the article for more details.

Logla_02.zip - C++

It's a functional compiler that generates 68000 code, written in C++ version 2 in 1996.

Speed_TG95.zip - C, C++ and 68020 assembler

Features a raytracer written by a friend of mine.

NoTemptations_part1.lha - 68000 Assembler

I didn't write any of this! The code was done by Warp and Smeagol.