Article: The Xmas Demo 2022 - C, GLSL
Archive: xmasdemo_shaders_2022.zip
Files: xmasdemo_shaders_2022/
All the modified GLSL shaders used in the Xmas Demo 2022.
Runs at 60 fps on the NVidia Jetson AGX Orin.
Article: The Xmas Demo 2021 - C, GLSL
Archive: xmasdemo_v2_2021.zip
Contains full source code that should compile on both Windows and Linux.
It runs at 60 fps on the NVidia Jetson AGX Xavier if you use the tricks mentioned in the article.
Article: The Xmas Demo 2020 Edition - C, GLSL
Archive: xmasdemo_v5_2020.zip
A respin of the 2017 Xmas Demo. Full source code that should compile on both Windows and Linux.
We replaced the fonts, the graphics, the music, and fixed a busload of bugs.
The shaders are no longer NVidia-only. Tested on a couple of AMD cards.
Article: Dreamscape & Eclipse: The Final Cut - C
Archive: recycler_2020.zip
Generator for the video Dreamscape: The Missing Scene [youtube.com].
The supplied video file has to be converted to raw YUV444. FFmpeg can do that.
Article: Real Programming - 68020 Assembler
Archive: Invention_TG94_2020.zip
Files: Invention_TG94_2020/
40KB Amiga intro released at The Gathering 1994.
I recovered the source in 2020 and fixed an old timing bug, so it should run smoothly now.
On an Amiga 1200 or 4000, of course.
Article: A Different Method for Image Transformations - C, SSE2 intrinsics, ARM Neon assembler
Archive: imagetrans_2019.zip
Files: imagetrans_2019/
Demonstrates a unique and extremely fast method for applying separable filters. A Lanczos-2 picture scaler is used as an example.
SSE2 intrinsics and ARM Neon Assembler solutions provided.
The main work was done in 2013, but the code wasn't recovered until 2018/2019.
Article: Real Programming - C++
Archive: quat_2019.zip
Generator for the video Quaternion HQ Tech Demo [youtube.com].
It's the old Julia quaternion code ported to straight C, a couple of bugs fixed, and quality turned up to 11.
Windows multi-threaded version. Probably trivial to port to Linux. Not real-time at all. It's a generator.
Article: Real Programming - C
Archive: cliffy_2019.zip
Generator for the videos Superhero [youtube.com] and Wobblerne 2 [youtube.com].
YUV dump from the VHS tape not included. It's quite large, around 30 GB. Contact me if you want a copy.
Article: Dreamscape & Eclipse: The Final Cut - C
Archive: remix2_2019.zip
Generator for the video Dreamscape & Eclipse: The Final Cut [youtube.com].
The supplied video files have to be converted to raw YUV444. FFmpeg can do that.
Article: Real Programming - C, AVX2 intrinsics
File: bayer_2018.c
It cuts a lot of corners, literally. Uses the ultra-cool maddubs intrinsic.
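For readers who have not met maddubs (`_mm256_maddubs_epi16` in AVX2): it multiplies unsigned bytes from one operand by signed bytes from the other and adds each adjacent pair of products with signed 16-bit saturation. The following is a portable scalar model of that behaviour, not the code in bayer_2018.c; the function names `maddubs_lane` and `maddubs_row` are made up for this sketch.

```c
#include <stdint.h>

/* Scalar model of one 16-bit output lane of maddubs.
   a0, a1 are unsigned bytes; b0, b1 are signed bytes.
   (Hypothetical helper name, for illustration only.) */
static int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    if (sum > INT16_MAX) sum = INT16_MAX;   /* signed saturation, high side */
    if (sum < INT16_MIN) sum = INT16_MIN;   /* signed saturation, low side  */
    return (int16_t)sum;
}

/* Apply the model across a row: n_pairs outputs from 2 * n_pairs inputs. */
static void maddubs_row(const uint8_t *a, const int8_t *b,
                        int16_t *out, int n_pairs)
{
    for (int i = 0; i < n_pairs; i++)
        out[i] = maddubs_lane(a[2 * i], a[2 * i + 1],
                              b[2 * i], b[2 * i + 1]);
}
```

With weights of 1 and 1 this sums two neighbouring pixels in a single instruction, which is the kind of step a Bayer filter needs a lot of.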
Article: Nvidia Jetson TX2 Xmas Demo 2017 - C, ARM Neon intrinsics, GLSL
Archive: xmasdemo_2017.zip
Files: xmasdemo_2017/
Xmas Demo for the NVidia Jetson TX2 with just 256 GPU cores. In cooperation with Tor Ringstad. The original,
where we implemented a series of insane (and sometimes inane) optimizations to make fancy ShaderToy effects run at 60 fps.
Article: Integer Raytracing on Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_int_raytracer_2017.zip
Files: tilegx_int_raytracer_2017/
Floating-point raytracing on the TILE-Gx wasn't a big hit. However, parts of the floating-point support can be
combined with integer calculations and classic optimizations from Quake to trace 40 spheres
at 1080p60.
Article: GPURay - GPU+CPU Raytracer for NVidia Tegra X1 - C, GLSL
Archive: gpuray_2017.zip
Files: gpuray_2017/
128 spheres at 1080p60 on the NVidia Tegra X1.
Among other nifty optimizations,
it exploits the fact that the CPU can predict how things are gonna execute on the GPU for a
cool 60 percent speedup.
Article: Real Programming - C, ARM Neon intrinsics
File: satan2_2017.cpp
A fast way to calculate 2x4 atan2() values using ARM Neon. Also, the name is legendary.
Article: SRTP AES Optimization Revisited - C
Archive: srtp_aes_revisited_2016.zip
Simplifies the AES CTR mode constant buffer calculations by eliminating the common terms.
It runs much faster, 30 percent to be precise.
Article: GPU Hacks: Fractals, a Raytracer and Raytraced Quaternion Julia Sets - GLSL
Archive: gpu_hacks_2016.zip
Files: gpu_hacks_2016/
Early GPU work, superseded by newer work. The fractal routine is still valid:
Don't bother with early cutoff; always optimize for max depth. This gives
1080p60 with depth 256 on the TX1 even when all pixels are at max depth. Really.
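One way to read the "no early cutoff" advice, sketched in C rather than GLSL: every pixel runs the full iteration count, and a point that has escaped simply stops updating instead of breaking out of the loop. This is an illustration of the idea under that assumption, not the shader in the archive; the function name `mandel_fixed_depth` is hypothetical.

```c
/* Mandelbrot iteration with a fixed trip count and no break.
   Returns how many iterations the point stayed inside radius 2.
   (Hypothetical function, for illustration only.) */
static int mandel_fixed_depth(float cx, float cy, int max_depth)
{
    float zx = 0.0f, zy = 0.0f;
    int count = 0;
    for (int i = 0; i < max_depth; i++) {
        float nx = zx * zx - zy * zy + cx;
        float ny = 2.0f * zx * zy + cy;
        int inside = (zx * zx + zy * zy) <= 4.0f;
        /* Escaped points keep their state, so z never overflows. */
        zx = inside ? nx : zx;
        zy = inside ? ny : zy;
        count += inside;
    }
    return count;
}
```

Because the loop has the same length for every pixel, the cost per frame is constant and equals the worst case, which is the case the entry says to optimize for.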
Article: News: 27 June 2016 - C
Archive: raytracer_2016.zip
Early multi-threaded raytracer that splits the picture into lumps of lines. Wasn't a very bright idea.
Splitting into squares is much better.
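A minimal sketch of the square-splitting alternative, assuming a fixed tile size and row-major tile numbering. The `Tile` struct and the helpers `tile_count` and `tile_at` are invented for this example and are not taken from raytracer_2016.zip.

```c
/* A rectangular work unit inside the picture. */
typedef struct { int x, y, w, h; } Tile;

/* Number of tiles needed to cover a width x height picture. */
static int tile_count(int width, int height, int size)
{
    int cols = (width + size - 1) / size;
    int rows = (height + size - 1) / size;
    return cols * rows;
}

/* Tile number 'index' in row-major order; edge tiles are clipped. */
static Tile tile_at(int index, int width, int height, int size)
{
    int cols = (width + size - 1) / size;
    Tile t;
    t.x = (index % cols) * size;
    t.y = (index / cols) * size;
    t.w = (t.x + size > width) ? width - t.x : size;
    t.h = (t.y + size > height) ? height - t.y : size;
    return t;
}
```

Each worker thread would then take the next tile index from a shared counter and render that tile until the counter reaches `tile_count()`.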
Article: OpenSSL aes_core.c Replacement for Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_aes_openssl_2015.zip
I love the TILE-Gx CPU! This thing can replace the OpenSSL AES encrypt function by using the mega-awesome tblidx instructions.
Also, I redid the final pass with a 64-bit merge. This could probably be done on x64-based CPUs too, or any other 64-bit
CPU with such an instruction.
Article: Real Programming - C, TILE-Gx intrinsics
File: scharr_tilegx_2014.c
A fast Scharr (Sobel-type) filter for TILE-Gx. Exploits the fact that the vertical filters can be expressed as sums of horizontal filters.
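The Scharr kernels are separable, so both gradients can be built from two horizontal per-row results: a difference and a 3-10-3 smooth. The scalar sketch below shows that decomposition for one pixel. It is a plausible reading of the description, not the TILE-Gx code in scharr_tilegx_2014.c, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Horizontal building blocks for one row at column x (1 <= x <= width - 2).
   (Hypothetical helpers, for illustration only.) */
static int row_diff(const uint8_t *row, int x)
{
    return row[x + 1] - row[x - 1];
}

static int row_smooth(const uint8_t *row, int x)
{
    return 3 * row[x - 1] + 10 * row[x] + 3 * row[x + 1];
}

/* Scharr gradients at column x from the rows above, at, and below y.
   gx uses the kernel [-3 0 3; -10 0 10; -3 0 3],
   gy uses the kernel [-3 -10 -3; 0 0 0; 3 10 3]. */
static void scharr_at(const uint8_t *above, const uint8_t *cur,
                      const uint8_t *below, int x, int *gx, int *gy)
{
    *gx = 3 * row_diff(above, x) + 10 * row_diff(cur, x)
        + 3 * row_diff(below, x);
    *gy = row_smooth(below, x) - row_smooth(above, x);
}
```

In a real implementation the per-row results would be kept around, so each row's horizontal filters are computed once and reused for the three output rows that touch it.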
Article: Real Programming - C, ARM Neon intrinsics
File: average_neon_2014.c
Calculate the average of an area quickly using ARM Neon intrinsics.
It does this by masking out the left and right parts, so there's just a single X-loop.
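The masking idea can be modelled in plain C: walk each row in aligned 8-pixel chunks and zero the lanes that fall outside the area, so there are no separate loops for the left and right edges. This is a scalar sketch of that reading, not the Neon code in average_neon_2014.c. It assumes the row is padded to a multiple of 8 pixels, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of the pixels in columns [x0, x1) of one row, processed in
   aligned 8-pixel chunks. Lanes outside the range are masked to zero.
   Assumes the row is readable up to the next multiple of 8 after x1. */
static uint32_t row_sum_masked(const uint8_t *row, int x0, int x1)
{
    uint32_t sum = 0;
    int first = x0 & ~7;                 /* start of chunk containing x0 */
    for (int cx = first; cx < x1; cx += 8) {
        for (int lane = 0; lane < 8; lane++) {
            int x = cx + lane;
            uint8_t mask = (x >= x0 && x < x1) ? 0xFF : 0x00;
            sum += row[x] & mask;
        }
    }
    return sum;
}

/* Truncated integer average over the area [x0, x1) x [y0, y1). */
static uint32_t area_average(const uint8_t *img, int stride,
                             int x0, int y0, int x1, int y1)
{
    uint64_t total = 0;
    for (int y = y0; y < y1; y++)
        total += row_sum_masked(img + (size_t)y * stride, x0, x1);
    return (uint32_t)(total / ((uint64_t)(x1 - x0) * (uint64_t)(y1 - y0)));
}
```

In a vector version the inner lane loop becomes one load, one AND with a mask, and one widening add; only the first and last chunk need a mask that is not all ones.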
Article: Bilinear Picture Scaling on Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_bilinear_par_2014.zip
First version of the parallelization library with example code: A crap bilinear scaler.
Article: MultiQuake - Quake for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multiquake_2014.zip
Quake for the TILE-Gx multicore CPU. The 36-core version has no problems running 1 Quake per core.
Article: MultiDoom - Doom for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multidoom_2014.zip
MultiDoom for a TILE-Gx host like the SX80.
The 36-core version has no problems running 1 Doom per core.
Article: Raytracing on Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_raytracer_2014.zip
Early parallel floating point raytracer for the TILE-Gx. It wasn't very quick.
Article: RGB to YUV conversion on Tilera TILE-Gx - C, TILE-Gx intrinsics
File: rgb2yuv_tilegx_2013.cpp
A fast RGB to YUV conversion routine for the TILE-Gx.
Article: Real Programming - AWK
File: fix_2013.awk
AWK script to convert GCC TILE-Gx assembler output to readable code.
Article: MultiDoom - Doom for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multidoomgx_2013.zip
MultiDoom for the TILE-Gx PCI Express card.
Article: YUV to RGB Conversion using SSE2 - C, SSE2 intrinsics
File: yuv2rgb_sse2_2012.cpp
A fast YUV to RGB conversion routine for SSE2. Uses more multipliers for fewer stalls. Or that's the idea.
Article: YUV to RGB Conversion on Tilera TILE-Gx - C, TILE-Gx intrinsics
File: yuv2rgb_tilegx_2012.c
A fast YUV to RGB conversion routine for the TILE-Gx.
Article: A look at Halide's SSE2 3x3 Box Filter - C, SSE2 intrinsics
File: box_sse2_2012.cpp
Halide's 3x3 box filter example code is broken, and their test systems too old. Here's my take on it.
Article: AES Optimization on Tilera TILE-Gx - C, TILE-Gx intrinsics
File: aes_tilegx_2011.c
Superseded by later TILE-Gx AES releases. Doesn't have the nifty solution to the last round problem.
Article: SRTP AES Optimization - C
File: srtp_aes_2010.c
Superseded by later SRTP AES releases, but it was pretty neat in 2010.
Article: MultiDoom - Doom for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multidoom_tile64_2009.zip
Early MultiDoom for the TILE64 PCI Express card.
Unfortunately, I haven't been able to recover much from this period. Pictures of some fun stuff we did for TANDBERG can be found here: TANDBERG Secrets.
Article: Source Code and Schematics for PCTVNet HomePilot Set-Top Box - C, x86 assembler
Archive: homepilotsrc_1999.zip
Assorted stuff I had checked out on my last day there. I suggest reading the article for more details.
C++
Archive: Logla_02_1996.zip
Files: Logla_02_1996/
A compiler that generates 68000 code for a simple programming language.
Written in C++ version 2 in 1996.
C++ version 2 was the last version that was useful. They really crapped it up after that.
I compiled the compiler (ha) and tested it in 2020. It still works.
C, C++ and 68020 assembler
Archive: Speed_TG95.zip
64KB Amiga intro released at The Gathering 1995. Features a raytracer written by a friend of mine.
68000 Assembler
Archive: NoTemptations_part1.lha
I didn't write any of this! The code was done by Warp and Smeagol.
All non-derivative source code files published on this website that are not labeled with a specific license are covered by the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. Here's a short, and by no means complete, summary:
The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Derivative works retain their original license.