Archive: nano_fractals_2023.zip (browse) - C, Assembler, GLSL
Article: Edgehog: 1080p60 Nano Fractals
1080p60 depth 256 fractals on a dainty Jetson Nano. The Edgehog method runs in parallel on all 4 ARM cores
and is implemented in Neon intrinsics. Compiles and runs on other ARM targets, obviously.
Archive: neon_scaler_2023.zip (browse) - C, Assembler
Article: A Fast Image Scaler for ARM Neon
A fast Lanczos-2 image scaler for ARM Neon written in C and intrinsics. The ARMv7 version from 2013 is updated.
Archive: xmasdemo_shaders_2022.zip (browse) - GLSL
Article: The Xmas Demo 2022
All the modified GLSL shaders used in the Xmas Demo 2022.
Runs at 60 fps on the NVidia Jetson AGX Orin.
Archive: xmasdemo_v2_2021.zip (browse) - C, GLSL
Article: The Xmas Demo 2021
Contains full source code that should compile on both Windows and Linux.
It runs at 60 fps on the NVidia Jetson AGX Xavier if you use the tricks mentioned in the article.
Archive: xmasdemo_v5_2020.zip (browse) - C, GLSL
Article: The Xmas Demo 2020
A respin of the 2017 Xmas Demo. Full source code that should compile on both Windows and Linux.
We replaced the fonts, the graphics, the music, and fixed a busload of bugs.
Shaders no longer NVidia only. Tested on a couple of AMD cards.
Archive: recycler_2020.zip (browse) - C
Article: Dreamscape & Eclipse: The Final Cut
Generator for the video Dreamscape: The Missing Scene [youtube.com].
The supplied video file has to be converted to raw YUV444. FFmpeg can do that.
Archive: Invention_TG94_2020.zip (browse) - 68020 Assembler
Article: Real Programming
40KB Amiga intro released at The Gathering 1994.
I recovered the source in 2020 and fixed an old timing bug, so it should run smoothly now.
On an Amiga 1200 or 4000, of course. Uses the copper to provide a 92x92 12-bit truecolor screen.
Archive: quat_2019.zip (browse) - C++
Article: Real Programming
Generator for the video Quaternion HQ Tech Demo [youtube.com].
It's the old Julia quaternion code ported to straight C, a couple of bugs fixed, and quality turned up to 11.
Windows multi-threaded version. Probably trivial to port to Linux. Not real-time at all. It's a generator.
Archive: cliffy_2019.zip (browse) - C
Article: Real Programming
Generator for the videos Superhero [youtube.com] and Wobblerne 2 [youtube.com].
YUV dump from the VHS tape not included. It's quite large, around 30 GB. Contact me if you want a copy.
Archive: remix2_2019.zip (browse) - C
Article: Dreamscape & Eclipse: The Final Cut
Generator for the video Dreamscape & Eclipse: The Final Cut [youtube.com].
The supplied video files have to be converted to raw YUV444. FFmpeg can do that.
File: bayer_2018.c - C
Article: Real Programming
It cuts a lot of corners, literally. Uses AVX2 intrinsics including the ultra-cool maddubs.
Intel occasionally gets their naming right.
Archive: imagetrans_2018.zip (browse) - C, Assembler
Article: Exploiting the Cache: Faster Separable Filters
Introduces a unique and extremely fast method for applying separable filters. A Lanczos-2 picture scaler is used as an example.
SSE2 intrinsics and ARM Neon Assembler solutions are provided.
The main work was done in 2013, but the code wasn't recovered until 2018/2019.
Archive: xmasdemo_2017.zip (browse) - C, GLSL
Article: The Xmas Demo 2017
Xmas Demo for the NVidia Jetson TX2 with just 256 GPU cores. In cooperation with Tor Ringstad. The original,
where we implemented a series of insane (and sometimes inane) optimizations to make fancy ShaderToy effects run at 60 fps.
Archive: tilegx_int_raytracer_2017.zip (browse) - C
Article: Integer Raytracing on Tilera TILE-Gx
Floating-point raytracing on the TILE-Gx wasn't a big hit. Parts of the fp support can be
combined with integer calculations and classic optimizations from Quake to trace 40 spheres
in 1080p60.
Archive: gpuray_2017.zip (browse) - C, GLSL
Article: GPURay - GPU+CPU Raytracer for NVidia Tegra X1
128 spheres in 1080p60 on the NVidia Tegra X1.
Among other nifty optimizations,
it exploits the fact that the CPU can predict how things are gonna execute on the GPU for a
cool 60 percent speedup.
File: satan2_2017.cpp - C
Article: Real Programming
A fast way to calculate 2x4 atan2() values using ARM Neon. Also, the name is legendary.
Archive: srtp_aes_revisited_2016.zip (browse) - C
Article: SRTP AES Optimization Revisited
Simplifies the AES CTR mode constant buffer calculations by eliminating the common terms.
It runs much faster, 30 percent to be precise.
Archive: gpu_hacks_2016.zip (browse) - GLSL
Article: GPU Hacks: Fractals, a Raytracer and Raytraced Quaternion Julia Sets
Early GPU work, superseded by newer works. The fractal routine is still valid:
Don't bother with early cutoff and always optimize for max depth. That gives
1080p60 with depth 256 on the TX2 even when all pixels are at max depth. Really.
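The no-early-cutoff idea can be sketched in plain C. This is only an illustration under assumptions that are not from the archive: the archive itself is GLSL, the sketch uses a plain Mandelbrot iteration, and `mandel_fixed_depth` and `MAX_DEPTH` are hypothetical names. Every pixel runs exactly `MAX_DEPTH` iterations, and the escape is recorded in a flag instead of a `break`.

```c
#include <stdint.h>

#define MAX_DEPTH 256  /* depth quoted in the text */

/* Hypothetical helper: iterate z = z*z + c exactly MAX_DEPTH times.
   There is no early exit. "alive" drops to 0 when |z|^2 exceeds 4 and
   stays 0, so the loop has a fixed trip count for every pixel. */
static int mandel_fixed_depth(float cx, float cy)
{
    float zx = 0.0f, zy = 0.0f;
    int alive = 1;
    int count = 0;

    for (int i = 0; i < MAX_DEPTH; i++) {
        float xx = zx * zx;
        float yy = zy * zy;
        float nzx = xx - yy + cx;
        float nzy = 2.0f * zx * zy + cy;

        /* Comparisons against inf or NaN are false, so values that
           overflow after the escape cannot switch the flag back on. */
        alive &= (xx + yy <= 4.0f);
        count += alive;

        zx = nzx;
        zy = nzy;
    }
    return count;  /* MAX_DEPTH for points that never escape */
}
```

The returned count is what would normally index a palette. The cost per pixel is constant, so the worst case (all pixels at max depth) is also the only case.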
Archive: raytracer_2016.zip (browse) - C
Article: News: 27 June 2016
Early multi-threaded raytracer that splits the picture into lumps of lines. Wasn't a very bright idea.
Splitting into squares is much better.
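A minimal sketch of what splitting into squares could look like in C. The names `tile_rect`, `tile_for_index` and `tile_count` are hypothetical and not taken from the archive. The assumption is that worker threads fetch tile indices from a shared counter and render one square at a time.

```c
#include <stdint.h>

/* Rectangle covered by one tile; x1 and y1 are exclusive. */
typedef struct { int x0, y0, x1, y1; } tile_rect;

/* Hypothetical helper: number of tile x tile squares needed to cover
   a width x height picture (partial tiles at the edges count too). */
static int tile_count(int width, int height, int tile)
{
    return ((width + tile - 1) / tile) * ((height + tile - 1) / tile);
}

/* Hypothetical helper: rectangle of tile number `index`, tiles being
   numbered row by row. Edge tiles are clipped to the picture. */
static tile_rect tile_for_index(int index, int width, int height, int tile)
{
    int tiles_x = (width + tile - 1) / tile;
    tile_rect r;

    r.x0 = (index % tiles_x) * tile;
    r.y0 = (index / tiles_x) * tile;
    r.x1 = (r.x0 + tile < width)  ? r.x0 + tile : width;
    r.y1 = (r.y0 + tile < height) ? r.y0 + tile : height;
    return r;
}
```

For a 1920x1080 picture and 64-pixel tiles this gives 30 x 17 tiles, with the bottom row clipped to 56 lines.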
Archive: tilegx_aes_openssl_2015.zip (browse) - C
Article: OpenSSL aes_core.c Replacement for Tilera TILE-Gx
I love the TILE-Gx CPU! This code replaces the OpenSSL AES encrypt function, using the mega-awesome tblidx instructions.
Also, I redid the final pass with a 64-bit merge. This could probably be done on x64-based CPUs too, or any other 64-bit
CPU with such an instruction.
File: scharr_tilegx_2014.c - C
Article: Real Programming
A fast Scharr (Sobel-type) filter for TILE-Gx. Exploits the fact that the vertical filters can be expressed as sums of horizontal filters.
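One way to read that remark, sketched in portable C rather than TILE-Gx code: the 3x3 Scharr kernels are separable, so both gradients can be built from two horizontal passes per row (a difference and a 3-10-3 smooth) that are then combined across three rows. This is my interpretation, not necessarily the exact scheme in the file, and `row_diff`, `row_smooth` and `scharr_at` are hypothetical names.

```c
#include <stdint.h>

/* Horizontal difference [-1 0 1] at column x of one row. */
static int row_diff(const uint8_t *row, int x)
{
    return row[x + 1] - row[x - 1];
}

/* Horizontal smooth [3 10 3] at column x of one row. */
static int row_smooth(const uint8_t *row, int x)
{
    return 3 * row[x - 1] + 10 * row[x] + 3 * row[x + 1];
}

/* Hypothetical helper: Scharr gradients at an interior pixel (x, y).
   gx weights the horizontal differences of three rows by 3, 10, 3.
   gy is the horizontal smooth of the row below minus the row above. */
static void scharr_at(const uint8_t *img, int stride, int x, int y,
                      int *gx, int *gy)
{
    const uint8_t *up   = img + (y - 1) * stride;
    const uint8_t *mid  = img + y * stride;
    const uint8_t *down = img + (y + 1) * stride;

    *gx = 3 * row_diff(up, x) + 10 * row_diff(mid, x) + 3 * row_diff(down, x);
    *gy = row_smooth(down, x) - row_smooth(up, x);
}
```

A real implementation would presumably compute each row's horizontal results once and reuse them for the three output rows that need them. The sketch recomputes them for clarity.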
File: average_neon_2014.c - C
Article: Real Programming
Calculate the average of an area quickly using ARM Neon intrinsics.
It does this by masking out the left and right parts, so there's just a single X-loop.
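The masking idea can be sketched in portable C, with the inner lane loop standing in for a single Neon vector operation. This is an illustration only: `LANES`, `masked_row_sum` and `area_average` are hypothetical names, and the sketch assumes every row is readable over the whole 8-pixel-aligned blocks that touch the area.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* assumption: 8 bytes per vector, as in a 64-bit Neon register */

/* Hypothetical helper: sum of row[x0..x1) computed in whole LANES-wide
   blocks. Lanes to the left of x0 and to the right of x1 are zeroed by
   a mask, so there is one X loop and no separate head or tail code. */
static uint32_t masked_row_sum(const uint8_t *row, int x0, int x1)
{
    uint32_t sum = 0;

    for (int bx = x0 & ~(LANES - 1); bx < x1; bx += LANES) {
        /* In Neon this inner loop would be one load, one AND and one add. */
        for (int lane = 0; lane < LANES; lane++) {
            int x = bx + lane;
            uint8_t mask = (x >= x0 && x < x1) ? 0xFF : 0x00;
            sum += (uint32_t)(row[x] & mask);
        }
    }
    return sum;
}

/* Hypothetical helper: integer average over the area [x0,x1) x [y0,y1). */
static uint32_t area_average(const uint8_t *img, int stride,
                             int x0, int y0, int x1, int y1)
{
    uint64_t total = 0;

    for (int y = y0; y < y1; y++)
        total += masked_row_sum(img + (size_t)y * stride, x0, x1);

    return (uint32_t)(total / ((uint64_t)(x1 - x0) * (uint64_t)(y1 - y0)));
}
```

Only the first and last blocks of a row ever get a partial mask. All blocks in between use an all-ones mask, which is why the same loop body can serve the whole row.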
Archive: tilegx_bilinear_par_2014.zip (browse) - C
Article: Bilinear Picture Scaling on Tilera TILE-Gx
First version of the parallelization library with example code: A crap bilinear scaler.
Archive: multiquake_2014.zip (browse) - C
Article: MultiQuake - Quake for Tilera TILE-Gx CPUs
Quake for the TILE-Gx multicore CPU. The 36-core version has no problems running 1 Quake per core.
Archive: multidoom_2014.zip (browse) - C
Article: MultiDoom - Doom for Tilera TILE-Gx CPUs
MultiDoom for a TILE-Gx host like the SX80.
The 36-core version has no problems running 1 Doom per core.
Archive: tilegx_raytracer_2014.zip (browse) - C
Article: Raytracing on Tilera TILE-Gx
Early parallel floating point raytracer for the TILE-Gx. It wasn't very quick.
File: rgb2yuv_tilegx_2013.cpp - C
Article: RGB to YUV conversion on Tilera TILE-Gx
A fast RGB to YUV conversion routine for the TILE-Gx.
File: fix_2013.awk - AWK
Article: Real Programming
AWK script to convert GCC TILE-Gx assembler output to readable code.
File: yuv2rgb_sse2_2012.cpp - C
Article: YUV to RGB Conversion using SSE2
A fast YUV to RGB conversion routine for SSE2. Uses more multipliers for fewer stalls. Or that's the idea.
File: yuv2rgb_tilegx_2012.c - C
Article: YUV to RGB Conversion on Tilera TILE-Gx
A fast YUV to RGB conversion routine for the TILE-Gx.
File: box_sse2_2012.cpp - C
Article: A look at Halide's SSE2 3x3 Box Filter
Halide's 3x3 box filter example code is broken, and their test systems too old. Here's my take on it.
File: aes_tilegx_2011.c - C
Article: AES Optimization on Tilera TILE-Gx
Superseded by later TILE-Gx AES releases. Doesn't have the nifty solution to the last round problem.
File: srtp_aes_2010.c - C
Article: SRTP AES Optimization
Superseded by later SRTP AES releases, but it was pretty neat in 2010.
Unfortunately, I haven't been able to recover much from this period. Pictures of some fun stuff we did for TANDBERG can be found here: TANDBERG Secrets.
Archive: homepilotsrc_1999.zip (browse) - C, x86 assembler
Article: PCTVNet HomePilot Set-Top Box
I made various drivers and programs for the HomePilot set-top box. This is a collection of assorted stuff I had
checked out on my last day there. (Code archaeologists may find my MIDI player interesting since it uses a
completely different approach.)
Archive: Logla_02_1996.zip (browse) - C++
A compiler that generates 68000 code for a simple programming language.
Written in C++ version 2 in 1996.
C++ version 2 was the last version that was useful. They really crapped it up after that.
I compiled the compiler (ha) and tested it in 2020. It still works.
Archive: Speed_TG95.zip (browse) - C, C++ and 68020 assembler
Article: Real Programming
64KB Amiga intro released at The Gathering 1995. Features a raytracer written by a friend of mine.
Archives: NoTemptations_part1.lha, NoTemptations_part1.zip (browse) - 68000 Assembler
Article: Triumph Amiga Demos and Source Code
I didn't write any of this! The code was done by Warp and Smeagol.
All articles that are not labeled with a specific license are covered by this: Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0).
All nonderivative source code files that are not labeled with a specific license are covered by this:
CC0 1.0 Universal (CC0 1.0).
Derivative works retain their original license.
All images that are not labeled with a specific license are covered by this: CC0 1.0 Universal (CC0 1.0).
All videos that are hosted on YouTube are bound by the "Standard YouTube License". I am unable to find the actual license text. It's probably a good idea to follow it. Caveat lector.
This page uses a modified panel from the webcomic
Abstruse Goose, strip 483:
bad boy
Credits: "The Abstruse Goose comic is a subsidiary of the powerful and evil Abstruse Goose Corporation."
License:
Attribution-NonCommercial 3.0 United States (CC BY-NC 3.0 US)