File: average_neon_2024.c - C
Article: News 17 April 2024
An update of the old 2014 version with cleaned up mask calculation and support for 1920x1080.
Archive: xmasdemo_src_2023.zip (browse) - C, GLSL
Archive: xmasdemo_data_2023.zip (browse)
Article: The Xmas Demo 2023
Should compile on both Windows and Linux. Runs in 1080p60 on the NVidia Jetson AGX Orin Developer
Kit.
Archive: nano_fractals_2023.zip (browse) - C, Assembler, GLSL
Article: Edgehog: 1080p60 Nano Fractals
1080p60 depth 256 fractals on a dainty Jetson Nano. The Edgehog method runs in parallel on all 4 ARM
cores and is implemented in Neon intrinsics. Compiles and runs on other ARM targets, obviously.
Archive: neon_scaler_2023.zip (browse) - C, Assembler
Article: A Fast Image Scaler for ARM Neon
A fast Lanczos-2 image scaler for ARM Neon written in C and intrinsics. The ARMv7 version from 2013
is updated.
Archive: xmasdemo_shaders_2022.zip (browse) - GLSL
Article: The Xmas Demo 2022
All the modified GLSL shaders used in the Xmas Demo 2022. Runs at 60 fps on the NVidia Jetson AGX
Orin.
Archive: xmasdemo_v2_2021.zip (browse) - C, GLSL
Article: The Xmas Demo 2021
Contains full source code that should compile on both Windows and Linux. It runs at 60 fps on the
NVidia Jetson AGX Xavier if you use the tricks mentioned in the article.
Archive: xmasdemo_v5_2020.zip (browse) - C, GLSL
Article: The Xmas Demo 2020
A respin of the 2017 Xmas Demo. Full source code that should compile on both Windows and Linux. We
replaced the fonts, the graphics, the music, and fixed a busload of bugs. Shaders no longer NVidia
only. Tested on a couple of AMD cards.
Archive: recycler_2020.zip (browse) - C
Article: Dreamscape & Eclipse: The Final Cut
Generator for the video Dreamscape: The Missing Scene [youtube.com].
The supplied video file has to be converted to raw YUV444. FFmpeg can do that.
Archive: Invention_TG94_2020.zip (browse) - 68020 Assembler
Article: Real Programming
40KB Amiga intro released at The Gathering 1994. I recovered the source in 2020 and fixed an old
timing bug, so it should run smoothly now. On an Amiga 1200 or 4000, of course. Uses the copper to
provide a 92x92 12-bit truecolor screen.
Archive: quat_2019.zip (browse) - C++
Article: Real Programming
Generator for the video Quaternion HQ Tech Demo [youtube.com].
It's the old Julia quaternion code ported to straight C, a couple of bugs fixed, and quality turned
up to 11. Windows multi-threaded version. Probably trivial to port to Linux. Not real-time at all.
It's a generator.
Archive: cliffy_2019.zip (browse) - C
Article: Real Programming
Generator for the videos Superhero [youtube.com] and Wobblerne 2 [youtube.com].
YUV dump from the VHS tape not included. It's quite large, around 30 GB. Contact me if you want a
copy.
Archive: remix2_2019.zip (browse) - C
Article: Dreamscape & Eclipse: The Final Cut
Generator for the video Dreamscape & Eclipse: The Final Cut [youtube.com].
The supplied video files have to be converted to raw YUV444. FFmpeg can do that.
File: bayer_2018.c - C
Article: Real Programming
It cuts a lot of corners, literally. Uses AVX2 intrinsics including the ultra-cool maddubs. Intel
occasionally gets their naming right.
Archive: imagetrans_2018.zip (browse) - C, Assembler
Article: Exploiting the Cache: Faster Separable Filters
Introduces a unique and extremely fast method for applying separable filters. A Lanczos-2 picture
scaler is used as an example. SSE2 intrinsics and ARM Neon Assembler solutions are provided. The main
work was done in 2013, but the code wasn't recovered until 2018/2019.
Archive: xmasdemo_2017.zip (browse) - C, GLSL
Article: The Xmas Demo 2017
Xmas Demo for the NVidia Jetson TX2 with just 256 GPU cores. In cooperation with Tor Ringstad. This
is the original, where we implemented a series of insane (and sometimes inane) optimizations to make
fancy ShaderToy effects run at 60 fps.
Archive: tilegx_int_raytracer_2017.zip (browse) - C
Article: Integer Raytracing on Tilera TILE-Gx
Floating-point raytracing on the TILE-Gx wasn't a big hit. Parts of the fp support can be combined
with integer calculations and classic optimizations from Quake to trace 40 spheres in 1080p60.
Archive: gpuray_2017.zip (browse) - C, GLSL
Article: GPURay - GPU+CPU Raytracer for NVidia Tegra X1
128 spheres in 1080p60 on the NVidia Tegra X1. Among other nifty optimizations, it exploits the
fact that the CPU can predict how things are gonna execute on the GPU for a cool 60 percent speedup.
File: satan2_2017.cpp - C
Article: Real Programming
A fast way to calculate 2x4 atan2() values using ARM Neon. Also, the name is legendary.
Archive: srtp_aes_revisited_2016.zip (browse) - C
Article: SRTP AES Optimization Revisited
Simplifies the AES CTR mode constant buffer calculations by eliminating the common terms. It runs
much faster, 30 percent to be precise.
Archive: gpu_hacks_2016.zip (browse) - GLSL
Article: GPU Hacks: Fractals, a Raytracer and Raytraced Quaternion Julia Sets
Early GPU work, superseded by newer works. The fractal routine is still valid: Don't bother with
early cutoff and always optimize for max depth. That gives 1080p60 with depth 256 on the TX2 even
when all pixels are at max depth. Really.
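The "no early cutoff" idea can be illustrated with a small scalar C sketch. It is not the GLSL code from the archive. The function names are made up for this example, and the Mandelbrot iteration is an assumed stand-in for whatever fractal the shader draws. The fixed-depth loop always runs max_depth iterations and only lets the counter advance while the point is still inside the escape radius.

```c
/* Hypothetical illustration: fixed-depth iteration versus early exit. */

/* Always runs max_depth iterations. The counter advances only while
   |z|^2 <= 4. An escaped orbit keeps growing, eventually reaching inf or
   NaN, and the comparison is then false, so the count stays frozen. */
unsigned mandel_fixed_depth(float cr, float ci, unsigned max_depth)
{
    float zr = 0.0f, zi = 0.0f;
    unsigned count = 0;
    for (unsigned i = 0; i < max_depth; i++) {
        float rr = zr * zr, ii = zi * zi;
        count += (rr + ii <= 4.0f);      /* branch-free counter update */
        zi = 2.0f * zr * zi + ci;        /* uses the old zr */
        zr = rr - ii + cr;
    }
    return count;
}

/* Conventional version with early cutoff, shown for comparison only. */
unsigned mandel_early_exit(float cr, float ci, unsigned max_depth)
{
    float zr = 0.0f, zi = 0.0f;
    unsigned count = 0;
    while (count < max_depth) {
        float rr = zr * zr, ii = zi * zi;
        if (rr + ii > 4.0f)
            break;                       /* data-dependent branch */
        count++;
        zi = 2.0f * zr * zi + ci;
        zr = rr - ii + cr;
    }
    return count;
}
```

Both functions return the same count. The fixed-depth loop has no data-dependent branch, so its running time is the same for every pixel, which is the worst case that has to meet the frame budget anyway.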
Archive: raytracer_2016.zip (browse) - C
Article: News 27 June 2016
Early multi-threaded raytracer that splits the picture into lumps of lines. Wasn't a very bright
idea. Splitting into squares is much better.
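Splitting into squares could look roughly like the sketch below. It is not taken from the archive. The tile_rect type and the tile_count and tile_at functions are hypothetical names, and the tile size is a free parameter. Each tile index maps to one square (clipped at the right and bottom edges), so worker threads can simply fetch the next index from a shared counter.

```c
/* Hypothetical helpers for handing out square tiles to worker threads. */

typedef struct {
    int x0, y0;   /* inclusive top-left corner */
    int x1, y1;   /* exclusive bottom-right corner */
} tile_rect;

/* Number of tiles needed to cover a width x height picture. */
int tile_count(int width, int height, int tile)
{
    int tiles_x = (width  + tile - 1) / tile;   /* round up */
    int tiles_y = (height + tile - 1) / tile;
    return tiles_x * tiles_y;
}

/* Rectangle covered by tile number index, counted row by row. */
tile_rect tile_at(int width, int height, int tile, int index)
{
    int tiles_x = (width + tile - 1) / tile;
    tile_rect r;
    r.x0 = (index % tiles_x) * tile;
    r.y0 = (index / tiles_x) * tile;
    r.x1 = (r.x0 + tile < width)  ? r.x0 + tile : width;   /* clip right  */
    r.y1 = (r.y0 + tile < height) ? r.y0 + tile : height;  /* clip bottom */
    return r;
}
```

With many small tiles and a shared counter, a thread that lands on an expensive part of the picture just takes fewer tiles. That is presumably part of why squares behave better than a few big lumps of lines.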
Archive: tilegx_aes_openssl_2015.zip (browse) - C
Article: OpenSSL aes_core.c Replacement for Tilera TILE-Gx
I love the TILE-Gx CPU! This thing can replace the OpenSSL AES encrypt function by using the
mega-awesome tblidx instructions. Also, I redid the final pass with a 64-bit merge. This could
probably be done on x64-based CPUs too, or any other 64-bit CPU with such an instruction.
File: scharr_tilegx_2014.c - C
Article: Real Programming
A fast Scharr (Sobel-type) filter for TILE-Gx. Exploits the fact that the vertical filters can be
expressed as sums of horizontal filters.
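One way to read "sums of horizontal filters" is sketched below in portable C. This is an assumption about the idea, not the TILE-Gx code from the file, and the function name scharr_rows is made up. Each row gets a horizontal difference d and a horizontal smooth s with the Scharr weights 3, 10, 3. The two gradients are then just sums of those per-row results from neighbouring rows.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch. Computes Scharr gradients for the interior of a
   w x h 8-bit image. gx and gy hold w*h values each; border pixels are
   left untouched. Returns 0 on success, -1 if allocation fails. */
int scharr_rows(const uint8_t *src, size_t w, size_t h,
                int16_t *gx, int16_t *gy)
{
    size_t n = w * h;
    int16_t *d = malloc(n * sizeof *d);   /* horizontal difference */
    int16_t *s = malloc(n * sizeof *s);   /* horizontal smooth     */
    if (!d || !s) {
        free(d);
        free(s);
        return -1;
    }

    /* Horizontal pass: two 1-D filters per row. */
    for (size_t y = 0; y < h; y++) {
        for (size_t x = 1; x + 1 < w; x++) {
            const uint8_t *p = src + y * w + x;
            d[y * w + x] = (int16_t)(p[1] - p[-1]);
            s[y * w + x] = (int16_t)(3 * p[-1] + 10 * p[0] + 3 * p[1]);
        }
    }

    /* Vertical pass: only sums of the row results above. */
    for (size_t y = 1; y + 1 < h; y++) {
        for (size_t x = 1; x + 1 < w; x++) {
            size_t i = y * w + x;
            gx[i] = (int16_t)(3 * d[i - w] + 10 * d[i] + 3 * d[i + w]);
            gy[i] = (int16_t)(s[i + w] - s[i - w]);
        }
    }

    free(d);
    free(s);
    return 0;
}
```

The results match a direct 3x3 convolution with the Scharr kernels. A tuned version would keep only three rows of d and s in a rolling buffer instead of two full temporary images.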
File: average_neon_2014.c - C
Article: Real Programming
Calculate the average of an area quickly using ARM Neon intrinsics. It does this by masking out the
left and right parts, so there's just a single X-loop.
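The masking idea can be sketched in plain C without Neon. This is an illustration under assumptions, not the code from the file. The name area_average is made up, a block of 8 pixels stands in for one 8-lane vector, and the stride is assumed to be a multiple of 8 so that whole blocks can always be read.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK 8   /* pixels per block, standing in for one 8-lane vector */

/* Hypothetical sketch. Average of the rectangle [x0,x1) x [y0,y1) in an
   8-bit image. Assumes a non-empty rectangle and a stride that is a
   multiple of BLOCK. The result is truncated to an integer. */
unsigned area_average(const uint8_t *img, size_t stride,
                      size_t x0, size_t y0, size_t x1, size_t y1)
{
    size_t first = x0 / BLOCK;         /* first block touched */
    size_t last  = (x1 - 1) / BLOCK;   /* last block touched  */
    uint8_t left[BLOCK], right[BLOCK];
    uint64_t sum = 0;

    /* Lane masks: 0xFF keeps a pixel, 0x00 drops it. */
    for (size_t i = 0; i < BLOCK; i++) {
        left[i]  = (first * BLOCK + i >= x0) ? 0xFF : 0x00;
        right[i] = (last  * BLOCK + i <  x1) ? 0xFF : 0x00;
    }

    for (size_t y = y0; y < y1; y++) {
        const uint8_t *row = img + y * stride;
        for (size_t b = first; b <= last; b++) {   /* the single X-loop */
            for (size_t i = 0; i < BLOCK; i++) {   /* one "vector" step */
                uint8_t m = 0xFF;
                if (b == first) m &= left[i];      /* mask the left edge  */
                if (b == last)  m &= right[i];     /* mask the right edge */
                sum += (uint8_t)(row[b * BLOCK + i] & m);
            }
        }
    }
    return (unsigned)(sum / ((x1 - x0) * (y1 - y0)));
}
```

Because the edges are handled by masks, there are no separate head and tail loops for unaligned pixels. In a Neon version the inner loop over i would become a vector AND followed by a widening add.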
Archive: tilegx_bilinear_par_2014.zip (browse) - C
Article: Bilinear Picture Scaling on Tilera TILE-Gx
First version of the parallelization library with example code: A crap bilinear scaler.
Archive: multiquake_2014.zip (browse) - C
Article: MultiQuake - Quake for Tilera TILE-Gx CPUs
Quake for the TILE-Gx multicore CPU. The 36-core version has no problems running one Quake per core.
Archive: multidoom_2014.zip (browse) - C
Article: MultiDoom - Doom for Tilera TILE-Gx CPUs
MultiDoom for a TILE-Gx host like the SX80. The 36-core version has no problems running 1 Doom per
core.
Archive: tilegx_raytracer_2014.zip (browse) - C
Article: Raytracing on Tilera TILE-Gx
Early parallel floating point raytracer for the TILE-Gx. It wasn't very quick.
File: rgb2yuv_tilegx_2013.cpp - C
Article: RGB to YUV conversion on Tilera TILE-Gx
A fast RGB to YUV conversion routine for the TILE-Gx.
File: fix_2013.awk - AWK
Article: Real Programming
AWK script to convert GCC TILE-Gx assembler output to readable code.
File: yuv2rgb_sse2_2012.cpp - C
Article: YUV to RGB Conversion using SSE2
A fast YUV to RGB conversion routine for SSE2. Uses more multipliers for fewer stalls. Or that's
the idea.
File: yuv2rgb_tilegx_2012.c - C
Article: YUV to RGB Conversion on Tilera TILE-Gx
A fast YUV to RGB conversion routine for the TILE-Gx.
File: box_sse2_2012.cpp - C
Article: A look at Halide's SSE2 3x3 Box Filter
Halide's 3x3 box filter example code is broken, and their test systems too old. Here's my take on
it.
File: aes_tilegx_2011.c - C
Article: AES Optimization on Tilera TILE-Gx
Superseded by later TILE-Gx AES releases. Doesn't have the nifty solution to the last round problem.
File: srtp_aes_2010.c - C
Article: SRTP AES Optimization
Superseded by later SRTP AES releases, but it was pretty neat in 2010.
Unfortunately, I haven't been able to recover much from this period. Pictures of some fun stuff we did for TANDBERG can be found here: TANDBERG Secrets.
Archive: homepilotsrc_1999.zip (browse) - C, x86 assembler
Article: PCTVNet HomePilot Set-Top Box
I made various drivers and programs for the HomePilot set-top box. This is a collection of assorted
stuff I had checked out on my last day there. (Code archaeologists may find my MIDI player
interesting since it uses a completely different approach.)
Archive: Logla_02_1996.zip (browse) - C++
A compiler that generates 68000 code for a simple programming language.
Written in C++ version 2 in 1996.
C++ version 2 was the last version that was useful. They really crapped it up after that. I compiled
the compiler (ha) and tested it in 2020. It still works.
Archive: Speed_TG95.zip (browse) - C, C++ and 68020 assembler
Article: Real Programming
64KB Amiga intro released at The Gathering 1995. Features a raytracer written by a friend of mine.
Archives: NoTemptations_part1.lha, NoTemptations_part1.zip (browse) - 68000 Assembler
Article: Triumph Amiga Demos and Source Code
I didn't write any of this! The code was done by Warp and Smeagol.
All articles and source code files published on this website should have their licensing requirements clearly stated in the document. If licensing requirements are missing or incomplete, contact me and I'll fix it asap.
I do my best to follow licensing requirements on items used on this website, be it source code, images, or quotes from articles. If you believe that the licensing requirements for a specific item are not met and (this is important) you are the original artist or author, please contact me to have this fixed asap.
This page uses a modified panel from the webcomic
Abstruse Goose, strip 483:
bad boy
Credits: "The Abstruse Goose comic is a subsidiary of the powerful and evil Abstruse Goose Corporation."
License:
Attribution-NonCommercial 3.0 United States (CC BY-NC 3.0 US)