Source Code

2024

Neon Average of Area Calculation Using a Single X-Loop v2

File: average_neon_2024.c - C
Article: News 17 April 2024
An update of the 2014 version with a cleaned-up mask calculation and support for 1920x1080.

2023

The Xmas Demo 2023

Archive: xmasdemo_src_2023.zip (browse) - C, GLSL
Archive: xmasdemo_data_2023.zip (browse)
Article: The Xmas Demo 2023
Should compile on both Windows and Linux. Runs in 1080p60 on the NVidia Jetson AGX Orin Developer Kit.

Edgehog: 1080p60 Nano Fractals

Archive: nano_fractals_2023.zip (browse) - C, Assembler, GLSL
Article: Edgehog: 1080p60 Nano Fractals
1080p60 depth 256 fractals on a dainty Jetson Nano. The Edgehog method runs in parallel on all 4 ARM cores and is implemented in Neon intrinsics. Compiles and runs on other ARM targets, obviously.

A Fast 4-Tap Image Scaler for ARM Neon

Archive: neon_scaler_2023.zip (browse) - C, Assembler
Article: A Fast Image Scaler for ARM Neon
A fast Lanczos-2 image scaler for ARM Neon written in C and intrinsics. The ARMv7 version from 2013 is updated.

2022

The Xmas Demo 2022 Shaders

Archive: xmasdemo_shaders_2022.zip (browse) - GLSL
Article: The Xmas Demo 2022
All the modified GLSL shaders used in the Xmas Demo 2022. Runs in 60 fps on the NVidia Jetson AGX Orin.

2021

The Xmas Demo 2021

Archive: xmasdemo_v2_2021.zip (browse) - C, GLSL
Article: The Xmas Demo 2021
Contains full source code that should compile on both Windows and Linux. It runs in 60 fps on the NVidia Jetson AGX Xavier if you use the tricks mentioned in the article.

2020

The Xmas Demo 2020

Archive: xmasdemo_v5_2020.zip (browse) - C, GLSL
Article: The Xmas Demo 2020
A respin of the 2017 Xmas Demo. Full source code that should compile on both Windows and Linux. We replaced the fonts, the graphics, the music, and fixed a busload of bugs. Shaders no longer NVidia only. Tested on a couple of AMD cards.

Recycler: Generator for The Missing Scene

Archive: recycler_2020.zip (browse) - C
Article: Dreamscape & Eclipse: The Final Cut
Generator for the video Dreamscape: The Missing Scene [youtube.com]. The supplied video file has to be converted to raw YUV444. FFmpeg can do that.

Invention (TG94) Bugfix

Archive: Invention_TG94_2020.zip (browse) - 68020 Assembler
Article: Real Programming
40KB Amiga intro released at The Gathering 1994. I recovered the source in 2020 and fixed an old timing bug, so it should run smoothly now. On an Amiga 1200 or 4000, of course. Uses the copper to provide a 92x92 12-bit truecolor screen.

2019

Julia Quaternions for Multi-Threaded CPUs

Archive: quat_2019.zip (browse) - C++
Article: Real Programming
Generator for the video Quaternion HQ Tech Demo [youtube.com]. It's the old Julia quaternion code ported to straight C, a couple of bugs fixed, and quality turned up to 11. Windows multi-threaded version. Probably trivial to port to Linux. Not real-time at all. It's a generator.

Cliffy: Generator for Superhero and Wobblerne 2

Archive: cliffy_2019.zip (browse) - C
Article: Real Programming
Generator for the videos Superhero [youtube.com] and Wobblerne 2 [youtube.com]. YUV dump from the VHS tape not included. It's quite large, around 30 GB. Contact me if you want a copy.

Remix 2: Generator for The Final Cut

Archive: remix2_2019.zip (browse) - C
Article: Dreamscape & Eclipse: The Final Cut
Generator for the video Dreamscape & Eclipse: The Final Cut [youtube.com]. The supplied video files have to be converted to raw YUV444. FFmpeg can do that.

2018

AVX2 Ultra-fast Bayer to I420 Conversion

File: bayer_2018.c - C
Article: Real Programming
It cuts a lot of corners, literally. Uses AVX2 intrinsics including the ultra-cool maddubs. Intel occasionally gets their naming right.
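
For readers who have not met maddubs (the `_mm256_maddubs_epi16` intrinsic), here is a scalar model of what it computes per 16-bit output lane: unsigned bytes from the first operand are multiplied by signed bytes from the second, and each adjacent pair of products is added with signed saturation. The function name `maddubs_scalar` and the `pairs` parameter are made up for this sketch; how bayer_2018.c actually uses the instruction is not shown here.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Scalar model of maddubs (hypothetical helper, not from the archive).
 * a: unsigned bytes, b: signed bytes, both 2 * pairs long.
 * out[i] = saturate(a[2i] * b[2i] + a[2i+1] * b[2i+1]). */
void maddubs_scalar(const uint8_t *a, const int8_t *b, int16_t *out, int pairs)
{
    for (int i = 0; i < pairs; i++) {
        int32_t s = (int32_t)a[2 * i]     * b[2 * i]
                  + (int32_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = sat16(s);
    }
}
```

With small signed weights in `b`, one such step gives a weighted sum of two neighbouring pixels, which is presumably why it is handy for this kind of conversion.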

Separable Filters Done Right

Archive: imagetrans_2018.zip (browse) - C, Assembler
Article: Exploiting the Cache: Faster Separable Filters
Introduces a unique and extremely fast method for applying separable filters. A Lanczos-2 picture scaler is used as an example. SSE2 intrinsics and ARM Neon Assembler solutions are provided. The main work was done in 2013, but the code wasn't recovered until 2018/2019.

2017

The Xmas Demo 2017

Archive: xmasdemo_2017.zip (browse) - C, GLSL
Article: The Xmas Demo 2017
Xmas Demo for the NVidia Jetson TX2 with just 256 GPU cores. In cooperation with Tor Ringstad. The original where we implemented a series of insane (and sometimes inane) optimizations to make fancy ShaderToy effects run in 60 fps.

TILE-Gx Integer Raytracer

Archive: tilegx_int_raytracer_2017.zip (browse) - C
Article: Integer Raytracing on Tilera TILE-Gx
Floating-point raytracing on the TILE-Gx wasn't a big hit. Parts of the fp support can be combined with integer calculations and classic optimizations from Quake to trace 40 spheres in 1080p60.

GPURay

Archive: gpuray_2017.zip (browse) - C, GLSL
Article: GPURay - GPU+CPU Raytracer for NVidia Tegra X1
128 spheres in 1080p60 on the NVidia Tegra X1. Among other nifty optimizations, it exploits the fact that the CPU can predict how things are gonna execute on the GPU for a cool 60 percent speedup.

Neon Fast SIMD Atan Calculation

File: satan2_2017.cpp - C
Article: Real Programming
A fast way to calculate 2x4 atan2() values using ARM Neon. Also, the name is legendary.

2016

A Wicked Fast (SRTP) AES CTR Mode Implementation

Archive: srtp_aes_revisited_2016.zip (browse) - C
Article: SRTP AES Optimization Revisited
Simplifies the AES CTR mode constant buffer calculations by eliminating the common terms. It runs much faster, 30 percent to be precise.

Some GPU Hacks and a New Way to Brute-Force Fractal Calculations

Archive: gpu_hacks_2016.zip (browse) - GLSL
Article: GPU Hacks: Fractals, a Raytracer and Raytraced Quaternion Julia Sets
Early GPU work, superseded by newer works. The fractal routine is still valid: Don't bother with early cutoff and always optimize for max depth. That gives 1080p60 with depth 256 on the TX2 even when all pixels are at max depth. Really.
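
The "no early cutoff" idea can be sketched in plain C, assuming a Mandelbrot-style iteration (the archive itself is GLSL, and this is not that code). The loop always runs the full depth, and the depth counter is updated without branching. The names `mandel_depth`, `MAX_DEPTH` and `alive` are made up for this sketch.

```c
#define MAX_DEPTH 256

/* Iteration depth for the point c = (cr, ci), with a fixed trip count.
 * 'alive' is a sticky flag: once |z|^2 exceeds 4 it drops to 0 and the
 * counter stops, but the loop itself never exits early. */
int mandel_depth(float cr, float ci)
{
    float zr = 0.0f, zi = 0.0f;
    int alive = 1;
    int depth = 0;

    for (int i = 0; i < MAX_DEPTH; i++) {
        float zr2 = zr * zr;
        float zi2 = zi * zi;

        alive &= (zr2 + zi2 <= 4.0f);   /* stays 0 once the point escapes */
        depth += alive;

        /* z = z^2 + c. Escaped points may run off to inf/NaN, which is
         * harmless here because 'alive' is already 0. */
        float t = zr2 - zi2 + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
    }
    return depth;
}
```

Every pixel costs the same, so the worst case (all pixels at max depth) is also the only case, and a fixed-length loop like this is easy to unroll or vectorize.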

A Crap Multi-Threaded Raytracer

Archive: raytracer_2016.zip (browse) - C
Article: News 27 June 2016
Early multi-threaded raytracer that splits the picture into lumps of lines. Wasn't a very bright idea. Splitting into squares is much better.
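
A minimal sketch of the square-splitting alternative, assuming fixed-size tiles numbered row by row. `Rect`, `tile_count` and `tile_rect` are hypothetical names and are not taken from the archive.

```c
/* Half-open pixel rectangle: [x0, x1) x [y0, y1). */
typedef struct { int x0, y0, x1, y1; } Rect;

/* Number of tiles needed to cover a width x height picture. */
int tile_count(int width, int height, int tile)
{
    int tx = (width  + tile - 1) / tile;
    int ty = (height + tile - 1) / tile;
    return tx * ty;
}

/* Rectangle covered by tile number 'index', clipped to the picture. */
Rect tile_rect(int width, int height, int tile, int index)
{
    int tx = (width + tile - 1) / tile;   /* tiles per row */
    Rect r;

    r.x0 = (index % tx) * tile;
    r.y0 = (index / tx) * tile;
    r.x1 = (r.x0 + tile < width)  ? r.x0 + tile : width;
    r.y1 = (r.y0 + tile < height) ? r.y0 + tile : height;
    return r;
}
```

Worker threads could then pull tile indices from a shared counter until `tile_count()` is reached, so a thread that lands on a cheap tile simply grabs the next one.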

2015

TILE-Gx OpenSSL AES Replacement and a New Last Round

Archive: tilegx_aes_openssl_2015.zip (browse) - C
Article: OpenSSL aes_core.c Replacement for Tilera TILE-Gx
I love the TILE-Gx CPU! This thing can replace the OpenSSL AES encrypt function by using the mega-awesome tblidx instructions. Also, I redid the final pass with a 64-bit merge. This could probably be done on x64-based CPUs too, or any other 64-bit CPU with such an instruction.

2014

TILE-Gx Sobel Filter Alternative Solution

File: scharr_tilegx_2014.c - C
Article: Real Programming
A fast Scharr (Sobel-type) filter for TILE-Gx. Exploits the fact that the vertical filters can be expressed as sums of horizontal filters.
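
One way to read that sentence, sketched in portable C rather than TILE-Gx code: with the usual 3/10/3 Scharr weights, both gradients can be built from two per-row horizontal filters (a difference and a smoothing), so the row results can be shared between neighbouring output rows. The function names are made up for this sketch, and scharr_tilegx_2014.c may well do it differently.

```c
#include <stdint.h>

/* Horizontal filters on one row, centred on *r. */
static int hdiff(const uint8_t *r)   { return r[1] - r[-1]; }
static int hsmooth(const uint8_t *r) { return 3 * r[-1] + 10 * r[0] + 3 * r[1]; }

/* Reference: direct 3x3 Scharr at (x, y). The caller keeps a 1-pixel border. */
void scharr_direct(const uint8_t *p, int stride, int x, int y, int *gx, int *gy)
{
    const uint8_t *r0 = p + (y - 1) * stride + x;
    const uint8_t *r1 = r0 + stride;
    const uint8_t *r2 = r1 + stride;

    *gx = 3 * (r0[1] - r0[-1]) + 10 * (r1[1] - r1[-1]) + 3 * (r2[1] - r2[-1]);
    *gy = 3 * (r2[-1] - r0[-1]) + 10 * (r2[0] - r0[0]) + 3 * (r2[1] - r0[1]);
}

/* Same result, written as sums of the horizontal row filters. */
void scharr_rows(const uint8_t *p, int stride, int x, int y, int *gx, int *gy)
{
    const uint8_t *r0 = p + (y - 1) * stride + x;
    const uint8_t *r1 = r0 + stride;
    const uint8_t *r2 = r1 + stride;

    *gx = 3 * hdiff(r0) + 10 * hdiff(r1) + 3 * hdiff(r2);
    *gy = hsmooth(r2) - hsmooth(r0);
}
```

In a full implementation the `hdiff` and `hsmooth` values of each row would be computed once and reused for the three output rows that touch it.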

Neon Average of Area Calculation Using a Single X-Loop

File: average_neon_2014.c - C
Article: Real Programming
Calculate the average of an area quickly using ARM Neon intrinsics. It does this by masking out the left and right parts, so there's just a single X-loop.
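
The masking idea can be sketched in scalar C, with an 8-pixel chunk standing in for one Neon vector. This is not the code from the file: `CHUNK`, `chunk_mask` and `area_average` are made-up names, and the sketch assumes the row stride is a multiple of `CHUNK` so that whole-chunk reads stay inside the buffer.

```c
#include <stdint.h>
#include <stddef.h>

#define CHUNK 8   /* lanes per step; stands in for one 8-byte vector */

/* Lane mask for the chunk starting at column cx:
 * 0xFF for columns inside [x0, x1), 0x00 outside. */
static void chunk_mask(uint8_t mask[CHUNK], int cx, int x0, int x1)
{
    for (int i = 0; i < CHUNK; i++) {
        int x = cx + i;
        mask[i] = (x >= x0 && x < x1) ? 0xFF : 0x00;
    }
}

/* Rounded average of the pixels in [x0, x1) x [y0, y1).
 * Returns 0 for an empty area. */
uint32_t area_average(const uint8_t *img, size_t stride,
                      int x0, int y0, int x1, int y1)
{
    if (x1 <= x0 || y1 <= y0)
        return 0;

    int first = x0 & ~(CHUNK - 1);                 /* round down to a chunk */
    int last  = (x1 + CHUNK - 1) & ~(CHUNK - 1);   /* round up to a chunk   */
    uint64_t sum = 0;

    for (int y = y0; y < y1; y++) {
        const uint8_t *row = img + (size_t)y * stride;

        /* The single X-loop: edge chunks and full chunks take the same
         * path, and the mask zeroes the lanes outside the area.
         * (The masks depend only on cx, so a real version would hoist them.) */
        for (int cx = first; cx < last; cx += CHUNK) {
            uint8_t mask[CHUNK];
            chunk_mask(mask, cx, x0, x1);
            for (int i = 0; i < CHUNK; i++)
                sum += row[cx + i] & mask[i];
        }
    }

    uint64_t count = (uint64_t)(x1 - x0) * (uint64_t)(y1 - y0);
    return (uint32_t)((sum + count / 2) / count);
}
```

The payoff is that there is no separate code for a ragged left edge, a middle section and a ragged right edge. In a SIMD version the inner lane loop becomes one AND and one accumulate per vector.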

TILE-Gx Bilinear Scaler and Parallelization Library

Archive: tilegx_bilinear_par_2014.zip (browse) - C
Article: Bilinear Picture Scaling on Tilera TILE-Gx
First version of the parallelization library with example code: A crap bilinear scaler.

TILE-Gx MultiQuake

Archive: multiquake_2014.zip (browse) - C
Article: MultiQuake - Quake for Tilera TILE-Gx CPUs
Quake for the TILE-Gx multicore CPU. The 36-core version has no problems running 1 Quake per core.

TILE-Gx MultiDoom

Archive: multidoom_2014.zip (browse) - C
Article: MultiDoom - Doom for Tilera TILE-Gx CPUs
MultiDoom for a TILE-Gx host like the SX80. The 36-core version has no problems running 1 Doom per core.

TILE-Gx Floating-Point Raytracer

Archive: tilegx_raytracer_2014.zip (browse) - C
Article: Raytracing on Tilera TILE-Gx
Early parallel floating point raytracer for the TILE-Gx. It wasn't very quick.

2013

TILE-Gx RGB to YUV

File: rgb2yuv_tilegx_2013.cpp - C
Article: RGB to YUV conversion on Tilera TILE-Gx
A fast RGB to YUV conversion routine for the TILE-Gx.

AWK FTW

File: fix_2013.awk - AWK
Article: Real Programming
AWK script to convert GCC TILE-Gx assembler output to readable code.

2012

SSE2 YUV to RGB Alternative Solution

File: yuv2rgb_sse2_2012.cpp - C
Article: YUV to RGB Conversion using SSE2
A fast YUV to RGB conversion routine for SSE2. Uses more multipliers for fewer stalls. Or that's the idea.

TILE-Gx YUV to RGB

File: yuv2rgb_tilegx_2012.c - C
Article: YUV to RGB Conversion on Tilera TILE-Gx
A fast YUV to RGB conversion routine for the TILE-Gx.

SSE2 Halide's Box Filter Code is Crap

File: box_sse2_2012.cpp - C
Article: A look at Halide's SSE2 3x3 Box Filter
Halide's 3x3 box filter example code is broken, and their test systems too old. Here's my take on it.

2011

TILE-Gx Fast AES Routine

File: aes_tilegx_2011.c - C
Article: AES Optimization on Tilera TILE-Gx
Superseded by later TILE-Gx AES releases. Doesn't have the nifty solution to the last round problem.

2010

The First Fast (SRTP) AES CTR Mode Implementation

File: srtp_aes_2010.c - C
Article: SRTP AES Optimization
Superseded by later SRTP AES releases, but it was pretty neat in 2010.

2000-2009

Unfortunately, I haven't been able to recover much from this period. Pictures of some fun stuff we did for TANDBERG can be found here: TANDBERG Secrets.

The 90s

PCTVNet HomePilot Code and Schematics

Archive: homepilotsrc_1999.zip (browse) - C, x86 assembler
Article: PCTVNet HomePilot Set-Top Box
I made various drivers and programs for the HomePilot set-top box. This is a collection of assorted stuff I had checked out on my last day there. (Code archaeologists may find my MIDI player interesting since it uses a completely different approach.)

A Crap Compiler

Archive: Logla_02_1996.zip (browse) - C++
A compiler that generates 68000 code for a simple programming language. Written in C++ version 2 in 1996. C++ version 2 was the last version that was useful. They really crapped it up after that. I compiled the compiler (ha) and tested it in 2020. It still works.

Speed (TG95)

Archive: Speed_TG95.zip (browse) - C, C++ and 68020 assembler
Article: Real Programming
64KB Amiga intro released at The Gathering 1995. Features a raytracer written by a friend of mine.

No Temptations Part 1

Archives: NoTemptations_part1.lha, NoTemptations_part1.zip (browse) - 68000 Assembler
Article: Triumph Amiga Demos and Source Code
I didn't write any of this! The code was done by Warp and Smeagol.

General Licensing Information

All articles and source code files published on this website should have their licensing requirements clearly stated in the document. If licensing requirements are missing or incomplete, contact me and I'll fix it asap.

I do my best to follow licensing requirements on items used on this website, be it source code, images, or quotes from articles. If you believe that the licensing requirements for a specific item are not met and (this is important) you are the original artist or author, please contact me to have this fixed asap.

Licensed Items

This page uses a modified panel from the webcomic Abstruse Goose, strip 483: bad boy
Credits: "The Abstruse Goose comic is a subsidiary of the powerful and evil Abstruse Goose Corporation."
License: Attribution-NonCommercial 3.0 United States (CC BY-NC 3.0 US)

Ignorantus AS