Al Green
Al Green's Brother

Source Code

2022

The Xmas Demo 2022 Shaders

Article: The Xmas Demo 2022 - C, GLSL
Archive: xmasdemo_shaders_2022.zip
Files: xmasdemo_shaders_2022/
All the modified GLSL shaders used in the Xmas Demo 2022. They run at 60 fps on the NVidia Jetson AGX Orin.

2021

The Xmas Demo 2021

Article: The Xmas Demo 2021 - C, GLSL
Archive: xmasdemo_v2_2021.zip
Contains full source code that should compile on both Windows and Linux. It runs at 60 fps on the NVidia Jetson AGX Xavier if you use the tricks mentioned in the article.

2020

The Xmas Demo 2020

Article: The Xmas Demo 2020 Edition - C, GLSL
Archive: xmasdemo_v5_2020.zip
A respin of the 2017 Xmas Demo. Full source code that should compile on both Windows and Linux. We replaced the fonts, the graphics, and the music, and fixed a busload of bugs. The shaders are no longer NVidia-only. Tested on a couple of AMD cards.

Recycler

Article: Dreamscape & Eclipse: The Final Cut - C
Archive: recycler_2020.zip
Generator for the video Dreamscape: The Missing Scene [youtube.com]. The supplied video file has to be converted to raw YUV444. FFmpeg can do that.

Invention (TG94) Bugfix

Article: Real Programming - 68020 Assembler
Archive: Invention_TG94_2020.zip
Files: Invention_TG94_2020/
40KB Amiga intro released at The Gathering 1994. I recovered the source in 2020 and fixed an old timing bug, so it should run smoothly now. On an Amiga 1200 or 4000, of course.

2019

Separable Filters Done Right

Article: A Different Method for Image Transformations - C, SSE2 intrinsics, ARM Neon assembler
Archive: imagetrans_2019.zip
Files: imagetrans_2019/
Demonstrates a unique and extremely fast method for applying separable filters. A Lanczos-2 picture scaler is used as example. SSE2 intrinsics and ARM Neon Assembler solutions provided. The main work was done in 2013, but the code wasn't recovered until 2018/2019.

Julia Quaternions for Multi-Threaded CPUs

Article: Real Programming - C++
Archive: quat_2019.zip
Generator for the video Quaternion HQ Tech Demo [youtube.com]. It's the old Julia quaternion code ported to straight C, with a couple of bugs fixed and the quality turned up to 11. Windows multi-threaded version. Probably trivial to port to Linux. Not real-time at all. It's a generator.

Cliffy

Article: Real Programming - C
Archive: cliffy_2019.zip
Generator for the videos Superhero [youtube.com] and Wobblerne 2 [youtube.com]. YUV dump from the VHS tape not included. It's quite large, around 30 GB. Contact me if you want a copy.

Remix 2

Article: Dreamscape & Eclipse: The Final Cut - C
Archive: remix2_2019.zip
Generator for the video Dreamscape & Eclipse: The Final Cut [youtube.com]. The supplied video files have to be converted to raw YUV444. FFmpeg can do that.

2018

AVX2 Ultra-fast Bayer to I420 Conversion

Article: Real Programming - C, AVX2 intrinsics
File: bayer_2018.c
It cuts a lot of corners, literally. Uses the ultra-cool maddubs intrinsic.
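For readers who haven't met maddubs, here is a scalar model of what the AVX2 intrinsic `_mm256_maddubs_epi16` computes per 16-bit lane: unsigned bytes from the first operand are multiplied by signed bytes from the second, and adjacent products are added with signed saturation. The function names `sat16` and `maddubs_model` are made up for this sketch. It is not taken from bayer_2018.c and does not show how that file uses the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the signed 16-bit range, as the instruction does. */
static int16_t sat16(int v)
{
    return (int16_t)(v > 32767 ? 32767 : v < -32768 ? -32768 : v);
}

/* Hypothetical scalar model of maddubs over n output lanes:
   a holds 2*n unsigned bytes, b holds 2*n signed bytes.
   out[i] = saturate(a[2i]*b[2i] + a[2i+1]*b[2i+1]) */
static void maddubs_model(const uint8_t *a, const int8_t *b, int16_t *out, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        out[i] = sat16(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]);
}
```

With small weights in `b`, each lane becomes a weighted sum of two neighbouring pixels, which is presumably why the instruction is handy for pixel-format conversion.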

2017

The Xmas Demo 2017

Article: Nvidia Jetson TX2 Xmas Demo 2017 - C, ARM Neon intrinsics, GLSL
Archive: xmasdemo_2017.zip
Files: xmasdemo_2017/
Xmas Demo for the NVidia Jetson TX2 with just 256 GPU cores. In cooperation with Tor Ringstad. This is the original, where we implemented a series of insane (and sometimes inane) optimizations to make fancy ShaderToy effects run at 60 fps.

TILE-Gx Integer Raytracer

Article: Integer Raytracing on Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_int_raytracer_2017.zip
Files: tilegx_int_raytracer_2017/
Floating-point raytracing on the TILE-Gx wasn't a big hit. Parts of the fp support can be combined with integer calculations and classic optimizations from Quake to trace 40 spheres in 1080p60.

GPURay

Article: GPURay - GPU+CPU Raytracer for NVidia Tegra X1 - C, GLSL
Archive: gpuray_2017.zip
Files: gpuray_2017/
128 spheres in 1080p60 on the NVidia Tegra X1. Among other nifty optimizations, it exploits the fact that the CPU can predict how things are gonna execute on the GPU for a cool 60 percent speedup.

NEON Fast SIMD Atan Calculation

Article: Real Programming - C, ARM Neon intrinsics
File: satan2_2017.cpp
A fast way to calculate 2x4 atan2() values using ARM Neon. Also, the name is legendary.

2016

Snappy SRTP AES CTR Mode Implementation

Article: SRTP AES Optimization Revisited - C
Archive: srtp_aes_revisited_2016.zip
Simplifies the AES CTR mode constant buffer calculations by eliminating the common terms. It runs much faster, 30 percent to be precise.

Some GPU Hacks and a New Way to Brute-Force Fractal Calculations

Article: GPU Hacks: Fractals, a Raytracer and Raytraced Quaternion Julia Sets - GLSL
Archive: gpu_hacks_2016.zip
Files: gpu_hacks_2016/
Early GPU work, superseded by newer releases. The fractal routine is still valid: don't bother with early cutoff, and always optimize for max depth. That gives 1080p60 with depth 256 on the TX1, even when all pixels are at max depth. Really.
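The no-early-cutoff idea can be sketched in plain C for a Mandelbrot-style iteration. This is an illustration only, not the GLSL from the archive, and the function name `mandel_fixed_depth` is made up. The loop always runs the full depth and counts the iterations where the point is still inside the radius-2 circle, so there is no data-dependent exit.

```c
#include <assert.h>

/* Hypothetical helper: iterate z = z*z + c exactly `depth` times.
   Returns how many iterations started with |z| < 2.
   Escaped points overflow to inf/NaN; comparisons with NaN are false,
   so the count simply stops growing. This assumes IEEE float
   behaviour (no -ffast-math). */
static int mandel_fixed_depth(float cr, float ci, int depth)
{
    float zr = 0.0f, zi = 0.0f;
    int count = 0;
    assert(depth > 0);
    for (int i = 0; i < depth; i++) {
        float r2 = zr * zr, i2 = zi * zi;
        count += (r2 + i2 < 4.0f);   /* 0 or 1, no branch out of the loop */
        zi = 2.0f * zr * zi + ci;
        zr = r2 - i2 + cr;
    }
    return count;
}
```

Every pixel costs the same amount of work, so the frame time is predictable and the inner loop stays free of exits.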

A Crap Multi-Threaded Raytracer

Article: News: 27 June 2016 - C
Archive: raytracer_2016.zip
Early multi-threaded raytracer that splits the picture into lumps of lines. Wasn't a very bright idea. Splitting into squares is much better.
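A minimal sketch of the square-splitting alternative, assuming a fixed tile size and a row-major tile index. The names `tile_rect`, `tile_count` and `tile_get` are hypothetical and not from raytracer_2016.zip. Worker threads could, for example, grab tile indices from a shared counter and render one square at a time.

```c
#include <assert.h>

/* Half-open rectangle: [x0,x1) x [y0,y1). */
typedef struct { int x0, y0, x1, y1; } tile_rect;

/* Number of tiles needed to cover the picture (edge tiles may be smaller). */
static int tile_count(int width, int height, int tile)
{
    assert(width > 0 && height > 0 && tile > 0);
    int tx = (width + tile - 1) / tile;
    int ty = (height + tile - 1) / tile;
    return tx * ty;
}

/* Rectangle covered by tile number `index`, counted row by row. */
static tile_rect tile_get(int width, int height, int tile, int index)
{
    int tx = (width + tile - 1) / tile;
    tile_rect r;
    assert(index >= 0 && index < tile_count(width, height, tile));
    r.x0 = (index % tx) * tile;
    r.y0 = (index / tx) * tile;
    r.x1 = r.x0 + tile < width ? r.x0 + tile : width;    /* clip at right edge */
    r.y1 = r.y0 + tile < height ? r.y0 + tile : height;  /* clip at bottom edge */
    return r;
}
```

For a 1920x1080 picture and 64-pixel tiles this gives 30 x 17 = 510 work items, with the bottom row of tiles clipped to 56 lines.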

2015

TILE-Gx OpenSSL AES Replacement and a New Last Round

Article: OpenSSL aes_core.c Replacement for Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_aes_openssl_2015.zip
I love the TILE-Gx CPU! This thing can replace the OpenSSL AES encrypt function by using the mega-awesome tblidx instructions. Also, I redid the final pass with a 64-bit merge. This could probably be done on x64-based CPUs too, or any other 64-bit CPU with such an instruction.

2014

TILE-Gx Sobel Filter Alternative Solution

Article: Real Programming - C, TILE-Gx intrinsics
File: scharr_tilegx_2014.c
A fast Scharr (Sobel-type) filter for TILE-Gx. Exploits the fact that the vertical filters can be expressed as sums of horizontal filters.
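One way to read that, sketched in scalar C: the 3x3 Scharr kernels with weights 3, 10, 3 are separable, so each row only needs two horizontal results (a difference and a 3-10-3 smooth), and the 2-D responses are sums of those per-row results. This is my interpretation, not code from scharr_tilegx_2014.c, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Horizontal 1-D pieces evaluated on one row at column x. */
static int row_diff(const uint8_t *row, int x)
{
    return row[x + 1] - row[x - 1];
}

static int row_smooth(const uint8_t *row, int x)
{
    return 3 * row[x - 1] + 10 * row[x] + 3 * row[x + 1];
}

/* Scharr responses at an interior pixel (x, y) of a w*h 8-bit image.
   gx = [-3 0 3; -10 0 10; -3 0 3], gy = [-3 -10 -3; 0 0 0; 3 10 3]. */
static void scharr_at(const uint8_t *img, int w, int h, int x, int y,
                      int *gx, int *gy)
{
    assert(x >= 1 && x < w - 1 && y >= 1 && y < h - 1);
    const uint8_t *up = img + (y - 1) * w;
    const uint8_t *mid = img + y * w;
    const uint8_t *dn = img + (y + 1) * w;
    *gx = 3 * row_diff(up, x) + 10 * row_diff(mid, x) + 3 * row_diff(dn, x);
    *gy = row_smooth(dn, x) - row_smooth(up, x);
}
```

The sketch recomputes the row results for every pixel to stay short. In a full-image pass they would be computed once per row and reused for the three output rows that need them.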

NEON Average of Area Calculation Using a Single X-Loop

Article: Real Programming - C, ARM Neon intrinsics
File: average_neon_2014.c
Calculate the average of an area quickly using ARM Neon intrinsics. It does this by masking out the left and right parts, so there's just a single X-loop.
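The masking idea can be sketched without Neon. Below, a block of 8 pixels stands in for one vector, and every block goes through the same loop. Pixels outside the requested area are zeroed by a mask instead of being handled by separate head and tail loops. The name `area_average` and the assumption that the row stride is a multiple of the block width are mine, not taken from average_neon_2014.c.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK 8  /* stand-in for a SIMD vector width */

/* Average of the rectangle [x0,x1) x [y0,y1) in an 8-bit image.
   Assumes stride is a multiple of BLOCK, so whole blocks can be read safely. */
static unsigned area_average(const uint8_t *img, int stride,
                             int x0, int y0, int x1, int y1)
{
    assert(stride % BLOCK == 0 && x0 >= 0 && x0 < x1 && x1 <= stride);
    assert(y0 >= 0 && y0 < y1);
    int first = x0 / BLOCK * BLOCK;               /* aligned block start */
    int last = (x1 + BLOCK - 1) / BLOCK * BLOCK;  /* aligned block end */
    uint64_t sum = 0;
    for (int y = y0; y < y1; y++) {
        const uint8_t *row = img + y * stride;
        for (int bx = first; bx < last; bx += BLOCK) {   /* the single X-loop */
            for (int i = 0; i < BLOCK; i++) {            /* lanes of one "vector" */
                int x = bx + i;
                uint8_t mask = (x >= x0 && x < x1) ? 0xFF : 0x00;
                sum += row[x] & mask;
            }
        }
    }
    return (unsigned)(sum / ((uint64_t)(x1 - x0) * (uint64_t)(y1 - y0)));
}
```

In a SIMD version the masks for the first and last block would be built once before the loops, and the lane loop would collapse into a single masked add.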

TILE-Gx Bilinear Scaler and Parallelization Library

Article: Bilinear Picture Scaling on Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_bilinear_par_2014.zip
First version of the parallelization library with example code: A crap bilinear scaler.

TILE-Gx MultiQuake

Article: MultiQuake - Quake for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multiquake_2014.zip
Quake for the TILE-Gx multicore CPU. The 36-core version has no problems running 1 Quake per core.

TILE-Gx MultiDoom

Article: MultiDoom - Doom for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multidoom_2014.zip
MultiDoom for a TILE-Gx host like the SX80. The 36-core version has no problems running 1 Doom per core.

TILE-Gx Floating-Point Raytracer

Article: Raytracing on Tilera TILE-Gx - C, TILE-Gx intrinsics
Archive: tilegx_raytracer_2014.zip
Early parallel floating point raytracer for the TILE-Gx. It wasn't very quick.

2013

TILE-Gx RGB to YUV

Article: RGB to YUV conversion on Tilera TILE-Gx - C, TILE-Gx intrinsics
File: rgb2yuv_tilegx_2013.cpp
A fast RGB to YUV conversion routine for the TILE-Gx.

AWK FTW

Article: Real Programming - AWK
File: fix_2013.awk
AWK script to convert GCC TILE-Gx assembler output to readable code.

TILE-Gx MultiDoom for Expansion Card

Article: MultiDoom - Doom for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multidoomgx_2013.zip
MultiDoom for the TILE-Gx PCI Express card.

2012

SSE2 YUV to RGB Alternative Solution

Article: YUV to RGB Conversion using SSE2 - C, SSE2 intrinsics
File: yuv2rgb_sse2_2012.cpp
A fast YUV to RGB conversion routine for SSE2. Uses more multipliers for fewer stalls. Or that's the idea.
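For reference, this is the kind of per-pixel arithmetic such a routine vectorizes, as a scalar fixed-point sketch. It assumes full-range BT.601 coefficients scaled by 256. The description above does not say which matrix or precision yuv2rgb_sse2_2012.cpp uses, and the function names here are made up.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Assumed full-range BT.601:
   R = Y + 1.402 (V-128)
   G = Y - 0.344 (U-128) - 0.714 (V-128)
   B = Y + 1.772 (U-128)
   Coefficients are scaled by 256 and rounded. The right shift of a
   negative value relies on arithmetic shift, which common compilers
   provide. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(r && g && b);
    int d = u - 128, e = v - 128;
    *r = clamp_u8(y + ((359 * e + 128) >> 8));
    *g = clamp_u8(y - ((88 * d + 183 * e + 128) >> 8));
    *b = clamp_u8(y + ((454 * d + 128) >> 8));
}
```

Each output channel needs its own multiplies, so the multiplications are independent of each other and can be spread across several multiplier units.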

TILE-Gx YUV to RGB

Article: YUV to RGB Conversion on Tilera TILE-Gx - C, TILE-Gx intrinsics
File: yuv2rgb_tilegx_2012.c
A fast YUV to RGB conversion routine for the TILE-Gx.

SSE2 Halide's Box Filter Code is Crap

Article: A look at Halide's SSE2 3x3 Box Filter - C, SSE2 intrinsics
File: box_sse2_2012.cpp
Halide's 3x3 box filter example code is broken, and their test systems too old. Here's my take on it.

2011

TILE-Gx Fast AES Routine

Article: AES Optimization on Tilera TILE-Gx - C, TILE-Gx intrinsics
File: aes_tilegx_2011.c
Superseded by later TILE-Gx AES releases. Doesn't have the nifty solution to the last round problem.

2010

The First Fast SRTP AES CTR Mode Implementation

Article: SRTP AES Optimization - C
File: srtp_aes_2010.c
Superseded by later SRTP AES releases, but it was pretty neat in 2010.

2009

TILE64 MultiDoom for Expansion Card

Article: MultiDoom - Doom for Tilera TILE-Gx CPUs - C, TILE-Gx intrinsics
Archive: multidoom_tile64_2009.zip
Early MultiDoom for the TILE64 PCI Express card.

2000-2008

Unfortunately, I haven't been able to recover much from this period. Pictures of some fun stuff we did for TANDBERG can be found here: TANDBERG Secrets.

The 90s

PCTVNet HomePilot Code and Schematics

Article: Source Code and Schematics for PCTVNet HomePilot Set-Top Box - C, x86 assembler
Archive: homepilotsrc_1999.zip
Assorted stuff I had checked out on my last day there. I suggest reading the article for more details.

A Crap Compiler

C++
Archive: Logla_02_1996.zip
Files: Logla_02_1996/
A compiler that generates 68000 code for a simple programming language. Written in C++ version 2 in 1996. C++ version 2 was the last version that was useful. They really crapped it up after that. I compiled the compiler (ha) and tested it in 2020. It still works.

Speed (TG95)

C, C++ and 68020 assembler
Archive: Speed_TG95.zip
64KB Amiga intro released at The Gathering 1995. Features a raytracer written by a friend of mine.

No Temptations Part 1

68000 Assembler
Archive: NoTemptations_part1.lha
I didn't write any of this! The code was done by Warp and Smeagol.

Source Code License

All nonderivative source code files published on this website that are not labeled with a specific license are covered by the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. Here's a short, and by no means complete, summary:

The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Derivative works retain their original license.


Ignorantus AS