Al Green
Al Green's Brother


under the hood


The Xmas Demo 2023 (20 Dec 2023)

The Xmas Demo 2023 for the NVidia Jetson AGX Orin Developer Kit. How would the TX2-based 2017 and 2020 Xmas Demos look on 8 times more powerful hardware? We retweaked all the old shaders, replaced some crap ones, and tied it all together in a 4-minute high-quality video. Full source code included.

Edgehog: 1080p60 Nano Fractals (26 June 2023)

Presenting Edgehog: A method for doing fractal rectangle checking quickly on multiple ARM Neon cores. The output is rendered by a GPU using the Depth First Algorithm (DFA) described in the article "GPU Hacks" (Corneliusen 2016, 2017). This makes it possible to render 1080p60 depth 256 fractals on a dainty NVidia Jetson Nano. Full source code included.

A Fast Image Scaler for ARM Neon (15 May 2023, updated 25 Oct 2023)

Introducing a fast Lanczos-2 image scaler for ARM Neon written in C and intrinsics. It uses a unique method for applying separable filters described in the article "Exploiting the Cache: Faster Separable Filters" (Corneliusen 2018). An ARMv7 Neon Assembler implementation from 2014 gets a makeover. Full source code included.
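
For readers unfamiliar with the terms, the textbook Lanczos-2 kernel and the separability property can be written as below. This is the standard definition, not a formula quoted from the article, and the symbols x, y, and K are chosen here for illustration only.

```latex
\operatorname{sinc}(x) =
\begin{cases}
  1 & x = 0 \\
  \dfrac{\sin(\pi x)}{\pi x} & x \neq 0
\end{cases}
\qquad
L_2(x) =
\begin{cases}
  \operatorname{sinc}(x)\,\operatorname{sinc}(x/2) & |x| < 2 \\
  0 & \text{otherwise}
\end{cases}

% Separable: the 2D kernel factors into two 1D passes (rows, then columns).
K(x, y) = L_2(x)\,L_2(y)
```

Because the 2D kernel factors this way, a scaler can filter horizontally and vertically in two cheaper 1D passes instead of one 2D pass.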

Real Programming Condensed (9 Feb 2023, updated 18 Mar 2023)

It's been two years since the launch of the discordant book Real Programming. This 10-page article sums up all the main themes and looks at what has happened since then.


The Xmas Demo 2022 (22 Dec 2022)

The Xmas Demo 2022 for the NVidia Jetson AGX Orin Developer Kit. Shader code available for download. A more polished version of last year's demo that exploits the extra power in the Orin.


The Xmas Demo 2021 (23 Dec 2021, updated 7 Mar 2022)

The Xmas Demo 2021 for the NVidia Jetson AGX Xavier. Full source code available. It demonstrates Sjur's new 3D engine called V73D and has animated objects using inverse kinematics. And a bunch of recycled and optimized shaders and stuff, and some music too.

Real Programming (English version, 5 Apr 2021)

Ekte Programmering (Norwegian version, 9 Feb 2021)

A book about programming, programmers, programs, and pop culture. Co-authored with Sjur Julin. Tired of crap books promising to teach you the latest programming fad in 21 days? This one is the polar opposite! It contains a lot of real code written in real programming languages like C and assembler. A recurring theme is criticism of modern development methods, software management, languages, and compilers.


The Xmas Demo 2020 (23 Dec 2020, updated 3 Mar 2021)

The Xmas Demo 2020 is an updated version of the one from 2017. The source code from 2017 was revised to run on any graphics card and CPU. It still runs on the target NVidia Tegra X2 at 60 fps. Full source code available. Music and graphics by Sjur.


Dreamscape and Eclipse: The Final Cut (29 Jul 2019, updated 4 Jan 2020)

A 2019 remastering of the Triumph Amiga demos Dreamscape and Eclipse, released at The Gathering 1996 and 1997. We recovered the original video and music files from my Amiga, including some unused hi-res images. The music was remixed, the font was replaced, and the video files were resampled, scaled to 1080p, denoised, and retimed to the new music. It's low res, it's gritty, it's 1996 all over again, with a 2019 flair.


Exploiting the Cache: Faster Separable Filters (9 Mar 2018, updated 26 Jul 2019)

Introduces a unique and extremely fast method for applying separable filters. A Lanczos-2 picture scaler is used as an example. SSE2 intrinsics and ARM Neon Assembler solutions are provided.


The Xmas Demo 2017 (18 Dec 2017)

Several dubious GLSL optimization techniques were used to make this demo run at 60 fps on the TX2 256-core GPU. If you're looking for precise math, look elsewhere. Close enough will have to do on that GPU! Everything should fly by at 60 fps. Full source code available.

Integer Raytracing on Tilera TILE-Gx (5 Jun 2017)

A second attempt at raytracing on the Tilera TILE-Gx. The TILE-Gx has very limited floating-point support in hardware, so let's try using fixed point math instead. Unfortunately, calculating square roots is very time consuming. An alternative approach is explored where custom conversion routines and integer math do this quickly enough to render 40 spheres in 1080p60. Videos and source code included.

GPURay - GPU+CPU Raytracer for NVidia Tegra X1 (3 Feb 2017)

The GPU-only raytracer described in my 2016 article "GPU Hacks" can trace 80 spheres at 60 fps on the NVidia Tegra X1. This new article describes how to use CPU preprocessing to boost the sphere count to 128 with minimal changes to the GPU part. Videos and source code are included.


SRTP AES Optimization Revisited (27 Jul 2016)

In a 2010 article titled "SRTP AES Optimization" I presented a method to make SRTP AES run significantly quicker. Unfortunately, there were some caveats: Packet length had to be 4096 bytes or less and a multiple of 16, and the target CPU was expected to be big endian. Let's try to address these issues in a new and improved version that will run on any 32-bit CPU.

GPU Hacks: Fractals, a Raytracer and Raytraced Quaternion Julia Sets (10 Jun 2016)

The NVidia Tegra X1 has a Maxwell-based GPU with a theoretical FP32 peak of 512 GFLOPS. It can easily be programmed using OpenGL GLSL shaders. However, making fast GPU code is different from making fast CPU code. Three non-typical GPU jobs are implemented in fragment shaders and optimized for better performance. Videos and source code are included.


OpenSSL aes_core.c Replacement for Tilera TILE-Gx (24 Sep 2015)

Drop-in replacement for aes_core.c that's significantly faster. Includes a second look at how to do the last round in less than half the instructions.


Bilinear Picture Scaling on Tilera TILE-Gx (7 Nov 2014)

A look at how to do bilinear picture scaling on the Tilera TILE-Gx. Two different approaches are tried out. Measurements are done on different core counts and data sizes. Uses a new parallelization library, presented in the article, to split the work across multiple cores.

Raytracing on Tilera TILE-Gx (21 Apr 2014)

Raytracing is a job well suited for multicore CPUs. Challenge of the day: Make a raytracer for the Tilera TILE-Gx36 that's quick enough to output 1920x1080p60 video. Source code, pictures, videos, and performance measurements included.

MultiQuake - Quake for TILE-Gx CPUs (7 May 2014)

Port of Quake for the TILE-Gx mega-multicore CPU. Number of Quakes possible to run in parallel is only limited by your screen size. Custom TILE-Gx specific scalers, including 2x and 3x EPX.
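
For context, EPX is a pixel-art upscaler that expands each source pixel into a 2x2 block and copies a neighbour's colour into a corner when the two adjacent neighbours agree. The sketch below is plain portable C using the common Scale2x formulation of the rule. It is not the TILE-Gx specific code from the port, and the names `epx2x` and `px` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp-to-edge fetch so border pixels reuse their nearest neighbour. */
static uint32_t px(const uint32_t *src, int w, int h, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return src[(size_t)y * (size_t)w + (size_t)x];
}

/* 2x EPX (Scale2x rule). dst must have room for (2*w) * (2*h) pixels. */
void epx2x(const uint32_t *src, uint32_t *dst, int w, int h)
{
    assert(src && dst && w > 0 && h > 0);

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            uint32_t p = px(src, w, h, x, y);
            uint32_t a = px(src, w, h, x, y - 1);  /* above */
            uint32_t b = px(src, w, h, x + 1, y);  /* right */
            uint32_t c = px(src, w, h, x - 1, y);  /* left  */
            uint32_t d = px(src, w, h, x, y + 1);  /* below */
            uint32_t o1 = p, o2 = p, o3 = p, o4 = p;

            /* Only touch corners when opposite neighbours differ. */
            if (a != d && c != b) {
                if (c == a) o1 = a;  /* top-left     */
                if (a == b) o2 = b;  /* top-right    */
                if (d == c) o3 = c;  /* bottom-left  */
                if (b == d) o4 = d;  /* bottom-right */
            }

            uint32_t *row0 = dst + (size_t)(2 * y) * (size_t)(2 * w) + (size_t)(2 * x);
            uint32_t *row1 = row0 + (size_t)(2 * w);
            row0[0] = o1; row0[1] = o2;
            row1[0] = o3; row1[1] = o4;
        }
    }
}
```

Calling `epx2x(src, dst, 320, 200)` on a 320x200 frame would fill a 640x400 `dst` buffer. The 3x variant follows the same idea with a 3x3 output block per pixel.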


RGB to YUV Conversion on Tilera TILE-Gx (12 Apr 2013)

An RGB to YUV conversion routine for Tilera TILE-Gx that uses the new dual dot product instructions for maximum efficiency.
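
To show why dot product instructions fit this job, here is a minimal scalar sketch in portable C. Each output channel is a three-element dot product of (R, G, B) with a fixed coefficient row. The coefficients are the widely used 8-bit BT.601 studio-range integer approximation, which is an assumption on my part and may differ from what the article uses. The names `yuv_t`, `dot3`, and `rgb_to_yuv601` are invented for the example, and no TILE-Gx intrinsics are used.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t y, u, v; } yuv_t;  /* hypothetical helper type */

/* One output channel is a 3-element dot product with fixed weights. */
static int dot3(const int k[3], int r, int g, int b)
{
    return k[0] * r + k[1] * g + k[2] * b;
}

/* 8-bit RGB to BT.601 studio-range YUV (Y in 16..235, U/V in 16..240). */
yuv_t rgb_to_yuv601(uint8_t r, uint8_t g, uint8_t b)
{
    static const int ky[3] = {  66, 129,  25 };
    static const int ku[3] = { -38, -74, 112 };
    static const int kv[3] = { 112, -94, -18 };

    /* +128 rounds. The offsets are folded in before the shift so the
       shifted value is never negative. */
    int y = (dot3(ky, r, g, b) + 128 + (16  << 8)) >> 8;
    int u = (dot3(ku, r, g, b) + 128 + (128 << 8)) >> 8;
    int v = (dot3(kv, r, g, b) + 128 + (128 << 8)) >> 8;

    assert(y >= 16 && y <= 235);

    yuv_t out = { (uint8_t)y, (uint8_t)u, (uint8_t)v };
    return out;
}
```

With these coefficients, pure red (255, 0, 0) maps to Y=82, U=90, V=240. A SIMD version would compute several of these dot products per instruction instead of one multiply at a time.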


YUV to RGB Conversion on Tilera TILE-Gx (7 Nov 2012)

Optimizing for the Tilera TILE-Gx CPU is very different from Intel SSE2. An attempt to get optimal performance using 8-bit multipliers as much as possible.

YUV to RGB Conversion Using SSE2 (23 Oct 2012)

A common error in this class of conversion routines on SSE2 is too conservative use of multipliers, leading to complicated data shuffling before and/or after the multiplications. SSE2 multipliers are inherently cheap to use, so let's try to maximize their usage instead.

A Look at Halide's SSE2 3x3 Box Filter (24 Aug 2012)

A look at the SSE2 3x3 box filter used as example code in the Halide language specification. I get significantly better results using normal C code and SSE2 intrinsics. The code is also comprehensible.


AES Optimization on Tilera TILE-Gx (23 Dec 2011)

A TILE-Gx core can issue 3 instructions in parallel, given a set of strict restrictions. This paper explores how to exploit that in an AES encryption routine using TILE-Gx intrinsics.


SRTP SHA1 Optimization (13 Dec 2010)

Calculating SHA1 hashes on SRTP packets can be quite costly on low end CPUs. Since lengths etc. are static, let's try to strip out the code that actually does SHA1 calculation in OpenSSL and make it as fast as possible. Tests are performed on a Freescale MPC8270 CPU.

SRTP AES Optimization (2 Apr 2010)

A "feature" in the SRTP specification makes it possible to reduce the CPU cost of AES encryption and decryption by 30%.

TANDBERG Secrets (2010)

A look at the hidden menu in TANDBERG MXP video conferencing units, some obscure prototypes, and some more well-known prototypes.


MultiDoom - Doom for TILE-Gx CPUs (30 Dec 2009, updated 7 May 2014)

Port of Doom for the TILE-Gx mega-multicore CPU. Number of Dooms possible to run in parallel is only limited by your screen size. Custom TILE-Gx specific scalers, including 2x and 3x EPX.


PCTVNet HomePilot Set-Top Box (1999)

Code that might be interesting for code archaeologists and full schematics for HomePilot 2.0. Some random pictures from back then added in 2023. Was the 2.0 version ever released? Signs point to no.

General Licensing Information

All articles and source code files published on this website should have their licensing requirements clearly stated in the document. If licensing requirements are missing or incomplete, contact me and I'll fix it asap.

I do my best to follow licensing requirements on items used on this website, be it source code, images, or quotes from articles. If you believe that the licensing requirements for a specific item are not met and (this is important) you are the original artist or author, please contact me to have this fixed asap.

Licensed Items

This page uses a panel from the webcomic Abstruse Goose, strip 98 (secret archives): under the hood
Credits: "The Abstruse Goose comic is a subsidiary of the powerful and evil Abstruse Goose Corporation."
License: Attribution-NonCommercial 3.0 United States (CC BY-NC 3.0 US)

Ignorantus AS