The Xmas Demo 2021 for the NVidia Jetson AGX Xavier. Full source code available. It demonstrates Sjur's new 3D engine called V73D and has animated objects using inverse kinematics. And a bunch of recycled and optimized shaders and stuff, and some music too.
A book about programming, programmers, programs, and pop culture. Co-authored with Sjur Julin. Tired of crap books promising to teach you the latest programming fad in 21 days? This one is the polar opposite! It contains a lot of real code written in real programming languages like C and assembler. A recurring theme is criticism of modern development methods, software management, languages, and compilers.
The Xmas Demo 2020 is an updated version of the one from 2017. The source code from 2017 was revised to run on any graphics card and CPU. It still runs on the target NVidia Tegra X2 at 60 fps. Full source code available. Music and graphics by Sjur.
A 2019 remastering of the Triumph Amiga demos Dreamscape and Eclipse, released at The Gathering 1996 and 1997. We recovered the original video and music files from my Amiga, including some unused hi-res images. The music was remixed, the font was replaced, and the video files were resampled, scaled to 1080p, denoised, and retimed to the new music. It's low res, it's gritty, it's 1996 all over again, with a 2019 flair.
Image transformations using separable filters can be implemented as a vertical pass followed by a horizontal pass. The two passes usually differ in their implementation, and the horizontal one is slower. A new method is presented where the passes are nearly identical, which is achieved by ordering and transposing data in a specific manner. This may be faster on architectures where the L1 cache is large enough to hold a temporary dataset of 8 lines. SSE2 and ARM NEON implementations are provided.
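For readers unfamiliar with the two-pass structure, here is a minimal scalar sketch in C of a separable 1-2-1 blur on an 8-bit grayscale image. It is the plain textbook form, not the transposing method from the article, and it has no SSE2 or NEON code. The names `blur_121`, `blur_vertical`, `blur_horizontal`, and `clamp_index`, the kernel, and the edge-replication policy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an index to [0, n-1] so that edge pixels are replicated. */
static size_t clamp_index(ptrdiff_t i, size_t n)
{
    if (i < 0) return 0;
    if ((size_t)i >= n) return n - 1;
    return (size_t)i;
}

/* Vertical pass: each output mixes the pixel above, itself, and the pixel
   below. Adjacent outputs read adjacent inputs from three rows. */
static void blur_vertical(const uint8_t *src, uint16_t *tmp, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        const uint8_t *up   = src + clamp_index((ptrdiff_t)y - 1, h) * w;
        const uint8_t *mid  = src + y * w;
        const uint8_t *down = src + clamp_index((ptrdiff_t)y + 1, h) * w;
        for (size_t x = 0; x < w; x++)
            tmp[y * w + x] = (uint16_t)(up[x] + 2 * mid[x] + down[x]);
    }
}

/* Horizontal pass: each output mixes its left and right neighbours within
   one row, so neighbouring outputs overlap in the inputs they read. */
static void blur_horizontal(const uint16_t *tmp, uint8_t *dst, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        const uint16_t *row = tmp + y * w;
        for (size_t x = 0; x < w; x++) {
            unsigned sum = row[clamp_index((ptrdiff_t)x - 1, w)]
                         + 2u * row[x]
                         + row[clamp_index((ptrdiff_t)x + 1, w)];
            /* Both passes have weight 4, so divide by 16 with rounding. */
            dst[y * w + x] = (uint8_t)((sum + 8u) >> 4);
        }
    }
}

/* Hypothetical entry point: tmp must hold w * h uint16_t values. */
void blur_121(const uint8_t *src, uint8_t *dst, uint16_t *tmp, size_t w, size_t h)
{
    assert(src && dst && tmp && w > 0 && h > 0);
    blur_vertical(src, tmp, w, h);
    blur_horizontal(tmp, dst, w, h);
}
```

Even in this scalar form the two inner loops have different access patterns, which is the asymmetry the article sets out to remove.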
A second attempt at raytracing on the Tilera TILE-Gx. The TILE-Gx has very limited floating-point support in hardware, so let's try using fixed-point math instead. Unfortunately, calculating square roots is very time-consuming. An alternative approach is explored where custom conversion routines and integer math do this quickly enough to render 40 spheres in 1080p60. Videos and source code included.
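To show why square roots are costly in fixed point, here is a minimal sketch in C of the conventional bit-by-bit integer square root applied to a 16.16 fixed-point value. This is the slow baseline, not the conversion-based routine from the article. The 16.16 format, the type `fx16`, and the names `isqrt64` and `fx16_sqrt` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fx16;          /* hypothetical 16.16 fixed-point type */
#define FX16_ONE ((fx16)1 << 16)

/* Bit-by-bit integer square root: returns floor(sqrt(n)).
   It resolves one result bit per loop iteration. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four in 64 bits */

    while (bit > n)
        bit >>= 2;

    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* sqrt of a 16.16 value: sqrt(x / 2^16) * 2^16 == sqrt(x * 2^16). */
fx16 fx16_sqrt(fx16 x)
{
    assert(x >= 0);
    return (fx16)isqrt64((uint64_t)x << 16);
}
```

For example, `fx16_sqrt(4 * FX16_ONE)` yields `2 * FX16_ONE`. The loop runs up to 24 data-dependent iterations for one root in this format, which adds up quickly when every ray-sphere test needs one.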
In a 2010 article titled "SRTP AES Optimization" I presented a method to make SRTP AES run significantly quicker. Unfortunately, there were some caveats: the packet length had to be 4096 bytes or less and a multiple of 16, and the target CPU was expected to be big endian. Let's try to address these issues in a new and improved version that will run on any 32-bit CPU.
Drop-in replacement for aes_core.c that's significantly faster. Includes a second look at how to do the last round in fewer than half the instructions.
Port of Doom for the TILE-Gx mega-multicore CPU. The number of Dooms that can run in parallel is limited only by your screen size. Custom TILE-Gx specific scalers, including 2x and 3x EPX.
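As an illustration of what a 2x EPX scaler does, here is a minimal portable sketch in C. It uses the commonly published EPX/Scale2x rule on 8-bit palette indices and is not the TILE-Gx specific code from the port. The function name `epx_scale2x`, the 8-bit pixel format, and the treatment of out-of-image neighbours as equal to the centre pixel are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale a w x h image of 8-bit palette indices to 2w x 2h.
   Each source pixel P with neighbours A (above), B (right), C (left) and
   D (below) becomes a 2x2 block. A corner of the block takes a neighbour's
   colour only when the two neighbours touching that corner match each other
   and differ from the opposite neighbours. */
void epx_scale2x(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    assert(src && dst && w > 0 && h > 0);

    for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
            uint8_t p = src[y * w + x];
            /* Neighbours outside the image are treated as equal to P. */
            uint8_t a = (y > 0)     ? src[(y - 1) * w + x] : p;
            uint8_t b = (x + 1 < w) ? src[y * w + x + 1]   : p;
            uint8_t c = (x > 0)     ? src[y * w + x - 1]   : p;
            uint8_t d = (y + 1 < h) ? src[(y + 1) * w + x] : p;

            uint8_t *out = dst + (2 * y) * (2 * w) + 2 * x;
            out[0]         = (c == a && c != d && a != b) ? a : p; /* top left */
            out[1]         = (a == b && a != c && b != d) ? b : p; /* top right */
            out[2 * w]     = (d == c && d != b && c != a) ? c : p; /* bottom left */
            out[2 * w + 1] = (b == d && b != a && d != c) ? d : p; /* bottom right */
        }
    }
}
```

Because the rule only compares pixels for equality and never blends them, it works directly on paletted framebuffers, which suits a game that renders in 8-bit colour.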