The Xmas Demo 2023 has been released! Check the link for more info. Have a Merry Xmas [southpark.fandom.com] everybody!
Doom [en.wikipedia.org] is 30 years old today! I remember playing through Doom II on my Amiga back in 1995. Sort of. I ran the Mac version in an emulator. Good times. The first iteration of my Tilera Doom port called MultiDoom saw the light of day in 2009. It's covered in the book, along with some (missing) correspondence with John Carmack.
El Reg article: Doom turns 30, so its creators celebrate seminal first-person shooter’s contribution to IT careers [theregister.com]
Work on The Xmas Demo 2023 is progressing, as always. It's gonna have fewer glitches than last year. Jinx!
In anticipation of the Xmas Demo 2023, we're releasing another sample chapter from the book Real Programming (Ekte Programmering in Norwegian). It describes the unorthodox optimizations that made the Xmas Demo 2017 run in 1080p60 on a measly TX2. Also, questions have been asked about what happened in Denver. Find the answers here:
English: Chapter 17: The Rip-Off Artist
Norwegian: Kapittel 17: En kopi og litt trigonometri
A new sample chapter from the book Real Programming (Ekte Programmering in Norwegian) has been released.
English: Chapter 2: A Rant About CPUs, Languages, and Development Methods
Norwegian: Kapittel 2: En tirade om prosessorer, språk og utviklingsmetoder
My friend 2pak [goldmetric.com] did a test run of the ARM Neon scaler on his MacBook Air (M2, 2022):
pak@helvetia scaler % ./scaler -m 10
(...)
total: 1261.817993ms, per frame: 1.261818ms, fps: 792.507324
pak@helvetia scaler % ./scaler -m 300
(...)
total: 1256.837036ms, per frame: 1.256837ms, fps: 795.648071
Results will be merged into the article soonish.
Work on The Xmas Demo 2023 is progressing! Not sure what kind of hardware to run it on yet, though.
To quote the El Reg article [theregister.com]: "Take up less CPU time and memory? What amazing tech is this?!" A team of researchers have documented what's obvious: Webapps/crapps use a lot more energy than native apps. Duh. Nice to see the world coming around.
Paper: Native vs Web Apps: Comparing the Energy Consumption and Performance of Android Apps and their Web Counterparts [arxiv.org]
Addendum: While investigating why the standard Ubuntu Linux apps are so unbelievably crap and slow on my 12 core AMD with a 3070 gfx card, I came across this article by Liam Proven on El Reg: antiX 23: Anarchic for sure, but 'design by committee' isn't always the best for Linux [theregister.com]:
The article about the ARM Neon scaler shows performance results for a number of slightly arcane ARM systems. That's semi-interesting, but what about the cool kid in town: Apple's ARM-based M1 and M2 systems? I came across a spare 2020 M1 Mac Mini, so I did a quick test run on that. Ignore the glaring issues about core types and clocks for now. I'll get back to that.
nils@merrimac scaler % gcc -O3 -Wall -o scaler scale_neon_intrinsics.c bmp_planar.c coeffs.c main.c -lm
nils@merrimac scaler % ./scaler -m 10
(...)
total: 1365.284058ms, per frame: 1.365284ms, fps: 732.448303
nils@merrimac scaler % ./scaler -m 300
(...)
total: 1373.613037ms, per frame: 1.373613ms, fps: 728.007080
I like those numbers. Have to test it on an M2 too. We have top men working on it!
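For anyone wondering how the three numbers in the scaler output relate: a minimal sketch, assuming each run scales 1000 frames. That count is not stated anywhere; it is inferred from total time divided by time per frame. The function names are made up for illustration and are not taken from the scaler source.

```c
#include <assert.h>

/* Hypothetical helpers, not from the scaler source.
 * Assumption: total_ms covers `frames` scaled frames. */

/* Average time per frame in milliseconds. */
double ms_per_frame(double total_ms, long frames)
{
    assert(frames > 0);
    return total_ms / (double)frames;
}

/* Frames per second: 1000 ms per second divided by ms per frame. */
double fps_from_total_ms(double total_ms, long frames)
{
    assert(total_ms > 0.0 && frames > 0);
    return 1000.0 * (double)frames / total_ms;
}
```

Feeding in the M1 total of 1365.284058 ms with 1000 frames lands on the reported per-frame time and fps, give or take single-precision rounding in the printout.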
One of my office PCs is connected to a Benchmark DAC3 [benchmarkmedia.com] that now sends the analog output to a TubeCube [tubedepot.com]. It's quirky, it runs hot, it occasionally makes weird noises, but it's cheap and fun! Only thing missing is a headphone jack... and slightly more power output. Can't have it all, I guess.
A new video has been released: Shady Julia Quaternions [youtube.com].
We go to the other extreme this time! The Julia Quaternions shader from The Xmas Demo 2022 has been improved. We implemented a new coloring scheme and added plane shadows. It delivers 1080p60 on an RTX 3070 @ 1.9 GHz with 50% load, so it's not stellar performance yet, but enough to show it off. The XKCD strip called Actual Progress [xkcd.com] explains how it works.
C and GLSL Code
Nils Liaaen Corneliusen
Quantum Knights Theme [bandcamp.com] by legendary C64 and Amiga musician Chris Huelsbeck [patreon.com].
License: Royalty-free license
2D Clouds [shadertoy.com] by drift.
License: CC BY-NC-SA 3.0 [creativecommons.org]
Picture called Big Bug by Morten Johnsen.
3.5 seconds voice sample from the movie Chappie [imdb.com].
A new article is out: Edgehog: 1080p60 Nano Fractals.
Wrapping up loose ends from the book. The Edgehog method makes the dainty Jetson Nano render 1080p60 depth 256 fractals. It's running in parallel on the 4 ARM cores. DFA is recycled for the GPU part. We made a short tech demo video in classic 1987 style. Why? Because we can. It's recorded straight from the HDMI output. Full source code etc. available, as usual. Trivia: Which picture in the fake browser is not from 1987?
We're looking into doing The Xmas Demo 2023 on an NVidia Jetson Orin Nano [arrow.com]. Don't have access to a full Orin anymore, and the newish Orin Nano is more reasonably priced. 1024 GPU cores should be more than enough. Either that or fall back to the Xavier.
Lately, I've been trying to answer a question that seems to bother... well, nobody: Can the NVidia Jetson Nano do 1080p60 fractals with all pixels at max depth when max depth is 256? Using my old DFA routine, the Nano tops out at 23.6 FPS. That isn't too bad, considering the TX1 can do 52 FPS with a higher clock and twice the number of GPU cores. As mentioned in the book, rectangle checking may be the answer. The article Adaptive Parallel Computation with CUDA Dynamic Parallelism (Adinets 2014) [developer.nvidia.com] covers the basic theory and hints at how we're not doing it, since a Nano doesn't have a huge GPU. Also, it would be boring. We're using some untapped resources instead. Hope to wrap this up before summer vacation. Jinx!
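For reference, here is roughly what plain rectangle checking looks like: a minimal CPU sketch in C of the generic technique (often called Mariani-Silver), assuming a Mandelbrot-style escape-time fractal. It is not the method we ended up using, and it is not the CUDA code from the Adinets article. The View struct, the function names and the MIN_SIZE cut-off are hypothetical. The idea: iterate only the border of a rectangle; if every border pixel has the same depth, fill the interior without iterating it; otherwise split in four and try again.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 256   /* max depth from the text */
#define MIN_SIZE  4     /* assumed cut-off: smaller rectangles are brute-forced */

/* Hypothetical view: pixel (x, y) maps to c = (re0 + x*step, im0 + y*step). */
typedef struct {
    double re0, im0, step;
    int width, height;
    int *depth;         /* width*height depths, -1 = not computed yet */
    long evaluated;     /* pixels that were actually iterated */
} View;

/* Escape-time iteration for one pixel, computed at most once. */
static int pixel(View *v, int x, int y)
{
    int *d = &v->depth[(size_t)y * (size_t)v->width + (size_t)x];
    if (*d >= 0) return *d;

    double cr = v->re0 + x * v->step, ci = v->im0 + y * v->step;
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < MAX_DEPTH && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    v->evaluated++;
    return *d = n;
}

/* Render the inclusive rectangle [x0..x1] x [y0..y1]. */
static void render_rect(View *v, int x0, int y0, int x1, int y1)
{
    int ref = pixel(v, x0, y0), uniform = 1;

    /* Always iterate the complete border, even after a mismatch. */
    for (int x = x0; x <= x1; x++) {
        uniform &= (pixel(v, x, y0) == ref);
        uniform &= (pixel(v, x, y1) == ref);
    }
    for (int y = y0; y <= y1; y++) {
        uniform &= (pixel(v, x0, y) == ref);
        uniform &= (pixel(v, x1, y) == ref);
    }

    int tiny = (x1 - x0 < MIN_SIZE) || (y1 - y0 < MIN_SIZE);
    if (uniform || tiny) {
        for (int y = y0 + 1; y < y1; y++)
            for (int x = x0 + 1; x < x1; x++) {
                if (uniform)    /* fill the interior without iterating it */
                    v->depth[(size_t)y * (size_t)v->width + (size_t)x] = ref;
                else            /* too small to split: brute force */
                    pixel(v, x, y);
            }
        return;
    }

    /* Mixed border: split in four. Shared edges are not iterated twice. */
    int xm = (x0 + x1) / 2, ym = (y0 + y1) / 2;
    render_rect(v, x0, y0, xm, ym);
    render_rect(v, xm, y0, x1, ym);
    render_rect(v, x0, ym, xm, y1);
    render_rect(v, xm, ym, x1, y1);
}

/* Render the whole view; the caller supplies the depth buffer. */
void render(View *v)
{
    assert(v->width > 0 && v->height > 0 && v->depth != NULL);
    for (size_t i = 0; i < (size_t)v->width * (size_t)v->height; i++)
        v->depth[i] = -1;
    v->evaluated = 0;
    render_rect(v, 0, 0, v->width - 1, v->height - 1);
}
```

In the worst case from the question above, where every pixel sits at max depth, only the outer border gets iterated and the rest is a fill. Two caveats with this sketch: the border is sampled per pixel, so features thinner than a pixel can slip through, and a rectangle whose uniform border encloses the whole set would be filled incorrectly.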
A new article is out: A Fast Image Scaler for ARM Neon.
It features a completely new Neon implementation of a Lanczos-2 scaler in C and intrinsics. The old ARMv7 Assembler version has been updated. It is based on the theory presented in the 2018 article Exploiting the Cache: Faster Separable Filters.
We're not just banging rocks together here at Igno Labs!
I've been blowing dust off some old and new ARMs. (Well, they're usually from Sjur's bag of holding.)
The Xavier is running, Sjur's Pi 3 Pi 2 hasn't melted yet, the Nano is around here somewhere, and a Rockchip is stuttering along with that stupid piece of shit that doesn't fucking work [youtube.com]: Android.
Long story short: I've been trying to update the old ARM Neon Assembler image scaler for classic ARMs and make an improved version for modern ARMs. Should be ready soon. Right now, it's down to a compiler problem. Really. I did see that one coming.
Casey Muratori has published an interesting article called Performance Excuses Debunked [computerenhance.com]. Here's a quote from the introduction:
Whenever I point out that a common software practice is bad for performance, arguments ensue. That’s good! People should argue about these things. It helps illuminate both sides of the issue. It’s productive, and it leads to a better understanding of how software performance fits into the priorities of our industry.
What's not good is that some segments of the developer community don’t even want to have discussions, let alone arguments, about software performance. Among certain developers, there is a pervasive attitude that software simply doesn't have performance concerns anymore. They believe we are past the point in software development history where anyone should still be thinking about performance.
Though I am a staunch supporter of writing everything in Assembly language, I completely agree with him. Gonna print out a copy and keep it in my briefcase for the next time some dingwad claims code performance is not important anymore.
Sjur was interviewed on the Norwegian radio show Nitimen today. He talks about rollerblading, turning 50, 3D graphics, and programming in C. Among other things.
Listen to the entire show here [radio.nrk.no].
(It's a rather long show and I can't figure out how to link directly to the segment or fast forward. Hmm. We have Top Men [youtube.com] working on it!)
Sjur got some new old toys in the mail! An AOC battery-powered screen and an NVidia Jetson Nano Developer Kit [sparkfun.com]. It's a cut-down TX1 with 128 GPU cores and 4xA57, integrated cooling, 10W, at a price of just 149 dollars. Time to build a killbot [en.wikipedia.org]! We hooked it up for a test run:
I recovered some old images during the weekend.
From 2018: I rendered the last fractal video [youtube.com] that uses the DFA algorithm, smooth coloring and 32/64 bit CPU/GPU switching on the fly on a remote TX2. Sort of. The axeman was making his way to my office, so I had to wing it: Just start more processes rendering frames spaced 6 apart and hope nothing catches fire. The Denver cores managed to stay ahead of their game. Fun times.
From Xmas 2017: The Xmas Demo running live on prototype TX2 hardware:
From Xmas 2015: Quake on prototype TX1 hardware. RayMan on T150MXP on the left:
More old recovered images from the TANDBERG and PCTVNet days. Warning: Super low quality shots from 1999-2006.
The Norwegian version of Real Programming has received wide coverage in the Norwegian digital press lately. It's been hilarious: Lots of people have strong opinions about a book they have never read. I've quoted some of those articles on the (Norwegian) book page, so you can have a laugh too.
What's not hilarious is that DNB's technology director Nicolay Rygh claims that it's ok to copy code from StackOverflow and call it your own: English by Google Translate, original Norwegian version [kode24.no].
Source code licensing is a serious matter. I sincerely hope that this is not the official stance of DNB. DNB happens to be the largest bank in Norway.
I'm taking steps to improve source code visibility. Most of the code salted down in zip archives should be browsable soon: The browse button will show source files that are not automatically generated. I'm also reviewing licenses to make sure they're correct and readily available. That means updating the comments at the start of several source files.
It has been brought to my attention that some readers seem to have misunderstood the
writing style used in Real Programming: We use
and exaggerations to drive our points through in an entertaining way.
It should be obvious after reading the back cover, but obviously it wasn't.
(It should also be noted that "obvious" is one of the most frequently used words in the book. Obviously.)
To remedy that, I'm adding some warning labels. The code can still be read for free without being exposed to our acerbic writing style.
Update 2024: Those warnings are gone. What might have been considered exaggerations in 2021 are now the norm. To quote my favorite TV series: "I will not be pushed, filed, stamped, indexed, briefed, debriefed, or numbered."
Kode24 has an article today about our book and methods. Kode24 is one of very few Norwegian-only programming news sites. Have a look if you can read Norwegian!
Dårlige utviklere tror jobben er ferdig klokka fem [kode24.no]
Codon: It never ends, this shit [youtube.com]
I'm not gonna link to it since I'm not gonna promote it more than necessary. Check out some of the popular tech news sites on the interwebs, or scoot over to Github if you missed it.
Here's what they claim in their code pit:
Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 10-100x or more, on a single thread. Codon's performance is typically on par with (and sometimes better than) that of C/C++.
One of the people behind Codon was also behind Halide. Halide was an earlier "faster-than-C" programming language. I published a critique of their rigged example code and suspicious test equipment back in 2012: A look at Halide's SSE2 3x3 Box Filter. The subject was revisited in the book Real Programming. Get the chapter for free here: Chapter 7: Research is Hard.
Now, 11 years later, they're at it again. Claiming that some new programming language or interpreter or "compiler" is faster than C is like claiming it runs faster than the CPU. So, by reduction, a multiply should take 0 cycles, or maybe -1. I hear they're also working on a rocket that goes faster than the speed of light and a car that runs on willpower.
There are no shortcuts in programming. I wonder what they'll come up with in 2034. No, I don't care. Never mind.
Today marks the two-year anniversary of the launch of Ekte Programmering, the initial Norwegian version of Real Programming.
To mark the occasion, we have decided to release Real Programming Condensed (PDF, 368KB): A 10-page summary of the book. Read it, send it to friends, copy it, and quote from it.
Happy new year! It's been a fun 2022 with the launch of the new Xmas demo, the remastering of the previous Xmas demo, the guest lecture at the University of Oslo, Real Programming (the Norwegian version) sneaking into the National Library of Norway (it's a mandatory optional process which requires submitting three copies, so not that hard, but it sounds cool), the rebuilding of the lab in the new offices, and the arrival of the NVidia AGX Orin Developer Kit. We also had some (professional) fun at Huddly where we did top secret stuff involving their excellent cameras.
Stay tuned for more awesomely great stuff in 2023! (May be considered regular stuff in some regions)
This article is published under the following license: Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0).
Short summary: You may copy and redistribute the material in any medium or format for any purpose, even commercially. You must give appropriate credit, provide a link to the license, and indicate if changes were made. If you remix, transform, or build upon the material, you may not distribute the modified material.