Random stuff I'm working on, in chronological order.

New article discussing how to speed up image transformations using separable filters is available: A Different Method for Image Transformations.

As mentioned earlier, GLSL ES does not allow double precision, so I made a multicore CPU-renderer for the deeper zooms. If the upper left corner of the video says fp32 it's the GPU doing the rendering, fp64 is the CPU. The CPU renderer uses all 6 cores of the Jetson TX2 in parallel. The Denver 2 cores are roughly 18% faster than the A57s on double precision. They still suck on single precision, so no change there.
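A minimal sketch of the row-splitting idea, in Python rather than the actual renderer code (which isn't shown here). Python floats are fp64, so the arithmetic matches the precision in question. The function names, the row-per-job split and the default of 6 workers are assumptions for illustration. Note that CPython threads won't really run this loop in parallel; swapping in `ProcessPoolExecutor` from the same module would.

```python
from concurrent.futures import ThreadPoolExecutor

def mandel_depth(cx, cy, max_depth):
    """Escape-time iteration count for one point, computed in fp64."""
    x = y = 0.0
    for i in range(max_depth):
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
        if x * x + y * y > 4.0:
            return i
    return max_depth

def render_row(job):
    """Render one scanline; each job is independent of all the others."""
    row, width, height, cx, cy, span, max_depth = job
    step = span / width                      # size of one pixel in the complex plane
    y = cy + (row - height / 2) * step
    return [mandel_depth(cx + (col - width / 2) * step, y, max_depth)
            for col in range(width)]

def render(width, height, cx, cy, span, max_depth=256, workers=6):
    """Hand out scanlines to a pool of workers and collect them in order."""
    jobs = [(row, width, height, cx, cy, span, max_depth) for row in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, jobs))
```

Splitting by scanline keeps the jobs small, so a slow row near the set doesn't leave the other workers idle for long.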

Some other small changes to finish off this very time consuming project: General zoom level increased, hopefully found some more interesting coordinates, made the intermission slightly less crap.

Need to get back to some realtime stuff.

Even when using the Continuous coloring method, more interesting results are obtained with higher depths. The trick is to pick a nice palette.

As in the first video, start with a double H=S=1,V palette, i.e. run around the edge of the HSV hexagon twice. For depth 1024, quadruple the number of runs to 8. Now, this will look absolutely terrible in 1080p: the aliasing will be much worse than with depth 256. So let's solve that with the standard easy solution: Render in 8k and downscale to 1k.
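A rough sketch of such a palette. It assumes "running around the edge of the HSV hexagon" means saturation and value stay at 1 while the hue makes `runs` full turns over the palette. The function name and the 8-bit RGB output are made up for the example.

```python
import colorsys

def hue_cycle_palette(depth, runs):
    """Build `depth` RGB entries whose hue goes around the circle `runs` times."""
    palette = []
    for i in range(depth):
        hue = (i * runs / depth) % 1.0       # wraps back to red after each run
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette.append((round(r * 255), round(g * 255), round(b * 255)))
    return palette

pal256 = hue_cycle_palette(256, 2)    # depth 256, two runs
pal1024 = hue_cycle_palette(1024, 8)  # depth 1024, eight runs
```

Both palettes have the same 128 entries per run, so the hue changes at the same rate per iteration step. The deeper one just has more runs to fill.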

There are some small missing parts to add: Find some semi-interesting points, add crap intermission, find awesome music... and wait 3 hours for rendering to be done. I knew there was a catch.

That leaves one annoying problem: The quality when float32 precision is reached at deep zoom levels. GLSL ES does not allow double, so let's make a threaded CPU renderer for those parts. Maybe next week.
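One way to guess where float32 gives up: check whether two neighbouring pixel coordinates still round to different float32 values. This is only a sketch with made-up function names, and it's an optimistic bound, since the iteration itself amplifies rounding errors well before the coordinates collapse completely.

```python
import struct

def to_f32(x):
    """Round a Python float (fp64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def pixels_distinct_in_f32(center, span, width):
    """True if a coordinate and its one-pixel neighbour differ after float32 rounding."""
    step = span / width
    return to_f32(center) != to_f32(center + step)

def needs_fp64(center_x, center_y, span, width):
    """Suggest the fp64 path once either axis can no longer separate pixels."""
    return not (pixels_distinct_in_f32(center_x, span, width)
                and pixels_distinct_in_f32(center_y, span, width))
```

For a 1920 pixel wide view around x = -0.75, a span of 3.0 is fine in float32, while a span of 1e-7 puts the pixel step far below the float32 spacing near 0.75 (about 6e-8).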

Music: "Floating Cities" by Kevin MacLeod is licensed under a Creative Commons Attribution license: Attribution 4.0 International (CC BY 4.0)

Dynamic julia speed turned out to be a frustrating experience, given the time taken to render a test video. Instead, let's look at smooth mandelbrot fractals with reasonably deep zooms that fit in 32 bit floats.

This article discusses "smooth iteration count for generalized mandelbrot sets". While the theory is interesting, I can't get it to add up in the end. The ShaderToy demo looks cool, but getting a normalized 0-1 range value out was mostly hit and miss. I'm a terrible mathematician, so it's probably my fault. D'oh.

Instead, let's use the Continuous coloring method from Wikipedia. The good part is we don't need much depth (256 is ok), and if a double H=S=1,V palette is used, it starts to look pretty nice. I used the movement code from the first article, rewrote the semi-fast julia shader to save last results, unrolled a bit, and it runs realtime on the TX2. But the aliasing is terrible, so the video is rendered in 8k, scaled etc. at a whopping 2.35 fps.
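For reference, a small Python sketch of the continuous coloring idea as described on Wikipedia: iterate with a large bailout radius, then subtract a fractional term based on how far past the bailout the point landed. This is not the shader used for the video. The function name and the bailout of 256 are assumptions.

```python
import math

def smooth_depth(cx, cy, max_depth=256, bailout=256.0):
    """Fractional escape count; returns max_depth for points that never escape."""
    x = y = 0.0
    for i in range(max_depth):
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
        r2 = x * x + y * y
        if r2 > bailout * bailout:
            n = i + 1                                   # iterations performed so far
            log_zn = math.log(r2) / 2.0                 # log |z|
            nu = math.log(log_zn / math.log(2.0)) / math.log(2.0)
            return n + 1 - nu                           # smooth, non-integer count
    return float(max_depth)
```

The result can be scaled into a palette index, interpolating between the two nearest entries to avoid visible bands.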

Second attempt at using dynamic speed when rendering julia fractals. Totally new method: Pack the depth into rgba output, decode on the cpu, count and weigh the "interesting" pixels, use that as the step value for next frame, convert to color, scale, output. So the amount of noise doesn't affect the result as much as in the previous one, and it's somewhat quicker to render a lowres test animation. Still need to render in 8k for the final version to get rid of excess noise, at roughly 0.45 fps on the TX2. D'oh.
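A sketch of the pack/decode/weigh loop. The packing into two 8-bit channels is one obvious way to get a depth above 255 through an rgba target. The weighting rule in `next_step()` is purely hypothetical, since the actual rule isn't described here: pixels that escaped late count as more interesting, and more interest means a smaller step.

```python
def pack_depth(depth):
    """Split an integer depth (0..65535) across the r and g bytes of an rgba pixel."""
    return (depth & 0xFF, (depth >> 8) & 0xFF, 0, 255)

def unpack_depth(rgba):
    """CPU-side decode of a pixel written by pack_depth()."""
    r, g, _, _ = rgba
    return r | (g << 8)

def next_step(pixels, max_depth, base_step, slowdown=8.0):
    """Hypothetical rule: weigh escaped pixels by depth, shrink the step accordingly."""
    interest = 0.0
    for rgba in pixels:
        depth = unpack_depth(rgba)
        if 0 < depth < max_depth:            # skip instant escapes and interior points
            interest += depth / max_depth
    fraction = interest / len(pixels)
    return base_step / (1.0 + slowdown * fraction)
```

A frame full of shallow pixels keeps the step at `base_step`, so the animation hurries through the boring parts.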

Rendering interesting julia animations with high max depth requires precise positioning. Would it be possible to adjust speed on the fly by analyzing the amount of noise in the last rendered picture to skip the boring parts? This is an initial attempt using the DFA algorithm with depth 3072 and a double H=S=1,V palette so each color entry is unique and +/- 1 last entry. Rendering in 1080p directly makes the output a noisefest, so render it in 7680x4320 and downscale to 1k.
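Going from 7680x4320 to 1920x1080 is a factor of 4 in each direction, so every output pixel can be the average of a 4x4 block. A minimal box filter sketch on a single channel stored as a list of rows. The scaler actually used for the video may well be something else.

```python
def box_downscale(img, factor):
    """Average each factor x factor block of `img` (list of rows) into one value."""
    height, width = len(img), len(img[0])
    out = []
    for oy in range(height // factor):
        row = []
        for ox in range(width // factor):
            total = 0.0
            for dy in range(factor):
                for dx in range(factor):
                    total += img[oy * factor + dy][ox * factor + dx]
            row.append(total / (factor * factor))
        out.append(row)
    return out
```

Single-pixel noise in the big render ends up contributing only 1/16 to an output pixel, which is why the brute-force approach works.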

The speed adjustment is too aggressive, but the idea seems to be working. Only took 4 hours to render, so maybe I'll make a better one next week.

Based on code by Paulo Falcao which is available on ShaderToy. It's licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

1280x720 resolution. The original code gives 35 fps. Replacing the atan() calls increases this to 47 fps. Then it's the usual steps of quality reductions and some unrolling to edge it above 60 fps. Gonna do a writeup about atan() one of these days. The NVidia implementation is ok, but the code is just inlined, not reordered.
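The replacement used here isn't shown, so the following is just one common way of doing it, not necessarily what went into the shader: a cheap polynomial for atan(x) on [-1, 1] with a maximum error of roughly 0.004 radians, plus quadrant handling to get an atan2. Written in Python for easy checking; the arithmetic ports directly to GLSL. The function names are made up.

```python
import math

def fast_atan_unit(x):
    """Approximate atan(x) for -1 <= x <= 1 with a second-order polynomial."""
    return x * (math.pi / 4 + 0.273 * (1.0 - abs(x)))

def fast_atan2(y, x):
    """atan2 built on fast_atan_unit by always dividing small by large."""
    if x == 0.0 and y == 0.0:
        return 0.0
    if abs(x) >= abs(y):
        a = fast_atan_unit(y / x)
        if x < 0.0:                          # left half-plane: shift by +/- pi
            a += math.pi if y >= 0.0 else -math.pi
        return a
    # |y| > |x|: use atan(y/x) = +/- pi/2 - atan(x/y)
    a = fast_atan_unit(x / y)
    return (math.pi / 2 if y > 0.0 else -math.pi / 2) - a
```

Whether the error is acceptable depends on what the angle is used for. For texture-like patterns it usually is, for precise geometry probably not.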

Some changes to make all the Xmas demo effects run in 60 fps:

Shader files, the ones changed since the video was released in bold:

- galdance.glsl - based on code by Sinuousity
- glowcity.glsl - based on code by mhnewman
- noise3d.glsl - based on code by revers
- colorful.glsl - based on code by ollj
- **seascape.glsl** - based on code by TDM
- trans.glsl - based on code by Shane
- **twofield.glsl** - based on code by w23
- **torus.glsl** - based on code by bal-khan
- tracer.glsl - based on code by Nils L. Corneliusen
- quat.glsl - based on code by Keenan Crane

It's running in 60 fps except during the transition, so let's save a bit more. The routine is already chopping off the left and right sides of the screen. But more can be removed, just not all the way down. Either mask out the areas using high-level GL or just quickly hack it into the shader:

    void main( void )
    {
        float xedge  = 256.0f/1920.0f;
        float yedge2 = 208.0f/1080.0f;
        float xedge2 = 416.0f/1920.0f;

        if( (vt.x < xedge || vt.x > 1.0f - xedge) ||
            ((vt.x < xedge2 || vt.x > 1.0f - xedge2) && vt.y > yedge2 && vt.y < 1.0f - yedge2 ) ) {
            outColor = vec4( 0.0f, 0.0f, 0.0f, colscale );
            return;
        }
        (...)
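To get a feel for how much work the mask removes, the same test can be run on the CPU over a pixel grid. A quick Python sketch, assuming vt is a 0..1 coordinate sampled at pixel centers. The function names are made up.

```python
def is_masked(vx, vy):
    """Same condition as the early-out in the shader, on normalized coordinates."""
    xedge = 256.0 / 1920.0
    xedge2 = 416.0 / 1920.0
    yedge2 = 208.0 / 1080.0
    outer = vx < xedge or vx > 1.0 - xedge                  # full-height side strips
    inner = ((vx < xedge2 or vx > 1.0 - xedge2)
             and yedge2 < vy < 1.0 - yedge2)                # extra strips, middle rows only
    return outer or inner

def masked_fraction(width, height):
    """Fraction of pixel centers that take the early-out."""
    masked = sum(is_masked((x + 0.5) / width, (y + 0.5) / height)
                 for y in range(height) for x in range(width))
    return masked / (width * height)
```

On a quarter-size 480x270 grid this comes out at about 37% of the pixels never running the expensive part of the shader.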

Also a trivial reduction in lightball() that the compiler doesn't do:

    vec3 lightball( vec3 lpos, vec3 lcolor, vec3 O, vec3 D, float L )
    {
        vec3 ldir = lpos - O;

        if( dot( ldir, ldir ) > L*L ) return vec3( 0.0f );

        float lv = 2.07f - ( ( ( lpos.z / 10.0f + 1.0f ) / 2.0f ) + 1.0f );
        float pw = pow( max( 0.0f, dot( normalize( ldir ), D ) ), 20000.0f * lv );

        return ( normalize( lcolor ) + 1.0f ) * pw;
    }

Move ang, orig, m calculation to the CPU. Untangle heightMapTracing(). Discover that one of the two initial map() calls resolves to a static value for all iterations: map( ori ). But just using a base value of 1.0f is more than precise enough. So main() becomes simpler:

    (...)
    float hx = map( ori.xyz + dir * 1000.0f );
    if( hx > 0.0f ) {
        outColor = vec4( getSkyColor(dir), colscale );
        return;
    }
    vec3 p = heightMapTracing(ori.xyz,dir, hx );
    (...)

And heightMapTracing() is changed to:

    // hx is read before it is written, so it has to be inout:
    // a plain out parameter is undefined on entry.
    vec3 heightMapTracing( vec3 ori, vec3 dir, inout float hx )
    {
        vec3 p;
        float tm = 0.0;
        float tx = 1000.0;
        // float hm = map(ori);
        float hm = 1.0f; // ori fixed per frame, so close enough
        float tmid = 0.0;

        for( int i = 0; i < NUM_STEPS; i++ ) {
            tmid = mix( tm, tx, hm / (hm-hx) );
            p = ori + dir * tmid;
            float hmid = map( p );
            if( hmid < 0.0 ) {
                tx = tmid;
                hx = hmid;
            } else {
                tm = tmid;
                hm = hmid;
            }
        }
        return p;
    }
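The loop is an interpolated bracketing search: keep one distance above the surface and one below, and guess the crossing from the two heights. A 1D Python sketch of the same update makes it easy to see why a rough starting height still works. Here `f` stands in for map() along the ray, and all names are made up for the example.

```python
def trace_height(f, t_near, t_far, h_near, h_far, num_steps=8):
    """Find t where f(t) crosses zero, given f > 0 at t_near and f < 0 at t_far."""
    t_mid = t_near
    for _ in range(num_steps):
        # Same as mix( tm, tx, hm / (hm-hx) ) in the shader
        t_mid = t_near + (t_far - t_near) * (h_near / (h_near - h_far))
        h_mid = f(t_mid)
        if h_mid < 0.0:
            t_far, h_far = t_mid, h_mid      # crossing is closer than t_mid
        else:
            t_near, h_near = t_mid, h_mid    # crossing is further away
    return t_mid

def surface(t):
    """Toy height along the ray: starts 5 units above, crosses zero at t = 500."""
    return 5.0 - 0.01 * t

exact = trace_height(surface, 0.0, 1000.0, surface(0.0), surface(1000.0))
rough = trace_height(surface, 0.0, 1000.0, 1.0, surface(1000.0))  # guessed start height
```

With the guessed start height the first step lands in the wrong place, but the measured height at that point replaces the guess immediately, so the following steps are back on track.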

D'oh of the week: I_MAX is 100. Sounds rather arbitrary, it never gets to 100 iterations anyway. Reduce until framerate is above 60 all the way. A good value of I_MAX seems to be 52. Visual artifacts minimal.