Pipeline

Random stuff I'm working on, in reverse chronological order.

12 June 2018

I started working for TANDBERG, the videoconferencing company, back in 1999. TANDBERG was acquired by Cisco in 2010. The time has now come to move on. I wish most of the people at Cisco Systems Norway good luck in the future. I'm sure they'll be able to make another good product or two.

So it's time for summer break. Was hoping to finish some research that didn't catch on at Cisco. Guess I'll have to order a devkit from NVidia, or just bite the bullet and get a new high-end PC. Getting tired of low performance embedded chips.

From the pile of projects that never made it out of the Cisco lab: There are some floating balls on the opening screen of most Cisco videoconferencing systems, except the TILE-Gx based SX80. So I took the GLSL shader used in the TX1 based systems, converted it to 8-bit math and duct taped it together. It needs exactly 2 - two - of 36 cores, 5.6% total load. It can be optimized more: What you're seeing is a proof of concept. It's doable, at a very low CPU cost.

11 April 2018

Made a full 60 fps version of the NVidia Jetson TX2 Xmas Demo based on the 8 January shaders: The Xmas Demo 2017 Remastered.

Also added a project with all the info about optimizations done in one place: Nvidia Jetson TX2 Xmas Demo 2017.

9 March 2018

New article discussing how to speed up image transformations using separable filters is available: A Different Method for Image Transformations.

1 March 2018

As mentioned earlier, GLSL ES does not allow double precision, so I made a multicore CPU-renderer for the deeper zooms. If the upper left corner of the video says fp32 it's the GPU doing the rendering, fp64 is the CPU. The CPU renderer uses all 6 cores of the Jetson TX2 in parallel. The Denver 2 cores are roughly 18% faster than the A57s on double precision. They still suck on single precision, so no change there.
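A minimal sketch of the work split for a multicore fp64 renderer, written in Python for illustration only (the real renderer is presumably native code). Python floats are IEEE doubles, so the math is fp64. The function names, the round-robin scanline split and the escape radius are assumptions, not taken from the actual renderer.

```python
from concurrent.futures import ThreadPoolExecutor


def mandel_depth(cx, cy, max_depth):
    """Escape-time iteration count for one point, in double precision."""
    x = y = 0.0
    for n in range(max_depth):
        if x * x + y * y > 4.0:
            return n
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_depth


def render_rows(rows, width, height, cx, cy, scale, max_depth):
    """Render the given scanlines; returns {row index: [depth, ...]}."""
    out = {}
    for j in rows:
        py = cy + (j - height / 2.0) * scale
        out[j] = [mandel_depth(cx + (i - width / 2.0) * scale, py, max_depth)
                  for i in range(width)]
    return out


def render_parallel(width, height, cx, cy, scale, max_depth=256, workers=6):
    """Hand scanlines out round-robin to `workers` and merge the results."""
    chunks = [range(w, height, workers) for w in range(workers)]
    image = [None] * height
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(render_rows, rows, width, height,
                               cx, cy, scale, max_depth)
                   for rows in chunks]
        for fut in futures:
            for j, line in fut.result().items():
                image[j] = line
    return image
```

Interleaving the scanlines keeps the load roughly even when one part of the picture is much deeper than the rest. CPython threads share one interpreter lock, so this shows the partitioning rather than the speedup.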

Some other small changes to finish off this very time consuming project: General zoom level increased, hopefully found some more interesting coordinates, made the intermission slightly less crap.

Need to get back to some realtime stuff.

23 February 2018

Even when using the Continuous coloring method, more interesting results are obtained with higher depths. The trick is to pick a nice palette.

As in the first video, start with a double H=S=1,V palette, i.e. run around the edge of the HSV hexagon twice. For depth 1024, quadruple the number of runs to 8. Now, this will look absolutely terrible in 1080p: the aliasing will be much worse than with depth 256. So let's solve that with the standard easy solution: Render in 8k and downscale to 1080p.
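A minimal sketch of such a palette, assuming "running around the edge of the HSV hexagon" means walking the fully saturated, full-value colors red, yellow, green, cyan, blue, magenta and back to red. With 8-bit channels one lap has 6 x 255 = 1530 distinct colors. The function names are made up for illustration.

```python
def hexagon_rim_color(p):
    """Color at position p (0..1529, wraps around) on the HSV hexagon rim."""
    seg, off = divmod(p % 1530, 255)
    return [
        (255, off, 0),        # red -> yellow
        (255 - off, 255, 0),  # yellow -> green
        (0, 255, off),        # green -> cyan
        (0, 255 - off, 255),  # cyan -> blue
        (off, 0, 255),        # blue -> magenta
        (255, 0, 255 - off),  # magenta -> red
    ][seg]


def make_palette(depth, laps):
    """Palette with `depth` entries that runs around the rim `laps` times."""
    return [hexagon_rim_color(i * laps * 1530 // depth) for i in range(depth)]
```

`make_palette(256, 2)` gives the double palette for depth 256 and `make_palette(1024, 8)` the eight-run version for depth 1024. Both use 128 entries per lap.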

There are still some small parts missing: Find some semi-interesting points, add crap intermission, find awesome music... and wait 3 hours for rendering to be done. I knew there was a catch.

That leaves one annoying problem: The quality drops when the limit of float32 precision is reached at deep zoom levels. GLSL ES does not allow double, so let's make a threaded CPU renderer for those parts. Maybe next week.
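The float32 limit can be illustrated with a small sketch: once the distance between two neighbouring pixels is smaller than the spacing of representable fp32 values around the center coordinate, both pixels get the same coordinate and the picture turns blocky. The helper names are made up for illustration, and the center value -0.75 is just an example.

```python
import struct


def to_f32(v):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def pixels_distinct_f32(center, pixel_step):
    """True if two neighbouring pixel coordinates still differ in fp32."""
    return to_f32(to_f32(center) + to_f32(pixel_step)) != to_f32(center)


def pixels_distinct_f64(center, pixel_step):
    """Same check in fp64, the native Python float."""
    return center + pixel_step != center
```

Around -0.75 the fp32 spacing is about 6e-8, so a pixel step of 1e-9 collapses in fp32 but is still fine in fp64.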

Music: "Floating Cities" by Kevin MacLeod is licensed under a Creative Commons Attribution license: Attribution 4.0 International (CC BY 4.0)

16 February 2018

Dynamic julia speed turned out to be a frustrating experience, given the time taken to render a test video. Instead, let's look at smooth mandelbrot fractals with reasonably deep zooms that fit in 32 bit floats.

This article discusses "smooth iteration count for generalized mandelbrot sets". While the theory is interesting, I can't get it to add up in the end. The ShaderToy demo looks cool, but getting a normalized 0-1 range value out was mostly hit and miss. I'm a terrible mathematician, so it's probably my fault. D'oh.

Instead, let's use the Continuous coloring method from Wikipedia. The good part is we don't need much depth (256 is ok), and if a double H=S=1,V palette is used, it starts to look pretty nice. I used the movement code from the first article, rewrote the semi-fast julia shader to save last results, unrolled a bit, and it runs realtime on the TX2. But the aliasing is terrible, so the video is rendered in 8k, scaled etc. at a whopping 2.35 fps.
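For reference, a minimal Python sketch of the continuous (smooth) coloring formula as given in the Wikipedia pseudocode. The large bailout value follows that pseudocode. The function name and default depth are assumptions, and this is not the shader used for the video.

```python
import math


def smooth_iterations(cx, cy, max_depth=256, bailout=1 << 16):
    """Fractional escape count for c = cx + i*cy."""
    x = y = 0.0
    n = 0
    while x * x + y * y <= bailout and n < max_depth:
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
        n += 1
    if n == max_depth:
        return float(max_depth)  # treated as inside the set
    # Subtract the fractional overshoot past the bailout radius.
    log_zn = math.log(x * x + y * y) / 2.0
    nu = math.log(log_zn / math.log(2.0)) / math.log(2.0)
    return n + 1.0 - nu
```

The result changes smoothly from pixel to pixel instead of jumping by whole iterations, which removes the banding. It can be scaled by the palette size and used as a palette index.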

13 February 2018

Second attempt at using dynamic speed when rendering julia fractals. Totally new method: Pack the depth into rgba output, decode on the CPU, count and weigh the "interesting" pixels, use that as the step value for next frame, convert to color, scale, output. So the amount of noise doesn't affect the result as much as in the previous one, and it's somewhat quicker to render a lowres test animation. Still need to render in 8k for the final version to get rid of excess noise, at roughly 0.45 fps on the TX2. D'oh.
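A rough sketch of the CPU side of that pipeline. The entry does not say how the depth is packed or how the pixels are weighed, so the two-byte packing, the definition of "interesting" and all step constants below are assumptions for illustration.

```python
def pack_depth(depth):
    """Split an integer depth (0..65535) into 8-bit RGBA channels."""
    return (depth & 0xFF, (depth >> 8) & 0xFF, 0, 255)


def unpack_depth(rgba):
    """Rebuild the depth from the red (low byte) and green (high byte) channels."""
    r, g, _, _ = rgba
    return r | (g << 8)


def next_step(pixels, max_depth, min_step=0.0005, max_step=0.02):
    """Hypothetical weighting: escaped pixels count, deeper ones count more."""
    if not pixels:
        return max_step
    total = 0.0
    for rgba in pixels:
        d = unpack_depth(rgba)
        if 0 < d < max_depth:
            total += d / max_depth
    interest = total / len(pixels)  # 0.0 for a boring frame, below 1.0 for a busy one
    return max_step - (max_step - min_step) * interest
```

A boring frame gives the largest step, so the animation moves quickly through it. A busy frame gives a smaller step.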

26 January 2018

Rendering interesting julia animations with high max depth requires precise positioning. Would it be possible to adjust speed on the fly by analyzing the amount of noise in the last rendered picture to skip the boring parts? This is an initial attempt using the DFA algorithm with depth 3072 and a double H=S=1,V palette, so each color entry is unique and differs by +/- 1 from the previous entry. Rendering in 1080p directly makes the output a noisefest, so render it in 7680x4320 and downscale to 1080p.
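Going from 7680x4320 to 1920x1080 is a factor of 4 in each direction, so every output pixel can be the average of a 4x4 block. A minimal sketch, assuming a plain box filter on a grayscale image stored as a list of rows. The entry does not say which filter was actually used for the video.

```python
def box_downscale(src, factor):
    """Average factor x factor blocks of a 2D list of pixel values."""
    h, w = len(src) // factor, len(src[0]) // factor
    area = factor * factor
    return [[sum(src[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / area
             for x in range(w)]
            for y in range(h)]
```

Single-pixel noise in the 8k frame is spread over 16 samples per output pixel, which is what tames the noisefest.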

The speed adjustment is too aggressive, but the idea seems to be working. Only took 4 hours to render, so maybe I'll make a better one next week.
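The noise-driven speed idea could be sketched like this. The noise measure (mean difference between horizontal neighbours) and the mapping to a step size are both assumptions. The damping and floor parameters are the kind of knobs that would tame an adjustment that is too aggressive.

```python
def frame_noise(gray):
    """Mean absolute difference between horizontally adjacent pixels."""
    total = count = 0
    for row in gray:
        for a, b in zip(row, row[1:]):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def speed_from_noise(noise, base_step=0.01, damping=0.5, floor=0.001):
    """Hypothetical mapping: calm frames move fast, noisy frames slow down."""
    return max(floor, base_step / (1.0 + damping * noise))
```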

12 January 2018

Based on code by Paulo Falcao which is available on ShaderToy. It's licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

1280x720 resolution. The original code gives 35 fps. Replacing the atan() calls increases this to 47 fps. Then it's the usual steps of quality reductions and some unrolling to edge it above 60 fps. Gonna do a writeup about atan() one of these days. The NVidia implementation is ok, but the code is just inlined, not reordered.
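The entry does not say what replaced the atan() calls. As an illustration of the general idea only, here is a common low-cost polynomial approximation, written in Python so the error is easy to check. It is a stand-in, not the replacement used in the demo.

```python
import math


def fast_atan_unit(x):
    """Approximate atan(x) for -1 <= x <= 1, error around 0.004 rad at worst."""
    return x * (math.pi / 4.0 + 0.273 * (1.0 - abs(x)))


def fast_atan(x):
    """Extend to all x using atan(x) = sign(x) * pi/2 - atan(1/x) for |x| > 1."""
    if abs(x) <= 1.0:
        return fast_atan_unit(x)
    return math.copysign(math.pi / 2.0, x) - fast_atan_unit(1.0 / x)
```

In a shader this boils down to a few multiply-adds and at most one division, and an error of that size is usually invisible when the angle only drives a texture or color lookup.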

8 January 2018

Note: The latest info about the Xmas demo is available in the Nvidia Jetson TX2 Xmas Demo 2017 project.

Some changes to make all the Xmas demo effects run in 60 fps:

Shader files, the ones changed since the video was released in bold:

Changes for 60 fps in Twofield:

It's running in 60 fps except during the transition, so let's save a bit more. The routine is already chopping off the left and right sides of the screen. But more can be removed, just not all the way down. Either mask out the areas using high-level GL or just quickly hack it into the shader:

 void main( void )
 {
    float xedge  = 256.0f/1920.0f;
    float yedge2 = 208.0f/1080.0f;
    float xedge2 = 416.0f/1920.0f;

    if(  (vt.x < xedge  || vt.x > 1.0f - xedge) ||
        ((vt.x < xedge2 || vt.x > 1.0f - xedge2) && vt.y > yedge2 && vt.y < 1.0f - yedge2 ) ) {
         outColor = vec4( 0.0f, 0.0f, 0.0f, colscale );
         return;
    }
    (...)
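To get a feel for how much work the early-out above saves, the same test can be counted on the CPU. This is a sketch that assumes vt is the normalized 0..1 screen coordinate sampled at pixel centers of a 1920x1080 frame. The function name is made up.

```python
def masked_fraction(width=1920, height=1080):
    """Fraction of pixels that take the early-out with the edge values above."""
    xedge = 256.0 / 1920.0
    xedge2 = 416.0 / 1920.0
    yedge2 = 208.0 / 1080.0
    # Rows inside the vertical band where the wider side mask applies.
    band_rows = sum(1 for y in range(height)
                    if yedge2 < (y + 0.5) / height < 1.0 - yedge2)
    masked = 0
    for x in range(width):
        vx = (x + 0.5) / width
        if vx < xedge or vx > 1.0 - xedge:
            masked += height       # outer strips: full height
        elif vx < xedge2 or vx > 1.0 - xedge2:
            masked += band_rows    # inner strips: middle band only
    return masked / (width * height)
```

Under these assumptions roughly 37% of the pixels skip the expensive part of the shader: about 27% from the outer strips and another 10% from the inner ones.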

Also a trivial reduction in lightball() that the compiler doesn't do:

vec3 lightball( vec3 lpos, vec3 lcolor, vec3 O, vec3 D, float L )
{
    vec3 ldir = lpos - O;

    if( dot( ldir, ldir ) > L*L ) return vec3( 0.0f );

    float lv = 2.07f - ( ( ( lpos.z / 10.0f + 1.0f ) / 2.0f ) + 1.0f );

    float pw = pow( max( 0.0f, dot( normalize( ldir ), D ) ), 20000.0f * lv );

    return ( normalize( lcolor ) + 1.0f ) * pw;
}

Changes for 60 fps in Seascape:

Move the ang, ori and m calculations to the CPU. Untangle heightMapTracing(). Discover that one of the two initial map() calls resolves to a static value for all iterations: map( ori ). But just using a base value of 1.0f is more than precise enough. So main() becomes simpler:

(...)
    float hx = map( ori.xyz + dir * 1000.0f );
    if( hx > 0.0f ) {
        outColor = vec4( getSkyColor(dir), colscale );
        return;
    }

    vec3 p = heightMapTracing(ori.xyz,dir, hx );
(...)

And heightMapTracing() is changed to:

vec3 heightMapTracing( vec3 ori, vec3 dir, inout float hx ) // inout: hx is read before it is written
{
    vec3 p;
    float tm = 0.0;
    float tx = 1000.0;
//    float hm = map(ori);
    float hm = 1.0f; // ori fixed per frame, so close enough
    float tmid = 0.0;

    for( int i = 0; i < NUM_STEPS; i++ ) {
        tmid = mix( tm, tx, hm / (hm-hx) );
        p = ori + dir * tmid;
        float hmid = map( p );
        if( hmid < 0.0 ) {
            tx = tmid;
            hx = hmid;
        } else {
            tm = tmid;
            hm = hmid;
        }
    }
    return p;
}
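The loop is easy to try on the CPU. Below is a Python port of the routine with a hypothetical stand-in for map(): a gently wavy surface instead of the real sea function. NUM_STEPS = 8 and the example ray are assumptions as well.

```python
import math

NUM_STEPS = 8  # assumed value


def sea_map(p):
    """Hypothetical stand-in for map(): height of p above a wavy surface."""
    return p[1] - 0.1 * math.sin(p[2])


def height_map_tracing(ori, direction, hx, height_fn=sea_map):
    """Interpolation search for the surface along ori + direction * t."""
    tm, tx = 0.0, 1000.0
    hm = 1.0  # fixed base value instead of height_fn(ori)
    p = ori
    for _ in range(NUM_STEPS):
        tmid = tm + (tx - tm) * (hm / (hm - hx))  # same as GLSL mix(tm, tx, a)
        p = tuple(o + d * tmid for o, d in zip(ori, direction))
        hmid = height_fn(p)
        if hmid < 0.0:
            tx, hx = tmid, hmid  # below the surface: pull the far end in
        else:
            tm, hm = tmid, hmid  # above the surface: push the near end out
    return p
```

Usage follows the changed main(): compute hx at the far end of the ray first, bail out if it is above the surface, otherwise pass it in.

```python
ori, direction = (0.0, 1.0, 0.0), (0.0, -0.6, 0.8)
far = tuple(o + d * 1000.0 for o, d in zip(ori, direction))
hx = sea_map(far)
if hx <= 0.0:
    p = height_map_tracing(ori, direction, hx)
```

In this stand-in the camera sits exactly 1.0 above the surface, so the fixed base value happens to be exact. In general it only needs to be positive and in the right ballpark, since later iterations replace it with measured heights.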

Changes for 60 fps in Torus Thingy:

D'oh of the week: I_MAX is 100. Sounds rather arbitrary, it never gets to 100 iterations anyway. Reduce until framerate is above 60 all the way. A good value of I_MAX seems to be 52. Visual artifacts minimal.


www.ignorantus.com