Updated 29 July 2019

Publications Archive

Raytracing on Tilera TILE-Gx (21 April 2014)
Raytracing is a job well suited for multicore CPUs. Challenge of the day: Make a raytracer for the Tilera TILE-Gx36 that's quick enough to output 1920x1080p60 video. Source code, pictures, videos and performance measurements included.
RGB to YUV Conversion on Tilera TILE-Gx (12 April 2013)
An RGB to YUV conversion routine for Tilera TILE-Gx that uses the new dual dot product instructions for maximum efficiency.
YUV to RGB Conversion on Tilera TILE-Gx (7 November 2012)
Optimizing for the Tilera TILE-Gx CPU is very different from optimizing for Intel SSE2. An attempt to get optimal performance by using 8-bit multipliers as much as possible.
YUV to RGB Conversion Using SSE2 (23 October 2012)
A common error in this class of conversion routines on SSE2 is overly conservative use of multipliers, which leads to complicated data shuffling before and/or after the multiplications. SSE2 multipliers are inherently cheap to use, so let's try to maximize their usage instead.
A Look at Halide's SSE2 3x3 Box Filter (24 August 2012)
A look at the SSE2 3x3 box filter used as example code in the Halide language specification. I get significantly better results using normal C code and SSE2 intrinsics. The code is also comprehensible.
AES Optimization on Tilera TILE-Gx (23 December 2011)
A TILE-Gx core can issue 3 instructions in parallel, subject to a set of strict restrictions. This paper explores how to exploit that in an AES encryption routine using TILE-Gx intrinsics.
SRTP SHA1 Optimization (13 December 2010)
Calculating SHA1 hashes on SRTP packets can be quite costly on low-end CPUs. Since lengths etc. are static, let's try to strip out the code that actually does the SHA1 calculation in OpenSSL and make it as fast as possible. Tests are performed on a Freescale MPC8270 CPU.
SRTP AES Optimization (2 April 2010)
A "feature" in the SRTP specification makes it possible to reduce the CPU cost of AES encryption and decryption by 30%.