Thursday, October 23, 2008

Preparing for my son's baptism



There is a lot of activity in the house as we prepare for my son's baptism on November 9th.
We are expecting family from all over Greece and friends from abroad, so preparations have been underway for some time, but we are now moving into "overdrive".

OSLEC code optimization

While working on my first CUDA app, I had a look at OSLEC, as I thought it would be a good candidate for parallel execution.
OSLEC is an Open Source echo canceller, written by David Rowe and used by Asterisk.
After profiling the code, I found two places that I thought could benefit from some optimization. These were two loops in which OSLEC spends considerable time.
The first looked like this, and it is part of the FIR filter in OSLEC:

    int i;
    int offset1;
    int offset2;

    fir->history[fir->curr_pos] = sample;

    offset2 = fir->curr_pos;
    offset1 = fir->taps - offset2;
    y = 0;
    for (i = fir->taps - 1; i >= offset1; i--)
        y += fir->coeffs[i]*fir->history[i - offset1];
    for ( ; i >= 0; i--)
        y += fir->coeffs[i]*fir->history[i + offset2];



After unrolling, it looks like this:

    int i;
    int offset1;
    int offset2;
    int position;

    fir->history[fir->curr_pos] = sample;

    offset2 = fir->curr_pos;
    offset1 = fir->taps - offset2;
    y = 0;
    // for (i = fir->taps - 1; i >= offset1; i--)
    //     y += fir->coeffs[i]*fir->history[i - offset1];

    // Unrolled by 16: only enter while a full group of 16 taps is left,
    // so that position-15 never drops below 0.
    i = fir->taps - 1;
    while (i - 15 >= offset1)
    {
        position = i - offset1;
        y += fir->coeffs[i]*fir->history[position];
        y += fir->coeffs[i-1]*fir->history[position-1];
        y += fir->coeffs[i-2]*fir->history[position-2];
        y += fir->coeffs[i-3]*fir->history[position-3];
        y += fir->coeffs[i-4]*fir->history[position-4];
        y += fir->coeffs[i-5]*fir->history[position-5];
        y += fir->coeffs[i-6]*fir->history[position-6];
        y += fir->coeffs[i-7]*fir->history[position-7];

        y += fir->coeffs[i-8]*fir->history[position-8];
        y += fir->coeffs[i-9]*fir->history[position-9];
        y += fir->coeffs[i-10]*fir->history[position-10];
        y += fir->coeffs[i-11]*fir->history[position-11];
        y += fir->coeffs[i-12]*fir->history[position-12];
        y += fir->coeffs[i-13]*fir->history[position-13];
        y += fir->coeffs[i-14]*fir->history[position-14];
        y += fir->coeffs[i-15]*fir->history[position-15];

        i = i - 16;
    }
    // Leftover taps of the first segment (fewer than 16).
    for ( ; i >= offset1; i--)
        y += fir->coeffs[i]*fir->history[i - offset1];
    // Second segment, unchanged from the original.
    for ( ; i >= 0; i--)
        y += fir->coeffs[i]*fir->history[i + offset2];



And guess what: after recompiling, there was a 10% speed increase in OSLEC execution.
Then I moved on to the second place where OSLEC spends a lot of time, the coefficient calculation.
Unrolling this loop gave roughly another 10% speed increase.
Roughly a 20% speed increase in total, that's not bad :)
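
Unrolling is easy to get subtly wrong when the number of loop iterations is not a multiple of the unroll factor, so it is worth checking the unrolled loop against the plain one for every buffer position. Below is a minimal sketch of such a check. It is not the OSLEC code: the struct `fir_sketch_t`, the function names and the test data are made up for illustration, and it unrolls by 4 instead of 16 to keep it short.

```c
#include <stdint.h>

#define TAPS 128   /* assumed filter length, same as the 128-tap speedtest */

/* Hypothetical stand-in for the FIR state; only the fields used here. */
typedef struct {
    int taps;
    int curr_pos;
    int16_t coeffs[TAPS];
    int16_t history[TAPS];
} fir_sketch_t;

/* Reference: the plain two-loop form. */
static int32_t fir_ref(const fir_sketch_t *fir)
{
    int offset2 = fir->curr_pos;
    int offset1 = fir->taps - offset2;
    int32_t y = 0;
    int i;

    for (i = fir->taps - 1; i >= offset1; i--)
        y += fir->coeffs[i] * fir->history[i - offset1];
    for ( ; i >= 0; i--)
        y += fir->coeffs[i] * fir->history[i + offset2];
    return y;
}

/* First loop unrolled by 4, with a remainder loop for the leftover taps. */
static int32_t fir_unrolled(const fir_sketch_t *fir)
{
    int offset2 = fir->curr_pos;
    int offset1 = fir->taps - offset2;
    int32_t y = 0;
    int i = fir->taps - 1;
    int position;

    while (i - 3 >= offset1) {          /* a full group of 4 is available */
        position = i - offset1;
        y += fir->coeffs[i]     * fir->history[position];
        y += fir->coeffs[i - 1] * fir->history[position - 1];
        y += fir->coeffs[i - 2] * fir->history[position - 2];
        y += fir->coeffs[i - 3] * fir->history[position - 3];
        i -= 4;
    }
    for ( ; i >= offset1; i--)          /* fewer than 4 taps left */
        y += fir->coeffs[i] * fir->history[i - offset1];
    for ( ; i >= 0; i--)                /* second segment, not unrolled */
        y += fir->coeffs[i] * fir->history[i + offset2];
    return y;
}

/* Compare both versions for every possible curr_pos; returns the number
   of positions where the results differ. */
static int count_mismatches(void)
{
    fir_sketch_t fir;
    int i, pos, bad = 0;

    fir.taps = TAPS;
    for (i = 0; i < TAPS; i++) {        /* arbitrary small test values */
        fir.coeffs[i]  = (int16_t)(i * 37 % 101 - 50);
        fir.history[i] = (int16_t)(i * 53 % 89 - 44);
    }
    for (pos = 0; pos < TAPS; pos++) {
        fir.curr_pos = pos;
        if (fir_ref(&fir) != fir_unrolled(&fir))
            bad++;
    }
    return bad;
}
```

Calling `count_mismatches()` from a small `main` and checking that it returns 0 covers every split between the two segments, including the positions where the first segment is shorter than one unrolled group.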

To verify that the increase was not an x86 "fluke", I added the patches to the OSLEC code in E-IPBX and recompiled the WARP version.
(WARP is a PowerPC-based PBX appliance from PIKA that we are working on.)

I kept the original speedtest program and built two more, adding one patch each time.
Below are the results: first the unmodified speedtest, then with the first patch, and last with both patches.


root@e-ipbx:/$ speedtest

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 527.63 MIPS
-------------------------

Method 1: gettimeofday() at start and end
601 ms for 10s of speech
31.71 MIPS
16.64 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
31.71 MIPS
16.64 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
cycles_worst 140815 cycles_last 3535 cycles_av: 4036
32.29 MIPS
16.34 instances possible at 100% CPU load

root@e-ipbx:/$ ./speedtest2

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 528.91 MIPS
-------------------------

Method 1: gettimeofday() at start and end
497 ms for 10s of speech
26.29 MIPS
20.12 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
26.29 MIPS
20.12 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
cycles_worst 100679 cycles_last 3046 cycles_av: 3933
31.46 MIPS
16.81 instances possible at 100% CPU load

root@e-ipbx:/$ ./speedtest3

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 527.67 MIPS
-------------------------

Method 1: gettimeofday() at start and end
485 ms for 10s of speech
25.59 MIPS
20.62 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
25.59 MIPS
20.62 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
cycles_worst 142386 cycles_last 2824 cycles_av: 2781
22.25 MIPS
23.72 instances possible at 100% CPU load



Not bad for a few lines of code :)
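
To put the Method 1 numbers in perspective, here is a small sketch of the arithmetic. The helper names are mine, and the formulas are my reading of the output (they are not taken from the speedtest source): the CPU load is the elapsed time divided by the length of the speech processed, and the number of instances at 100% load is its inverse.

```c
/* Fraction of one CPU used: elapsed_ms spent processing speech_s seconds
   of audio. */
static double cpu_load(double elapsed_ms, double speech_s)
{
    return elapsed_ms / (speech_s * 1000.0);
}

/* How many such instances would fit in 100% of the CPU. */
static double instances_at_full_load(double elapsed_ms, double speech_s)
{
    return 1.0 / cpu_load(elapsed_ms, speech_s);
}

/* Percentage of processing time saved going from before_ms to after_ms. */
static double time_saved_percent(double before_ms, double after_ms)
{
    return 100.0 * (before_ms - after_ms) / before_ms;
}
```

With the figures above, `instances_at_full_load(601, 10)` gives about 16.64 and `instances_at_full_load(485, 10)` about 20.62, matching the printed values. `time_saved_percent(601, 497)` is about 17.3 for the first patch, and `time_saved_percent(601, 485)` about 19.3 for both patches.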

Friday, October 17, 2008

A supercomputer on your desktop

Today I decided to spend some time getting NVIDIA CUDA working on my desktop machine (a quad-core Intel Q6600 @ 2.40GHz with 8 GB of RAM, running in 64-bit mode).
I had worked with CUDA about a year ago, when it was at the early beta stage, but things have moved on a lot since then.
Setting up my GeForce 8800 GT on Ubuntu 8.04 took a bit longer than "usual" because 8.04 is not supported "out of the box".
Luckily there are some very detailed pages that helped, and in less than an hour the CUDA example programs were happily running.
The GeForce 8800 GT is rated at 350+ GFLOPS, which is pretty impressive (consider that the Cray-2 could do a "meager" 1.9 GFLOPS).
The only problem (as with a lot of hardware these days) is that there is very little software to take advantage of it.
My reason for getting CUDA to work is to explore the possibilities of accelerating various pieces of code and applications I am interested in.
Echo cancelling (OSLEC, for example) and transcoding for Asterisk are high on the list, and based on my experience of porting Asterisk to an FPGA, there could be some very interesting results.

Thursday, October 16, 2008

Let's start blogging

Up till now I had resisted the idea of starting a blog.
My main excuse was time or, to put it better, the lack of it. Trying to run a small business and a family with a newborn is enough to make you wish for more hours in a day.
But the real reason was that I thought of blogging as just "hype" that would run its course and sooner or later disappear.

Then I attended the OpenCoffee XV meeting in Athens and heard Jason Calacanis's presentation.
At one point he mentioned something like "the more you give to the community, the more you will get back", and that really got me thinking.
My company, Digital-OPSiS, has been developing embedded Open Source solutions for some time now, so you can say that we owe our existence to the mentality of giving to the community.
If the people who started the Open Source movement hadn't done so, many of the wonderful things that we now take for granted in IT and on the Internet would probably not exist.
I call it the "avalanche effect" and have seen it many times with open source projects.
Someone writes a few lines of code, posts the code online, and boom, developers from all over the world start contributing, and before you know it you have a killer app in your hands (Asterisk, anyone? :)

So the same could be true of ideas or thoughts.
If Jason had not said that phrase, I probably wouldn't have started this blog, so just think how many people could start something new, solve a problem or just feel better thanks to things you and I take for granted or just don't mention.
That alone, I think, is a pretty good reason to start blogging.