[XviD-devel] A SSIM Plugin for XviD

Sat Oct 28 19:30:59 CEST 2006

Hi skal and all,

(that's a rhyme :))

skal wrote:
>    Hi Johannes and all,
>
>   
>> Message of 13/10/06 19:37
>> From: "Johannes Reinhardt" <Johannes.Reinhardt at uni-konstanz.de>
>>     
>
>   
>>>> The SSE2 implementation of consim is not faster than the MMX version
>>>> on any of the CPUs I tested (Pentium IV and Pentium M). Is there a
>>>> chance to speed it up, or should I disable SSE2? Or is SSE2 perhaps
>>>> faster on other CPUs?
>>>>         
>
>    Note: the MMX version consim_mmx uses 'pshufw', which is an SSE+ instruction.
>   
I replaced it with a copy and a shift. I couldn't think of a better way to
do it.
> [...]
>
>    Anyway, I had a look at the C version of consim, and I am
>    not sure it couldn't be made faster (and *then*
>    optimized in SSE ;). If I get you right, you are computing deviations
>    as <(a-<a>)(b-<b>)>, where < > is the average operator \sum_i{a_i} / N
>    (and this is where it could also be \sum_i{a_i w_i} / \sum_i{w_i})
>
>    Now, we have <(a-<a>)(b-<b>)> = <ab> - <a><b>, which is lighter (fewer subs).
>    So the loop could be something like:
>
> ============
>         int valo, valc, devo =0, devc=0, corr=0;
>         int i,j;
>         for(i=0;i< 8;i++){
>                 for(j=0;j< 8;j++){
>                         valo = ptro[j];
>                         valc = ptrc[j];
>                         devo += valo*valo;
>                         devc += valc*valc;
>                         corr += valo*valc;
>                 }
>                 ptro += stride;
>                 ptrc += stride;
>         }
>         devo -= 64*lumo*lumo;
>         devc -= 64*lumc*lumc;
>         corr -= 64*lumo*lumc;
>         *pdevo = devo;
>         *pdevc = devc;
>         *pcorr = corr;
> ========
>   
I implemented it, and it's faster (5% or so). The MMX/SSE version is also
much simpler. Thanks for the hint.
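
For reference, here is a minimal, self-contained sketch of the identity used
above, comparing the one-pass form (raw sums, mean terms subtracted once at
the end) with the two-pass form (mean subtracted per pixel). The function
names consim_onepass, consim_twopass and consim_demo are hypothetical and
are not the plugin's actual code. Note that the two forms agree exactly only
when 64*lumo and 64*lumc equal the true pixel sums; with descaled (rounded)
means they differ slightly.

```c
#include <stdint.h>

/* One-pass variant: accumulate raw sums of squares and products over the
   8x8 block, then subtract the mean terms once (hypothetical helper). */
void consim_onepass(const uint8_t *ptro, const uint8_t *ptrc, int stride,
                    int lumo, int lumc, int *pdevo, int *pdevc, int *pcorr)
{
    int devo = 0, devc = 0, corr = 0;
    int i, j;
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++) {
            int valo = ptro[j];
            int valc = ptrc[j];
            devo += valo * valo;
            devc += valc * valc;
            corr += valo * valc;
        }
        ptro += stride;
        ptrc += stride;
    }
    *pdevo = devo - 64 * lumo * lumo;
    *pdevc = devc - 64 * lumc * lumc;
    *pcorr = corr - 64 * lumo * lumc;
}

/* Two-pass reference: the means are subtracted from every pixel. */
void consim_twopass(const uint8_t *ptro, const uint8_t *ptrc, int stride,
                    int lumo, int lumc, int *pdevo, int *pdevc, int *pcorr)
{
    int devo = 0, devc = 0, corr = 0;
    int i, j;
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++) {
            int valo = ptro[j] - lumo;
            int valc = ptrc[j] - lumc;
            devo += valo * valo;
            devc += valc * valc;
            corr += valo * valc;
        }
        ptro += stride;
        ptrc += stride;
    }
    *pdevo = devo;
    *pdevc = devc;
    *pcorr = corr;
}

/* Builds two 8x8 test blocks whose means are exactly 100 and 50 and
   returns 1 if both variants produce identical results, 0 otherwise. */
int consim_demo(void)
{
    uint8_t o[64], c[64];
    int i, j, d1o, d1c, c1, d2o, d2c, c2;
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++) {
            o[i * 8 + j] = (uint8_t)(100 + ((j & 1) ? 3 : -3));
            c[i * 8 + j] = (uint8_t)(50 + ((i & 1) ? 5 : -5));
        }
    }
    consim_onepass(o, c, 8, 100, 50, &d1o, &d1c, &c1);
    consim_twopass(o, c, 8, 100, 50, &d2o, &d2c, &c2);
    return d1o == d2o && d1c == d2c && c1 == c2;
}
```

The one-pass form removes two subtractions per pixel from the inner loop,
which is presumably also why the SIMD version gets simpler.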

Is there a better way of finding the right order of operations than
brute-force trial and error? It seems that the order of instructions is
quite important for speed.
>      but we have a precision problem around lumo/lumc, which are already
>      descaled by 64 (oh! and btw: using (meanc+32)>>6 instead of just
>      meanc>>6 would be better rounded) (oh, and btw2: at line 267 of
>      plugin_ssim.c, 'fmeanc' and 'fmeano' are not the means per se, but
>      the sum of coeffs, without the /64. So I don't know if the formula
>      is OK).
>   
It's OK, as numerator and denominator are both scaled by 64.
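
On the rounding remark quoted above, a small sketch of the difference
between truncating and rounding when descaling a 64-pixel sum. The helper
names mean64_trunc and mean64_round are hypothetical, and the sketch assumes
a non-negative sum, as is the case for pixel sums.

```c
/* Descale a sum of 64 pixel values by truncation: always rounds down. */
int mean64_trunc(int sum)
{
    return sum >> 6;
}

/* Descale with rounding to nearest: add half the divisor (32) first. */
int mean64_round(int sum)
{
    return (sum + 32) >> 6;
}
```

For a sum of 6440 (true mean 100.625), truncation gives 100 while the
rounded form gives 101.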

>      Waiting for your updated c-version now :)
>
> Skal

Patches are here:

http://xvid.ist-dein-freund.de/stuff/ssim_part2.diff
http://xvid.ist-dein-freund.de/stuff/encraw_stats_fix.diff

ssim_part2.diff
faster SSIM calculation
the MMX implementation now uses only MMX instructions
lets the user choose the accuracy to use

encraw_stats_fix.diff
fixes a bug in encraw: PSNR was not calculated if stats were not displayed.

I will try to do the Gaussian-weighted calculation next week.

Thanks

Johannes