[XviD-devel] A SSIM Plugin for XviD
Johannes Reinhardt
Johannes.Reinhardt at uni-konstanz.de
Sat Oct 28 19:30:59 CEST 2006
Hi skal and all,
(that's a rhyme :))
skal wrote:
> Hi Johannes and all,
>
>
>> Message of 13/10/06 19:37
>> From: "Johannes Reinhardt" <Johannes.Reinhardt at uni-konstanz.de>
>>
>
>
>>>> The SSE2 implementation of consim is not faster than the mmx version
>>>> on any of the CPUs I tested (Pentium IV and Pentium M). Is there a
>>>> chance to speed it up, or should I disable SSE2? Or is SSE2 perhaps
>>>> faster on other CPUs?
>>>>
>
> Note: the mmx version consim_mmx uses 'pshufw', which is an SSE+ instruction.
>
I replaced it with a copy and a shift. I couldn't think of a better way to
do it.
> [...]
>
> Anyway, I had a look at the C version of consim, and I am
> not sure it couldn't be made faster (and *then*
> optimized in SSE ;). If I get you right, you are computing deviations
> as <(a-<a>)(b-<b>)>, where < > is the average operator \sum_i{a_i} / N
> (and this is where it could also be \sum_i{a_i w_i} / \sum_i{w_i})
>
> Now, we have <(a-<a>)(b-<b>)> = <ab> - <a><b>, which is lighter (fewer subs).
> So the loop could be something like:
>
> ============
> /* one pass: accumulate raw sums of squares and cross products */
> int valo, valc, devo = 0, devc = 0, corr = 0;
> int i, j;
> for (i = 0; i < 8; i++) {
>     for (j = 0; j < 8; j++) {
>         valo = ptro[j];
>         valc = ptrc[j];
>         devo += valo * valo;
>         devc += valc * valc;
>         corr += valo * valc;
>     }
>     ptro += stride;
>     ptrc += stride;
> }
> /* subtract N*<a><b> with N = 64; lumo and lumc are the block means */
> devo -= 64 * lumo * lumo;
> devc -= 64 * lumc * lumc;
> corr -= 64 * lumo * lumc;
> *pdevo = devo;
> *pdevc = devc;
> *pcorr = corr;
> ========
>
I implemented it, and it's faster (5% or so). The MMX/SSE version is also
much simpler. Thanks for the hint.
Is there a better way to find the right order of operations than
brute-force trial and error? It seems that the order of instructions is quite
important for speed.
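For reference, here is a minimal sketch of the one-pass idea that keeps everything in exact integers. It is not the code from the patch: the names block_stats64, consim_exact and cov_twopass are made up for illustration, and it assumes 8x8 blocks of 8-bit samples. Instead of subtracting 64*lumo*lumc with an already descaled mean, it keeps the raw sums and returns 64*sum(o*c) - sum(o)*sum(c), which is 64 times the summed product of deviations. The two-pass function is only there to check the identity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: each field is 64 * (sum of deviation products). */
typedef struct {
    int64_t devo;  /* 64 * sum(o*o) - sum(o)^2      */
    int64_t devc;  /* 64 * sum(c*c) - sum(c)^2      */
    int64_t corr;  /* 64 * sum(o*c) - sum(o)*sum(c) */
} block_stats64;

/* One pass over an 8x8 block: accumulate raw sums, subtract once at the end. */
static block_stats64 consim_exact(const uint8_t *ptro, const uint8_t *ptrc,
                                  int stride)
{
    int64_t so = 0, sc = 0, soo = 0, scc = 0, soc = 0;
    block_stats64 r;
    int i, j;

    assert(stride >= 8);
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++) {
            int o = ptro[j], c = ptrc[j];
            so += o;
            sc += c;
            soo += o * o;
            scc += c * c;
            soc += o * c;
        }
        ptro += stride;
        ptrc += stride;
    }
    r.devo = 64 * soo - so * so;
    r.devc = 64 * scc - sc * sc;
    r.corr = 64 * soc - so * sc;
    return r;
}

/* Two-pass reference: sum of (64*a - sum_a) * (64*b - sum_b),
 * i.e. 64^2 times the summed product of deviations from the mean. */
static int64_t cov_twopass(const uint8_t *a, const uint8_t *b, int stride)
{
    int64_t sa = 0, sb = 0, acc = 0;
    int i, j;

    assert(stride >= 8);
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            sa += a[i * stride + j];
            sb += b[i * stride + j];
        }
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            acc += (64 * a[i * stride + j] - sa) * (64 * b[i * stride + j] - sb);
    return acc;
}
```

With these scalings, cov_twopass(o, c, stride) should equal 64 * consim_exact(o, c, stride).corr for any block, so the extra factor can be divided out once at the end (or cancelled in a ratio) rather than per block.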
> but we have a precision problem around lumo/lumc, which are already
> descaled by 64 (oh! and btw: using (meanc+32)>>6 instead of just
> meanc>>6 would be better rounded) (oh, and btw2: at line 267 of
> plugin_ssim.c, 'fmeanc' and 'fmeano' are not the means per se, but
> the sums of coeffs, without the /64. So I don't know if the formula
> is ok).
>
It's ok, as numerator and denominator are scaled by 64.
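To make the two points concrete, here is a small sketch. The helper names are hypothetical and not taken from plugin_ssim.c. The first two functions show the difference between the truncating shift and the rounded one for a sum over 64 samples. The third is a generic ratio of the form 2ab/(a^2+b^2), used only to illustrate that a common factor on a and b cancels; it assumes no additive constants in numerator or denominator, and any such constants would have to be scaled consistently.

```c
#include <assert.h>

/* Truncating mean of 64 samples: sum >> 6 always rounds down. */
static int mean64_trunc(int sum)
{
    assert(sum >= 0);
    return sum >> 6;
}

/* Rounded mean: adding half the divisor (32) before the shift
 * rounds to the nearest integer instead of down. */
static int mean64_round(int sum)
{
    assert(sum >= 0);
    return (sum + 32) >> 6;
}

/* Illustrative ratio 2ab / (a^2 + b^2). Multiplying both a and b by the
 * same factor scales numerator and denominator equally, so the value
 * does not change. Assumption: no additive constants are involved. */
static double ratio_2ab(double a, double b)
{
    assert(a != 0.0 || b != 0.0);
    return 2.0 * a * b / (a * a + b * b);
}
```

For example, a sum of 100 has a true mean of 1.5625; mean64_trunc gives 1 and mean64_round gives 2. And ratio_2ab(3, 5) gives the same value as ratio_2ab(64 * 3, 64 * 5).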
> Waiting for your updated c-version now :)
>
> Skal
Patches are here:
http://xvid.ist-dein-freund.de/stuff/ssim_part2.diff
http://xvid.ist-dein-freund.de/stuff/encraw_stats_fix.diff
ssim_part2.diff:
  - faster SSIM calculation
  - the mmx implementation now uses only mmx instructions
  - lets the user choose the accuracy to use
encraw_stats_fix.diff:
  - fixes a bug in encraw: PSNR was not calculated if stats were not displayed
I will try to do the Gaussian-weighted calculation next week.
Thanks
Johannes