[XviD-devel] Quality optimization
skal
skal at planet-d.net
Wed Feb 26 12:14:46 CET 2003
Hi,
I'm forwarding my answer to a question I received,
since it may be of general interest:
On Tue, 2003-02-25 at 18:20, skal wrote:
> here's a C/MMX/SSE version of the Hadamard transform (16bits).
Q: > Would there be any improvement between SSE and SSE2?
Well, most probably. The obvious improvement is for
the vertical pass: instead of handling 4 + 4 columns
one after the other, all 8 could be done in one pass, replacing:
HADAMARD_VPASS eax
HADAMARD_VPASS eax+8
by a single HADAMARD_VPASS eax, with all the 'mm?' registers
in the macro replaced by 'xmm?' (and taking care of
alignment).
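
As a rough illustration of the idea, here is a sketch of such a
one-pass vertical step written in C with SSE2 intrinsics instead of
the assembly macro. It is not the posted code: the function names,
the row-by-row int16 layout and the natural (unpermuted) output order
are assumptions made for the example, and it assumes an x86 target
with SSE2. A scalar version doing the same butterflies one column at
a time is included for comparison.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x86-64 target */
#include <stdint.h>

/* (a, b) -> (a + b, a - b) on 8 lanes of 16-bit values. */
#define BUTTERFLY(a, b)                      \
    do {                                     \
        __m128i diff_ = _mm_sub_epi16(a, b); \
        (a) = _mm_add_epi16(a, b);           \
        (b) = diff_;                         \
    } while (0)

/* Hypothetical name. In-place vertical pass over an 8x8 block of
 * int16 stored row by row. Each xmm register holds one full row,
 * so all 8 columns go through the butterflies at once. */
static void hadamard_vpass_sse2(int16_t *blk)
{
    __m128i r[8];
    int i;

    /* Aligned loads/stores fault on addresses not aligned to 16 bytes. */
    assert(((uintptr_t)blk & 15) == 0);

    for (i = 0; i < 8; i++)
        r[i] = _mm_load_si128((const __m128i *)(blk + 8 * i));

    /* Rows 4 apart, then 2 apart, then adjacent rows. */
    BUTTERFLY(r[0], r[4]); BUTTERFLY(r[1], r[5]);
    BUTTERFLY(r[2], r[6]); BUTTERFLY(r[3], r[7]);

    BUTTERFLY(r[0], r[2]); BUTTERFLY(r[1], r[3]);
    BUTTERFLY(r[4], r[6]); BUTTERFLY(r[5], r[7]);

    BUTTERFLY(r[0], r[1]); BUTTERFLY(r[2], r[3]);
    BUTTERFLY(r[4], r[5]); BUTTERFLY(r[6], r[7]);

    for (i = 0; i < 8; i++)
        _mm_store_si128((__m128i *)(blk + 8 * i), r[i]);
}

/* Scalar reference: the same butterflies, one column at a time. */
static void hadamard_vpass_c(int16_t *blk)
{
    int col, step, i;
    for (col = 0; col < 8; col++)
        for (step = 4; step >= 1; step >>= 1)
            for (i = 0; i < 8; i++)
                if (!(i & step)) {
                    int16_t a = blk[8 * i + col];
                    int16_t b = blk[8 * (i + step) + col];
                    blk[8 * i + col] = (int16_t)(a + b);
                    blk[8 * (i + step) + col] = (int16_t)(a - b);
                }
}
```

The block has to be 16-byte aligned for the aligned load/store used
here; switching to _mm_loadu_si128 / _mm_storeu_si128 would lift that
requirement.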
That said, such heavily SIMD'd functions rapidly hit
the memory bandwidth bottleneck. In fact, in the Hadamard
transform I posted, only HALF of the time (tick-wise) is
spent doing the arithmetic. The rest is spent
loading/storing data. For the F/Idct, data I/O
is also a large part of the cost, considering how cheap
the mults are.
Note also that for these in-place funcs, prefetching is
almost useless (haven't tested it, though).
> Without the 'pshufw' re-ordering, output columns are re-ordered
> according to: [03127465]. C-version spits the correct order...
> Note: Output is also scaled by 8.
Oops! Output is scaled by 64, not 8.
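
One way to see where a factor of 64 comes from is a plain C sketch of
an unnormalized 8x8 Hadamard transform (rows, then columns). This is
not the posted C version: the function names are made up for the
example, and it assumes the posted code uses the same unnormalized
+1/-1 butterflies. With that convention, a constant block of value c
gives a DC coefficient of 64 * c, and running the transform twice
returns the input multiplied by 64.

```c
#include <assert.h>
#include <stdint.h>

/* Unnormalized 8-point Hadamard butterflies on 8 values spaced
 * 'stride' apart (stride 1 = one row, stride 8 = one column). */
static void hadamard8(int16_t *v, int stride)
{
    int step, i;
    for (step = 1; step < 8; step <<= 1)
        for (i = 0; i < 8; i++)
            if (!(i & step)) {
                int16_t a = v[i * stride];
                int16_t b = v[(i + step) * stride];
                v[i * stride] = (int16_t)(a + b);
                v[(i + step) * stride] = (int16_t)(a - b);
            }
}

/* Hypothetical name: in-place 2-D transform, rows then columns. */
static void hadamard8x8(int16_t *blk)
{
    int i;
    for (i = 0; i < 8; i++) hadamard8(blk + 8 * i, 1);
    for (i = 0; i < 8; i++) hadamard8(blk + i, 8);
}

/* Self-check: a constant block of 3s ends up with DC = 64 * 3
 * and zeros everywhere else. */
static void check_dc_gain(void)
{
    int16_t blk[64];
    int i;
    for (i = 0; i < 64; i++) blk[i] = 3;
    hadamard8x8(blk);
    assert(blk[0] == 64 * 3);
    for (i = 1; i < 64; i++) assert(blk[i] == 0);
}
```

Each 1-D pass contributes a factor of 8, which is why the 2-D result
carries 8 * 8 = 64. Inputs need to stay small enough that the
intermediate sums fit in 16 bits.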
bye,
Skal