[XviD-devel] Forward: [Ffmpeg-devel] Sparse IDCT (fwd)

Wed Feb 26 16:16:40 CET 2003

	Hi,

On Tue, 2003-02-25 at 19:19, Christoph Lampert wrote:
> Hi,
> 
> wouldn't this be possible for XVID, too? Two or three separate 
> dequant/iDCT routines depending on where the last non-zero coefficient
> is? Since data is available at decoding anyway, I can see no drawback. 
> 
	I've tried several similar approaches (for the decoder):

	a) During Inter-AC decoding, keep track of the occupied rows
	and pass that info to the Idct.
	b) Inside the Idct, test that a row is non-zero before
	going for the corresponding vertical pass.

	Surprisingly, method b) performs better than a). Note however
	that method a) can be advantageous for the dequantization too.
	And in both cases, the gain is really worth it (~10%-15% for
	decoding highly quantized stuff).

	Note that method a) is harder to implement for Intra blocks,
	because of (vertical) AC-prediction. Mismatch control for Inter
	blocks is also painful.
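
A minimal sketch of method b), assuming a plain floating-point separable
IDCT rather than the SSE code referred to in this mail. The names
`idct8x8_skip_zero_rows`, `idct8` and `cos_pi16` are hypothetical. The
first pass tests each row and skips the 1-D transform when the row is
all zero; the returned bitmask is the same information method a) would
collect during AC decoding.

```c
#include <stdint.h>

/* cos(k*pi/16) for k = 0..8; other angles follow by symmetry. */
static const double COS16[9] = {
    1.0, 0.9807852804032304, 0.9238795325112867, 0.8314696123025452,
    0.7071067811865476, 0.5555702330196022, 0.3826834323650898,
    0.19509032201612825, 0.0
};

static double cos_pi16(int k)
{
    k %= 32;                                    /* period is 2*pi           */
    if (k > 16) k = 32 - k;                     /* cos(2*pi - a) = cos(a)   */
    return (k > 8) ? -COS16[16 - k] : COS16[k]; /* cos(pi - a)   = -cos(a)  */
}

/* Reference 8-point inverse DCT (slow, for illustration only). */
static void idct8(const double in[8], double out[8])
{
    for (int x = 0; x < 8; x++) {
        double s = in[0] * COS16[4];            /* DC term, scaled by 1/sqrt(2) */
        for (int u = 1; u < 8; u++)
            s += in[u] * cos_pi16((2 * x + 1) * u);
        out[x] = 0.5 * s;
    }
}

/* In-place 8x8 IDCT. Returns a bitmask of the non-zero rows (bit r = row r). */
unsigned idct8x8_skip_zero_rows(int16_t blk[64])
{
    double tmp[64];
    unsigned mask = 0;

    for (int r = 0; r < 8; r++) {               /* horizontal pass */
        double in[8];
        int any = 0;
        for (int c = 0; c < 8; c++) {
            in[c] = blk[8 * r + c];
            any |= blk[8 * r + c];
        }
        if (!any) {                             /* a zero row transforms to zero */
            for (int c = 0; c < 8; c++) tmp[8 * r + c] = 0.0;
            continue;
        }
        mask |= 1u << r;
        idct8(in, tmp + 8 * r);
    }
    if (mask == 0)
        return 0;                               /* whole block is zero already */

    for (int c = 0; c < 8; c++) {               /* vertical pass */
        double in[8], out[8];
        for (int r = 0; r < 8; r++) in[r] = tmp[8 * r + c];
        idct8(in, out);
        for (int r = 0; r < 8; r++) {
            double v = out[r];                  /* round to nearest integer */
            blk[8 * r + c] = (int16_t)(v >= 0 ? v + 0.5 : v - 0.5);
        }
    }
    return mask;
}
```

In heavily quantized material most blocks have only one or two occupied
rows, so the test is cheap compared with the transforms it avoids.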


	Another (smaller) gain comes from folding the copy / add of
	the Idct's result into the 'cur' image into the vertical pass.
	It saves a memory write and re-read of the residual block...
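
The fused copy / add can be sketched as follows, again assuming a slow
floating-point reference transform and not the actual SSE code. The
names `idct8x8_add`, `idct8_ref` and `clamp_u8` are hypothetical, and
`dst` / `stride` stand in for the 'cur' image. The vertical pass adds
each residual sample straight to the destination pixel, so the
residual block is never stored and read back.

```c
#include <stdint.h>

/* cos(k*pi/16) for k = 0..8; other angles follow by symmetry. */
static const double COS_TAB[9] = {
    1.0, 0.9807852804032304, 0.9238795325112867, 0.8314696123025452,
    0.7071067811865476, 0.5555702330196022, 0.3826834323650898,
    0.19509032201612825, 0.0
};

static double cos16(int k)
{
    k %= 32;
    if (k > 16) k = 32 - k;
    return (k > 8) ? -COS_TAB[16 - k] : COS_TAB[k];
}

/* Reference 8-point inverse DCT (slow, for illustration only). */
static void idct8_ref(const double in[8], double out[8])
{
    for (int x = 0; x < 8; x++) {
        double s = in[0] * COS_TAB[4];          /* DC term, scaled by 1/sqrt(2) */
        for (int u = 1; u < 8; u++)
            s += in[u] * cos16((2 * x + 1) * u);
        out[x] = 0.5 * s;
    }
}

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Inverse-transform 'blk' and add the result to the 8x8 area at 'dst'. */
void idct8x8_add(const int16_t blk[64], uint8_t *dst, int stride)
{
    double tmp[64];

    for (int r = 0; r < 8; r++) {               /* horizontal pass */
        double in[8];
        for (int c = 0; c < 8; c++) in[c] = blk[8 * r + c];
        idct8_ref(in, tmp + 8 * r);
    }
    for (int c = 0; c < 8; c++) {               /* vertical pass + add */
        double in[8], out[8];
        for (int r = 0; r < 8; r++) in[r] = tmp[8 * r + c];
        idct8_ref(in, out);
        for (int r = 0; r < 8; r++) {
            double v = out[r];
            int res = (int)(v >= 0 ? v + 0.5 : v - 0.5);
            uint8_t *p = dst + r * stride + c;
            *p = clamp_u8(*p + res);            /* written once, no residual buffer */
        }
    }
}
```

For Intra blocks the same loop would store the clamped value instead of
adding it to the prediction.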

	If you're interested, you can have a look at the 
	file 'src/dsp_src/skl_dct_sse.asm' in my codec sources

( http://skal.planet-d.net/coding/mpeg4codec.html )

	I didn't submit this Idct here (although it seems
	the fastest around </brag>) because it's deliberately
	not IEEE-1180 compliant. This might hurt some sensibilities :)

	bye!
		Skal


> gruel 
> 
> 
> ---------- Forwarded message ----------
> Date: Tue, 25 Feb 2003 11:08:39 -0700 (MST)
> From: Mike Melanson <melanson at pcisys.net>
> Reply-To: ffmpeg-devel at lists.sourceforge.net
> To: ffmpeg-devel at lists.sourceforge.net
> Subject: [Ffmpeg-devel] Sparse IDCT
> 
> Hi,
> 	A curious feature in the VP3 codec is that the decoder maintains
> statistics about where the last non-zero DCT coefficient lives in a
> particular DCT block. It then uses this information to call 1 of 3
> different IDCT functions. One function is the basic 8x8 IDCT. The other 2
> handle sparse matrices of coefficients. One handles a matrix where the
> non-zero coefficients are concentrated in the upper left corner
> (IDct10()). The other handles a matrix where only the DC coeff. is
> non-zero (IDct1()). The latter transform is particularly simple since
> it copies the dequantized/scaled DC coeff. to every position in the
> block.
> 
> 	My question is: Would it be worthwhile to support these sparse
> IDCT functions in ffmpeg's DSP context? I know that for IDct1(), an
> AltiVec-capable PPC could use a vector splat instruction followed by a
> series of vector store ops. Would these transforms be useful for other
> codecs?
> 
> 	I wondered how frequently the functions are called. On the
> keyframe where I am doing most of my testing and validation, I found that
> the full 64-element IDCT was never called even once; only the 1- and
> 10-element transforms were called.
> 
> 	Thanks...
> --
> 	-Mike Melanson
> 
>
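
The three-way dispatch described in the forwarded message can be
sketched as below. This is only an illustration, not VP3 or ffmpeg
code: `select_idct`, `idct_dc_only` and the enum are hypothetical
names, `last_nz` is assumed to be the index of the last non-zero
coefficient in scan order, and the DC scaling of 1/8 assumes the usual
orthonormal 8x8 IDCT normalization rather than VP3's own scaling.

```c
#include <assert.h>
#include <stdint.h>

enum idct_kind {
    IDCT_DC_ONLY,   /* only the DC coefficient is non-zero (cf. IDct1())   */
    IDCT_SPARSE10,  /* all non-zero coefficients lie in the first 10 scan
                       positions (cf. IDct10())                            */
    IDCT_FULL       /* general 64-coefficient transform                    */
};

/* Choose a transform from the scan index of the last non-zero coefficient. */
enum idct_kind select_idct(int last_nz)
{
    assert(last_nz >= 0 && last_nz < 64);
    if (last_nz == 0)
        return IDCT_DC_ONLY;
    if (last_nz < 10)
        return IDCT_SPARSE10;
    return IDCT_FULL;
}

/* DC-only inverse transform: every output sample gets the same value. */
void idct_dc_only(int16_t dc, int16_t out[64])
{
    /* Assumed normalization: sample = dc / 8, rounded to nearest. */
    int v = (dc >= 0) ? (dc + 4) / 8 : -((-dc + 4) / 8);
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)v;
}
```

On SIMD hardware the fill loop maps onto one broadcast (splat) followed
by a handful of vector stores, as the forwarded message suggests for
AltiVec.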