[XviD-devel] PATCH: Per slices rendering

Michael Militzer xvid-devel@xvid.org
Tue, 9 Jul 2002 00:17:19 +0200


----- Original Message -----
From: "peter ross" <suxen_drol@hotmail.com>
To: <xvid-devel@xvid.org>
Sent: Monday, July 08, 2002 2:46 AM
Subject: Re: [XviD-devel] PATCH: Per slices rendering


> >From: "Michael Militzer" <michael@xvid.org>
> >Reply-To: xvid-devel@xvid.org
> >To: <xvid-devel@xvid.org>
> >Subject: Re: [XviD-devel] PATCH: Per slices rendering
> >Date: Mon, 8 Jul 2002 01:27:29 +0200
> >
> >----- Original Message -----
> >From: "peter ross" <suxen_drol@hotmail.com>
> >To: <xvid-devel@xvid.org>
> >Sent: Monday, July 08, 2002 12:26 AM
> >Subject: Re: [XviD-devel] PATCH: Per slices rendering
> >
> >
> > > hey all,
> > >
> > > how about adding a 'dirty' field for each macroblock, which is set if the
> > > macroblock has changed? then at the image_output stage, perform some
> > > macroblock checking and call slice_copy.
> > >
> > > out of curiosity, have you tried rendering at the macroblock/block level?
> >
> >yes, this was my first idea. The bad thing is that this 'dirty' field is
> >only used for the slice_copy and for nothing else. Then I thought a bit
> >about it and came to this idea: The idct data is currently stored in an
> >int16_t data[64 * 6] for a MB. Why not make one big array
> >[64*6*mb_width*mb_height] for all MBs and additionally one status variable
> >for every MB? It should then be possible to skip the final transfer step in
> >mb_intra and mb_inter and instead perform the transfer at once at the end
> >of decoder_iframe and decoder_pframe. The advantage would be that non-coded
> >blocks don't need to be copied from the ref to the current frame; only MBs
> >that have changed (intra/inter) need to be copied directly over the ref
> >frame.
>
> hmm, this should offer a significant improvement. don't forget that bframe
> decoding needs copies of the two most recent reference frames.
>
> cache really comes into play here, so including the imageout stage inside
> the decoder stage is a sacrifice i am happy to make.
> i'm willing to bet macroblock-level is faster. mbs are always 16x16, and an
> mmx/sse2 transfer routine can be written specifically for the task.

Hm, I wonder how big the difference in speed might be in reality between
copying the mbs "in place" (within the decoder functions) and copying them
later in image_out. This should be easy to figure out by testing. If there's really a
significant improvement by integrating the copy functions into
mb_intra/mb_inter, this scheme is fine for me too...
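
A minimal C sketch of the deferred-transfer idea quoted above, under stated assumptions: the big per-frame array already holds the final reconstructed int16_t samples for each MB (6*64 values per MB, blocks 0..3 being the luma 8x8 blocks in raster order; the chroma blocks 4 and 5 are ignored here), the luma plane already contains the reference picture, and the status byte per MB was set during decoding. transfer_changed(), put_luma_mb() and MB_CHANGED are hypothetical names, not XviD functions, and inter MBs (residual added to a motion-compensated prediction) are not covered.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-MB status values, set while parsing/decoding the frame. */
enum { MB_UNCHANGED = 0, MB_CHANGED = 1 };

/* Saturate a reconstructed sample to the 0..255 pixel range. */
static uint8_t clip_u8(int16_t v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

/* Write the four 8x8 luma blocks (blocks 0..3 of the MB's 6*64 slot) into the plane. */
static void put_luma_mb(uint8_t *y_plane, int stride, int mx, int my,
                        const int16_t *mb_data) {
    for (int b = 0; b < 4; b++) {
        const int16_t *blk = mb_data + b * 64;
        uint8_t *dst = y_plane + (size_t)(my * 16 + (b >> 1) * 8) * stride
                     + mx * 16 + (b & 1) * 8;
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
                dst[(size_t)y * stride + x] = clip_u8(blk[y * 8 + x]);
    }
}

/* One pass at the end of the frame: only changed MBs are written over the
 * reference pixels; unchanged MBs are left untouched. Returns the count written. */
static int transfer_changed(uint8_t *y_plane, int stride, const int16_t *all_data,
                            const uint8_t *status, int mb_width, int mb_height) {
    int written = 0;
    for (int my = 0; my < mb_height; my++)
        for (int mx = 0; mx < mb_width; mx++) {
            int i = my * mb_width + mx;
            if (status[i] != MB_CHANGED)
                continue;
            put_luma_mb(y_plane, stride, mx, my, all_data + (size_t)i * 6 * 64);
            written++;
        }
    return written;
}
```

The same status array could also drive a copy_slice-style output pass, which is the "dirty" block use mentioned further down.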

> >Also with the help of the status variable, the "dirty" blocks could then be
> >copied during image_output using a copy_slice-like function...
> >
> >Again: I didn't test this yet; maybe I missed something and it won't work.
> >I still have some uni work to do now and tomorrow before I can test this.
> >
> >btw: pete, you wrote that you wanted to rewrite the bframe decoding
> >support - anything done already?
>
> some design only; this weekend flew past rather quickly.
>
> basically i want to fix chemns code such that bframes can be decoded
> without "unnecessary" delay.
>
> there are three types of avis out there.
> 1. xvid/divx4/divx5  : no bframes, low_delay is not specified
> 2. divx500+bframes   : bframes (unpacked), low_delay is not specified
> 3. divx501+bframes   : bframes (packed), low_delay is not specified,
> includes divx 'p' identifier string
>
> notes:
> - the mpeg4 iso docs say that if low_delay is not specified, then we must
>   assume low_delay = 0.
> - there is no way to distinguish type 2 from type 1 (other than seeing a
>   bframe to decode).
> - xvid+bframes always specifies the low_delay.
>
> the problem is: some decoder frontends can handle delay whilst others
> can't. my solution is to add a "use_decoder_delay" global flag which
> changes the way xvid handles bframes.
>
> when enable_decoder_delay = true:
> we will have an "output_valid" flag in the DEC_FRAME struct.
> this is set by the decoder, and tells the frontend to ignore/not display
> the output frame.
>
> when enable_decoder_delay = false:
> basically, xvid will do its best to decode the frame without delay.
> if low_delay is not specified we will assume it's low_delay=1 (unless the
> divx 'p' identifier is detected, or we come across a bframe).
> this ensures 100% compatibility with types 1 and 3, whilst being roughly
> compatible with type 2.
>
> what do you think?

sounds very reasonable
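
The low_delay rule quoted above can be sketched as a small C function. This is only an illustration under assumptions: stream_info, its fields and effective_low_delay() are hypothetical, not XviD's API; the flag is called enable_decoder_delay here, although the message also calls it use_decoder_delay; and the delay-enabled branch assumes the decoder then simply follows the ISO default. The output_valid handling is not shown.

```c
#include <stdbool.h>

/* Hypothetical facts gathered while parsing the stream. */
typedef struct {
    bool low_delay_specified; /* VOL header carried an explicit low_delay */
    bool low_delay;           /* its value, valid only if specified */
    bool divx_packed;         /* divx 'p' identifier string seen (type 3) */
    bool bframe_seen;         /* a bframe has been encountered */
} stream_info;

/* Returns the low_delay value the decoder should act on. */
static bool effective_low_delay(const stream_info *s, bool enable_decoder_delay) {
    if (s->low_delay_specified)
        return s->low_delay;   /* e.g. xvid+bframes always signals it */
    if (enable_decoder_delay)
        return false;          /* ISO default: unspecified means low_delay = 0 */
    /* frontend cannot handle delay: assume low_delay = 1 unless bframes are evident */
    if (s->divx_packed || s->bframe_seen)
        return false;
    return true;
}
```

For a type 2 stream this returns true until the first bframe shows up, which is the "roughly compatible" case described above.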

> > > btw, i am quite pleased to hear xvid's decoder is fast.
> >
> >yesterday I committed some additional asm code that was again tweaked by
> >Skal. He especially enhanced the dequant4_mmx code; it's now nearly twice
> >as fast as before (!!). I think we should now always beat ffmpeg, and there
> >still seems to be some room left for improvements :-)
> >
>
> wow, mmx'ing the mpeg-4 quantizers was a real headache. this skal person
> is extremely talented.

yes, he really is. He's exactly the guy we need. I think neither of us (pete
and me) believes we're extraordinarily good asm coders, but it's just amazing
to see how much can still be improved by a really skilled guy like skal.
He just recently sent me some further optimized quant code and the new
quant4_mmx code is 70% (!) faster now. And I really thought the old code
wasn't that bad...

bye
Michael