Re[4]: [XviD-devel] New Motion Estimation (from sysKin) committed to branch

Wed, 25 Sep 2002 01:15:52 +0930

Hi

> 1) do INTRA/INTER decision as early as possible: I tried yesterday and did
> an early INTRA check just before the halfpel (16) refine. Final filesize
> even got slightly smaller (But I don't think one can generalize this
> behaviour. We all know that the INTER_BIAS isn't always perfect and that
> good results can be achieved with lots of values depending on the input
> material...). Even though one saves the 4MV search + the refine16 +
> 4*refine8 for most INTRA blocks, speed was only slightly higher, don't know
> why... :-(

Well, I don't expect it to be much faster, but it is worth
checking.
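A rough sketch of the early check from point 1, assuming the usual
deviation-vs-SAD test: right after the integer-pel 16x16 search, compare
the block's own deviation from its mean with the best inter SAD. If intra
wins, the 4MV search and all refinements can be skipped. The names
block_deviation16 and early_intra_check, and the INTER_BIAS value used
here, are hypothetical and not taken from the XviD tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bias in SAD units; the real INTER_BIAS may differ. */
#define INTER_BIAS 512

/* Sum of absolute deviations of a 16x16 luma block from its own mean:
 * a cheap estimate of how costly the block would be as INTRA. */
static int block_deviation16(const uint8_t *cur, int stride)
{
    int sum = 0, dev = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += cur[y * stride + x];
    int mean = sum / 256;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dev += abs(cur[y * stride + x] - mean);
    return dev;
}

/* Returns 1 if the macroblock should be coded INTRA right after the
 * integer-pel search16, so search8, the 16x16 halfpel refine and the
 * four 8x8 refines can all be skipped for it. */
static int early_intra_check(const uint8_t *cur, int stride, int best_sad16)
{
    assert(best_sad16 >= 0);
    return block_deviation16(cur, stride) < best_sad16 - INTER_BIAS;
}
```

A perfectly flat block has deviation 0, so it goes INTRA as soon as the
best inter SAD exceeds INTER_BIAS.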

> 2) do INTER/INTER4V decision before any halfpel refine: Idea is to first do
> the normal search16, then do NO halfpel refine (instead maybe do the early
> INTRA/INTER decision mentioned above) and start the 4MV search (search8).

The point is that every 16x16 check is important for the inter4v search
and for the inter/inter4v decision. In fact, halfpel refinement seems to
be the most important of all. 16x16 refinement is not wasted if inter4v
is used: it's very important for the decision.
What can be done is skipping the 8x8 refinement. However, I don't
believe that it can be done while keeping the same PSNR, because without
8x8 refinement (which can be turned off by motion flags) inter4v mode
is rarely used.
Please note that 8x8 refinement is very fast now (which doesn't mean
that it will be fast for qpel, I admit).
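To illustrate why both refinements feed the decision, here is a minimal
sketch of an INTER vs INTER4V choice, assuming both sides are compared as
"SAD + vector cost" and inter4v is charged an extra penalty for its
additional vectors. The penalty constant and function name are
hypothetical. An unrefined 16x16 cost is too high and tilts the choice
toward inter4v; unrefined 8x8 costs are too high and inter4v rarely wins.

```c
#include <assert.h>

typedef enum { MODE_INTER = 0, MODE_INTER4V = 1 } mb_mode;

/* Hypothetical penalty per quantizer step for sending four vectors. */
#define INTER4V_PENALTY_PER_QUANT 2

/* sad16: refined 16x16 cost (SAD + vector cost).
 * sad8:  refined 8x8 costs (SAD + vector cost), one per luma block. */
static mb_mode choose_inter_mode(int sad16, const int sad8[4], int quant)
{
    assert(quant >= 1 && quant <= 31);
    int sum8 = sad8[0] + sad8[1] + sad8[2] + sad8[3];
    int penalty = INTER4V_PENALTY_PER_QUANT * quant;
    return (sum8 + penalty < sad16) ? MODE_INTER4V : MODE_INTER;
}
```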

> I quickly tried to implement this idea yesterday into SysKin's ME, but
> because a lot of pointers were used (SearchData etc.) I couldn't easily keep
> track of which data gets modified at all, and obviously the search8 step
> destroyed some information of my earlier search16 :-(

I'm not sure how this could be done, but it is possible.
After the 8x8 step in search8() you cannot use CheckCandidate16(),
because it will destroy the data from search8: its 8x8 blocks will have
smaller SADs than the blocks computed by search8 (which are SADs +
calc_delta_8), so they will be overwritten. You can use
CheckCandidate16no4v, though (with the same SearchData). But I really
believe that any 16x16 check which is not used for the 8x8 search is
kinda wasted...
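A simplified model of the overwrite problem, assuming slot 0 holds the
16x16 result and slots 1..4 the four 8x8 blocks. check16 and
check16_no4v are hypothetical stand-ins for CheckCandidate16 and
CheckCandidate16no4v that take precomputed 8x8 SADs instead of pixels,
and mv_cost is a made-up vector cost standing in for calc_delta_16 /
calc_delta_8. The key point: check16 stores raw 8x8 SADs (no vector
cost), so they can undercut the SAD + delta values left by search8.

```c
#include <assert.h>

typedef struct { int x, y; } VECTOR;

typedef struct {
    int iMinSAD[5];       /* [0] = 16x16, [1..4] = 8x8 blocks */
    VECTOR currentMV[5];
} SearchResults;

/* Hypothetical vector cost, proportional to distance from the predictor. */
static int mv_cost(VECTOR mv, VECTOR pred)
{
    int dx = mv.x - pred.x, dy = mv.y - pred.y;
    return 2 * ((dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy));
}

/* Model of CheckCandidate16: the 16x16 cost includes the vector cost,
 * the four 8x8 by-products do NOT. */
static void check16(SearchResults *r, VECTOR mv, VECTOR pred, const int sad8[4])
{
    int cost16 = sad8[0] + sad8[1] + sad8[2] + sad8[3] + mv_cost(mv, pred);
    if (cost16 < r->iMinSAD[0]) { r->iMinSAD[0] = cost16; r->currentMV[0] = mv; }
    for (int i = 0; i < 4; i++)
        if (sad8[i] < r->iMinSAD[i + 1]) {
            r->iMinSAD[i + 1] = sad8[i];
            r->currentMV[i + 1] = mv;
        }
}

/* Model of CheckCandidate16no4v: only slot 0 is touched, so the
 * results of an earlier search8 survive. */
static void check16_no4v(SearchResults *r, VECTOR mv, VECTOR pred, const int sad8[4])
{
    int cost16 = sad8[0] + sad8[1] + sad8[2] + sad8[3] + mv_cost(mv, pred);
    if (cost16 < r->iMinSAD[0]) { r->iMinSAD[0] = cost16; r->currentMV[0] = mv; }
}
```

For example, if search8 left block 0 at 250 + 10 = 260, a 16x16
candidate whose first 8x8 block has raw SAD 255 but vector cost 16
(true cost 271) still replaces it in check16, while check16_no4v leaves
it alone.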

> Is it that much slower
> to really store all results within SearchData instead of using pointers to
> common variables?

There is room for optimization, I'm sure of it. However, I wanted
SearchData to be a const * const type, so I had to put some pointers
there. Also, things like iMinSAD, currentMV and more can have variable
length. Which pointers do you mean exactly?
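A minimal sketch of the const * const trade-off, assuming a trimmed-down
SearchData (the num_slots field and update_best function are
hypothetical). Passing `const SearchData * const` forbids reassigning
the pointer or changing the struct's own fields, but writes through its
pointer members are still allowed, and those members can point at
arrays of different length (1 slot for plain 16x16, 5 with inter4v).

```c
#include <assert.h>

typedef struct { int x, y; } VECTOR;

/* Hypothetical, trimmed-down SearchData: read-only during the search,
 * results live in caller-owned arrays of variable length. */
typedef struct {
    int *iMinSAD;
    VECTOR *currentMV;
    int num_slots;
} SearchData;

static void update_best(const SearchData * const Data, int slot, int sad, VECTOR mv)
{
    /* Data = 0;             would not compile: the pointer is const */
    /* Data->num_slots = 0;  would not compile: the struct is const  */
    assert(slot >= 0 && slot < Data->num_slots);
    if (sad < Data->iMinSAD[slot]) {
        /* allowed: the pointed-to arrays are not const */
        Data->iMinSAD[slot] = sad;
        Data->currentMV[slot] = mv;
    }
}
```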

> Maybe it's a good idea not to throw away any information
> at all (best MV before refinement etc.), lots of computing power has been
> invested to calculate those intermediate results and you never know, if it
> might not be useful in the future again...

Sure, if you need them, keep them. I didn't need them, so I didn't
keep them ;)
At any point you can do something like bSAD = iMinSAD[0].
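Keeping an intermediate result is just a copy taken before the next
step. A minimal sketch, assuming a simple results struct; BestBefore,
snapshot16 and refine_gain are hypothetical helpers, not XviD code.

```c
#include <assert.h>

typedef struct { int x, y; } VECTOR;

/* Hypothetical minimal result store: slot 0 = 16x16, slots 1..4 = 8x8. */
typedef struct {
    int iMinSAD[5];
    VECTOR currentMV[5];
} SearchResults;

/* Copy of the best 16x16 result taken before a refinement step. */
typedef struct {
    int sad;
    VECTOR mv;
} BestBefore;

/* The equivalent of bSAD = iMinSAD[0] (plus the vector). */
static BestBefore snapshot16(const SearchResults *r)
{
    BestBefore b = { r->iMinSAD[0], r->currentMV[0] };
    return b;
}

/* How much refinement lowered the 16x16 SAD (0 if it found nothing). */
static int refine_gain(const BestBefore *before, const SearchResults *after)
{
    /* candidates are only accepted when smaller, so SAD cannot grow */
    assert(after->iMinSAD[0] <= before->sad);
    return before->sad - after->iMinSAD[0];
}
```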

> 3) I'd like to have a small data structure where the SADs of the neighbouring
> blocks (top, bottom, left, right) of the current best match are stored.
> Well, while I think about it, even better would be all 8 neighbours. This
> could allow faster qpel refinement...

Yeah, I thought about it. I think this can be done by modifying all
diamonds. At this point it is possible to 'peek' at the last SAD found
by the last CheckCandidate16() (it's under Data->temp[0] or something
like that), and I can modify all CheckCandidate()s to support it.
It will make diamond/square a bit slower, I think...
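A sketch of such a neighbour cache, assuming a 3x3 grid of SADs around
the current best vector, filled by a hook each CheckCandidate would call
once its SAD is known. NeighbourSADs, neigh_reset, neigh_record and
neigh_get are hypothetical names. The extra range test per candidate is
where the "a bit slower" cost would come from.

```c
#include <assert.h>
#include <limits.h>

typedef struct { int x, y; } VECTOR;

/* SADs on the 3x3 grid around the current best vector;
 * INT_MAX marks "not evaluated yet". */
typedef struct {
    VECTOR center;
    int sad[3][3];
} NeighbourSADs;

/* Re-center on a new best vector; this sketch simply drops old entries. */
static void neigh_reset(NeighbourSADs *n, VECTOR center, int center_sad)
{
    n->center = center;
    for (int j = 0; j < 3; j++)
        for (int i = 0; i < 3; i++)
            n->sad[j][i] = INT_MAX;
    n->sad[1][1] = center_sad;
}

/* Called after each candidate's SAD is computed: store it if the
 * candidate is within +-1 of the center in both directions. */
static void neigh_record(NeighbourSADs *n, VECTOR mv, int sad)
{
    int dx = mv.x - n->center.x, dy = mv.y - n->center.y;
    if (dx >= -1 && dx <= 1 && dy >= -1 && dy <= 1)
        n->sad[dy + 1][dx + 1] = sad;
}

/* Cached SAD at offset (dx, dy) from the center, or INT_MAX if unknown. */
static int neigh_get(const NeighbourSADs *n, int dx, int dy)
{
    assert(dx >= -1 && dx <= 1 && dy >= -1 && dy <= 1);
    return n->sad[dy + 1][dx + 1];
}
```

Dropping the whole grid on every re-center is the simplest choice; a
version that shifts the grid could keep the entries that still overlap.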

Radek

PS: another thing to be considered is the B-frame quant ratio. We have
to decide about it, and my current opinion is that b-frames should have
current quant + 2; maybe +1 if the P-frame quant is 2. Not more than +3
in any case.
Once sequences like PBBBBP are quite possible in the movie (with
dynamic P/B) lower b-frame quant may look really bad :(
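The proposed rule, written out as a small function. This is a sketch
under assumptions: the offset is treated as a tunable parameter
(default 2), the "+1 at P quant 2" case is read as capping the offset
at 1, and the result is clamped to the MPEG-4 quantizer range 1..31.
bframe_quant is a hypothetical name.

```c
#include <assert.h>

#define QUANT_MIN 1
#define QUANT_MAX 31   /* MPEG-4 quantizer range is 1..31 */

/* B-frame quant from the P-frame quant and a requested offset. */
static int bframe_quant(int p_quant, int offset)
{
    assert(p_quant >= QUANT_MIN && p_quant <= QUANT_MAX);
    assert(offset >= 0);
    if (offset > 3)
        offset = 3;                    /* not more than +3 in any case */
    if (p_quant == 2 && offset > 1)
        offset = 1;                    /* gentler at very low quant */
    int q = p_quant + offset;
    return q > QUANT_MAX ? QUANT_MAX : q;
}
```

With the default offset of 2, a P quant of 4 gives a B quant of 6, and
a P quant of 2 gives 3.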