[XviD-devel] Global Motion Compensation

Christoph Lampert xvid-devel@xvid.org
Fri, 25 Oct 2002 14:21:12 +0200 (CEST)


Hi Radek,

let's combine our efforts on this, because I had similar plans.
I guess fast GMC has to work like this: first calculate the global
translation (e.g. from motion vectors or some other way), then refine to
more free parameters (shear/zoom)...
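
To make the first step concrete, here is a minimal sketch of one possible
way to get a coarse global translation from block motion vectors: take the
median of the x and y components separately. This is only an assumption for
illustration, not what either of us has implemented; the MotionVec type and
the function names are hypothetical and not part of the XviD source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical motion vector type; units are whatever ME produced. */
typedef struct { int x, y; } MotionVec;

static int cmp_int(const void *a, const void *b)
{
    int ia = *(const int *)a, ib = *(const int *)b;
    return (ia > ib) - (ia < ib);
}

/* Median of n ints (upper median for even n). Sorts buf in place. */
static int median_int(int *buf, size_t n)
{
    assert(n > 0);
    qsort(buf, n, sizeof buf[0], cmp_int);
    return buf[n / 2];
}

/* Coarse global translation: component-wise median of the block vectors.
 * Returns 0 on success, -1 on empty input or allocation failure. */
int estimate_global_translation(const MotionVec *mv, size_t n, MotionVec *out)
{
    size_t i;
    int *tmp;

    if (n == 0)
        return -1;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (i = 0; i < n; i++)
        tmp[i] = mv[i].x;
    out->x = median_int(tmp, n);

    for (i = 0; i < n; i++)
        tmp[i] = mv[i].y;
    out->y = median_int(tmp, n);

    free(tmp);
    return 0;
}
```

The median is used here because a few blocks following a moving object do
not pull it away from the background motion, which a plain average would
allow.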

One crucial step is _which_ motion vectors from ME to consider in getting
translation and which not. There are several ways for that, maybe we can
compare which of our ideas works best before switching to the next step. 
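
As one example of such a selection rule (a sketch under my own assumptions,
not a tested proposal): keep only the vectors that lie within a small
tolerance of a candidate translation, and accept the candidate only if at
least half of all blocks support it. The function name, the tolerance
parameter and the "half of the blocks" threshold are all hypothetical.

```c
#include <stdlib.h>

/* Hypothetical motion vector type; units are whatever ME produced. */
typedef struct { int x, y; } MotionVec;

/* Count the vectors within +/- tol of guess in both components.
 * If at least half of the n vectors agree, *out is set to guess;
 * otherwise *out is set to (0,0), meaning "no global pan found".
 * Returns the number of supporting vectors. */
size_t select_supported_translation(const MotionVec *mv, size_t n,
                                    MotionVec guess, int tol,
                                    MotionVec *out)
{
    size_t i, inliers = 0;

    for (i = 0; i < n; i++) {
        if (abs(mv[i].x - guess.x) <= tol &&
            abs(mv[i].y - guess.y) <= tol)
            inliers++;
    }

    if (inliers > 0 && 2 * inliers >= n) {
        *out = guess;
    } else {
        out->x = 0;
        out->y = 0;
    }
    return inliers;
}
```

Comparing different rules then mostly means swapping out the test inside
the loop and the acceptance threshold, and looking at the results on the
same clips.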

Do you have any fixed test clips in mind so that we can compare results?
I guess the Matrix Lobby scene isn't too bad, agreed, but we should have
other material, too. 

> As I've been thinking about GMC for a longer time, I decided to
> implement something I call 'first step to GME search' - a procedure
> which analyses normal block-based ME and looks for a camera pan in the
> picture. I realize that many people worked on it before, but I decided
> to be stupid enough _not_ to read their papers (never found any), but
> think of something myself. More fun that way ;)

It may be more fun in the beginning, but it's also dangerous because you
might follow tracks that were proven to be bad. So if you want to have 
one or two good general papers, just tell me. 
I won't tell you the URLs right now so you have a choice to say "no" ;-)


> Some of you might say: we can't do block-based ME before GMC, and
> then block-based ME again. The answer is: we already do - P/B/I
> decision is exactly a very fast version of ME and its results seem to
> be more than enough.

Yes, current ME (e.g. fullpel) is so fast, it can easily be used as a
"preprocessing" step, too. 

> For second pass, where we don't do P/B/I decision anymore (well we do
> now, but I can't wait to have it fixed) - we can use first-pass hinted
> ME data for this function and have the same effect. We win either way.
> 
> I'll tell you something about the results found: very good. I'm pretty
> sure that the precision is +/- one pixel, which is exactly what we
> need to follow the procedure with gradient-descent approach (I also
> have some ideas about this part).
> The procedure successfully found camera panning vectors as long as 100
> pixels (fast-motion parts of 'Matrix' lobby scene) and even bigger,
> when I decimated framerate by 2 (to pretend B-frames). Also, the
> procedure doesn't act stupid and returns GMC vector of 0,0 if it
> couldn't find any panning. It only returns some crappy results in case
> of scene fades... But this is no surprise, is it.
> Flying debris, smaller moving objects, small explosions (like gunfire)
> don't seem to distract it either.
> The function itself doesn't compute any SADs or other stuff, so it's
> very fast (compared to, say, ME). I wonder how slow the second step of
> GME (diamond, subpel refinement) is gonna be.
> 
> OK, I just read the letter above and now I see how hurra-optimistic it
> is. Just imagine how embarrassed I'm gonna be if it's all crap...
> 
> Anyway, I can't do any real tests (finding correct camera panning is
> one thing, GMC is another), because there is no GMC support in xvid at
> all...
> Come to think of it, I don't even know what is the precision of
> this translation - halfpixels? Better? If better, do we need some
> new super-slow filters like for qpel again?

Precision can be specified up to 1/16 pel, I guess. But interpolation
is bilinear, so the routines for that are fast. Other parameters require
more difficult matrix operations and will be much slower...
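
To show why the pure translation case is cheap, here is a minimal sketch of
bilinear sampling at a position given in 1/16 pel units. It is only an
illustration: the function name, the rounding (+128 before the shift) and
the edge clamping are my assumptions and are not checked against the exact
rounding the standard prescribes. Coordinates are assumed non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Sample an 8-bit image at (x16/16, y16/16) with bilinear interpolation.
 * x16 and y16 are positions in 1/16 pel units and must be >= 0.
 * Positions beyond the last row/column are clamped to the edge. */
uint8_t sample_bilinear_16(const uint8_t *img, int width, int height,
                           int stride, int x16, int y16)
{
    int ix, iy, fx, fy, ix1, iy1;
    int a, b, c, d;

    assert(x16 >= 0 && y16 >= 0);
    assert(width > 0 && height > 0);

    ix = x16 >> 4;          /* integer pel position */
    iy = y16 >> 4;
    fx = x16 & 15;          /* fractional part, 0..15 sixteenths */
    fy = y16 & 15;

    if (ix >= width - 1)  { ix = width - 1;  fx = 0; }
    if (iy >= height - 1) { iy = height - 1; fy = 0; }
    ix1 = (ix + 1 < width)  ? ix + 1 : ix;
    iy1 = (iy + 1 < height) ? iy + 1 : iy;

    a = img[iy  * stride + ix];     /* top-left     */
    b = img[iy  * stride + ix1];    /* top-right    */
    c = img[iy1 * stride + ix];     /* bottom-left  */
    d = img[iy1 * stride + ix1];    /* bottom-right */

    /* The four weights sum to 16*16 = 256; add 128 to round to nearest. */
    return (uint8_t)(((16 - fx) * (16 - fy) * a +
                      fx        * (16 - fy) * b +
                      (16 - fx) * fy        * c +
                      fx        * fy        * d + 128) >> 8);
}
```

For a whole block with the same translation, fx and fy are constant, so
the four weights can be computed once per block instead of once per pixel.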

gruel