Frame91 Frame91 - 2 months ago 10
Android Question

Perspective Projection in Android in an augmented reality application

Currently I'm writing an augmented reality app and I have some problems to get the objects on my screen. It's very frustrating for me that I'm not able to transform gps-points to the correspending screen-points on my android device. I've read many articles and many other posts on stackoverflow (I've already asked similar questions) but I still need your help.

I did the perspective projection which is explained in wikipedia.

What do I have to do with the result of the perspective projection to get the resulting screenpoint?

Answer

The Wikipedia article also confused me when I read it some time ago. Here is my attempt to explain it differently:


The Situation

Let's simplify the situation. We have:

  • Our projected point D(x,y,z) - what you call relativePositionX|Y|Z
  • An image plane of size w * h
  • A half-angle of view α

... and we want:

  • The coordinates of B in the image plane (let's call them X and Y)

A schema for the X-screen-coordinates:

E is the position of our "eye" in this configuration, which I chose as origin to simplify.

The focal length f can be estimated knowing that:

  • tan(α) = (w/2) / f (1)

A bit of Geometry

You can see on the picture that the triangles ECD and EBM are similar, so using the Side-Splitter Theorem, we get:

  • MB / CD = EM / EC <=> X / x = f / z (2)

With both (1) and (2), we now have:

  • X = (x / z) * ( (w / 2) / tan(α) )

If we go back to the notation used in the Wikipedia article, our equation is equivalent to:

  • b_x = (d_x / d_z) * r_z

You can notice we are missing the multiplication by s_x / r_x. This is because in our case, the "display size" and the "recording surface" are the same, so s_x / r_x = 1.

Note: Same reasoning for Y.


Practical Use

Some remarks:

  • Usually, α = 90deg is used, which means tan(α) = 1. That's why this term doesn't appear in many implementations.
  • If you want to preserve the ratio of the elements you display, keep f constant for both X and Y, ie instead of calculating:

    • X = (x / z) * ( (w / 2) / tan(α) ) and Y = (y / z) * ( (h / 2) / tan(α) )

    ... do:

    • X = (x / z) * ( (min(w,h) / 2) / tan(α) ) and Y = (y / z) * ( (min(w,h) / 2) / tan(α) )

    Note: when I said that "the "display size" and the "recording surface" are the same", that wasn't quite true, and the min operation is here to compensate this approximation, adapting the square surface r to the potentially-rectangular surface s.

    Note 2: Instead of using min(w,h) / 2, Appunta uses screenRatio= (getWidth()+getHeight())/2 as you noticed. Both solutions preserve the elements ratio. The focal, and thus the angle of view, will simply be a bit different, depending on the screen's own ratio. You can actually use any function you want to define f.

  • As you may have noticed on the picture above, the screen coordinates are here defined between [-w/2 ; w/2] for X and [-h/2 ; h/2] for Y, but you probably want [0 ; w] and [0 ; h] instead. X += w/2 and Y += h/2 - Problem solved.


Conclusion

I hope this will answer your questions. I'll stay near if it needs editions.

Bye!

< Self-promotion Alert > I actually made some time ago an article about 3D projection and rendering. The implementation is in Javascript, but it should be quite easy to translate.