Intersecting a plane with an eye ray

the PCGPE texture mapping article demystified

Sean Barrett, April 18 2015

Introduction

In 1994 I published an article in the PC Game Programmers Encyclopedia about texture mapping. One element of that article are a set of formulae sometimes called "magic cross-products" which provide a method for computing the texture coordinates for a 2D pixel based not on interpolating per-vertex texture coodinates but on computing them directly from texture coordinate basis vectors.

This method was used for performing texture mapping in the software renderers for Quake (e.g. here) and for Thief: the Dark Project (personal knowledge: I wrote it), and possibly others.

The purpose of this article is to re-introduce the method with better notation and modern coordinate systems and to show how it is derived—not because I expect people to perform software rendering now, but both for history's sake and because the technique has other applications; for example, I use it to perform hit-testing of a "3D" user-interface where the user interface elements lie in their own 2D coordinate system in a single arbitrary 3D plane.

I originally "invented" this method just to try to understand how to make texture mapping work from first principles, but it also had advantages over traditional texture-coordinates-at-vertices approach for rendering polygons with more than 3 vertices (e.g. if you were sharing the texture gradients over the whole polygon, if there were colinear vertices to avoid t-junctions, you had to avoid using three colinear vertices to compute the gradients).

Texture Mapping equations

Definitions

Suppose we have a 4x4 matrix M which is our "modelviewprojection" matrix; that is, it transforms from "object space" to "clip space", and the latter allows deriving "normalized device coordinates":

Object space is an arbitrary 3D space in which we'll define a texture-mapped polygon
Normalized device coordinates range from -1 to 1 in x and y (the definition of z does not matter)
Clip space is the space of points (x,y,z,w) such that normalized device coordinates are computed as (x/w,y/w,z/w)

Suppose we have a non-degenerate triangle mapped non-degenerately with a 2D texture map (for example, by defining texture coordinates at the vertices, or defining a projection for each of the coordinates). Then:

The triangle lies in some specific plane, which we can compute.
The non-degeneracy of the texture coordinates means that there is a one-to-one linear mapping (gradient) between every point of that plane and a set of (s,t) coordinates (points outside the triangle will use points outside of the mapping specified by the texture coordinates).
Given an input (s,t) coordinate, we can compute a corresponding point on the plane. Define the function F(s,t) to be whatever function which computes the point.

Precomputation

We can represent the mapping between (s,t) coordinates and the plane in a 3D object space with two basis vectors and a single point.

Let P' = F(0,0), that is, the point on the plane which would be texture mapped with (s,t) = (0,0).
Let S' = F(1,0) - P'; that is, S is the vector in the plane over which s changes by 1 and t doesn't change.
Let T' = F(0,1) - P', similarly to the above.

These vectors are marked as prime so that values derived from them later will be unmarked to make later equations easier to read & write.

Thus F(s,t) = P' + s S' + t T'. (Yes, I just defined F in terms of F. Shhhh.)

If these vectors are not precomputed, they can be computed at run-time, but the point of this method is to assume they're calculated previously.

Run-Time Calculation

Given the above definitions, we want to compute the (s,t) coordinates for a given input normalized-device-coordinates (x,y,z).

First we transform the texture map basis vectors to clip space (where they become 4-vectors):

P = M [P'_x,P'_y,P'_z,1]^T
S = M [S'_x,S'_y,S'_z,0]^T
T = M [T'_x,T'_y,T'_z,0]^T

Next we define three new vectors by computing some 3D cross-products after dropping the 'z' components of the vectors:

H = [ Tx Ty Tw ] ⨯ [ Px Py Pw ]
I = [ Px Py Pw ] ⨯ [ Sx Sy Sw ]
J = [ Sx Sy Sw ] ⨯ [ Tx Ty Tw ]

or, explicitly, as was shown in the original PCGPE (in a different coordinate system):

Hx = Pw Ty - Py Tw
Hy = Px Tw - Pw Tx
Hz = Py Tx - Px Ty
Ix = Sw Py - Sy Pw
Iy = Sx Pw - Sw Px
Iz = Sy Px - Sx Py
Jx = Tw Sy - Ty Sw
Jy = Tx Sw - Tw Sx
Jz = Ty Sx - Tx Sy

These values can be computed once per triangle/polygon since they are all independent of the screen coordinate.

Finally, for each pixel we compute several dot products and two divides:

a = H·[x,y,1] = x Hx + y Hy + Hz
b = I·[x,y,1] = x Ix + y Iy + Iz
c = J·[x,y,1] = x Jx + y Jy + Jz
s = a / c
t = b / c

And so we have computed correct (s,t) coordinates.

Derivation

Problem formulation:

Given the clip-space mapping of the plane above, P + sS + tT, and a screen point with normalized device coordinates (x,y,z), where we don't know z but we're not going to bother solving for it either:

The clip-space coordinates of the the point is (xw,yw,zw,w), where w is also unknown. This defines a ray through the clip space origin (which is the eye location).

Therefore, we know that (xw,yw,zw,w) = P + sS + tT.

Ignoring z, this gives us three unknowns (s, t, and w) and three equations:

P_x + s S_x + t T_x = xw
P_y + s S_y + t T_y = yw
P_w + s S_w + t T_w = w

We group terms to isolate the unknowns:

s S_x + t T_x - xw = -P_x
s S_y + t T_y - yw = -P_y
s S_w + t T_w - w = -P_w

Now we describe this in matrix form:

   [  Sx   Tx   -x   ]  [ s ]     [ -Px ]
   [  Sy   Ty   -y   ]  [ t ]  =  [ -Py ]
   [  Sw   Tw   -1   ]  [ w ]     [ -Pw ]

Define D to be the eye ray direction, (x,y,1), and drop the z component from the vectors. Then we can solve for s and t using Cramer's rule ("det" means "determinant"):

s = det( [ -P T -D ] ) / det( [ S T -D ] )
t = det( [ S -P -D ] ) / det( [ S T -D ] )

We negate each determinant:

s = -det( [ -P T -D ] ) / -det( [ S T -D ] )
t = -det( [ S -P -D ] ) / -det( [ S T -D ] )

This allows us to make the D columns positive:

s = det( [ -P T D ] ) / det( [ S T D ] )
t = det( [ S -P D ] ) / det( [ S T D ] )

We give each determinant a name, and swap columns to remove the negations:

a = det( [ -P T D ] ) = det( [ T P D ] )
b = det( [ S -P D ] ) = det( [ P S D ] )
c = det( [ S T D ] )

Since the value of a scalar triple product E⨯F·G is the same as the determinant of a matrix with those vectors as column vectors, we can rewrite the above as:

a = [ Tx Ty Tw ] ⨯ [ Px Py Pw ] · [ x y 1 ]
b = [ Px Py Pw ] ⨯ [ Sx Sy Sw ] · [ x y 1 ]
c = [ Sx Sy Sw ] ⨯ [ Tx Ty Tw ] · [ x y 1 ]

which expands out to the math described in the previous section.

Note that this derivation gives no geometric explanation for the cross-products, as the cross-products do not appear geometrically in the solution, but are just a consequence of how the determinant is computed when you favor isolating a particular column (in this case. because one column contains values that are computationally variable and the other matrix values are computationally constant). For a discussion of a geometric justification of ratios of determinants (as ratios of volumes), see Kensler and Shirley, "Optimizing Ray-Triangle Intersection via Automated Search".

Another way to view this is that we are have three (non-orthogonal) basis vectors describing the space: S, T, and D. We're computing what weighted sum of D, -S, and -T must be combined to reach P; we use D to reach the plane, then walk "back" from that point along S and T (which stay in the plane) to reach the texture coordinate origin.

This page by Christer Ericson discusses how the scalar triple product can be used to solve this non-orthogonal change-of-basis problem, which would let us skip all the derivation.

Thanks to Fabian Giesen for reviewing and helping me figure out why my first-draft derivation was crazy roundabout, and for providing the references.

home