Gaussian Elimination

Dividing 1 by 3 in binary is a favorite pastime of mine. I sometimes do it accidentally, and then scratch my head for hours until I realize what I did. The same thing happened this time, come along and read on.

I was trying to fit an ellipse to a set of integer coordinates. It turned out that I needed to solve some linear equations. Well, what better way to do it than good old Gaussian elimination?

Initial Implementation

So I set out and wrote something. How precious DD
static void swaprows(int N, double *M, int A, int B)
{
  double *xA, *xB, T;
  int i;
  xA= M + A*(N+1);
  xB= M + B*(N+1);
  for(i=0;i<=N;i++) T= xA[i], xA[i]= xB[i], xB[i]= T;
}

static void muladd(int N, double *M, int D, double F, int S)
{
  double *xD, *xS;
  int i;
  if (F==0.0) return ;
  xD= M+ D*(N+1);
  xS= M+ S*(N+1);
  for(i=S;i<=N;i++) xD[i] += F * xS[i];
}

int gausselim(double *R, int N, double *A)
{
  int col;
  int W;
  int i;
  W= N+1; // width of the matrix

  for(col=0;col<N-1;col++)
  {
    for(i=col;i<N;i++) if (A[i*W+col]!=0.0) break;
    if (i==N) return -1; // system is independent of this variable.
    if (i!=col) swaprows(N, A, i, col);
    for(i=col+1;i<N;i++)
    {
      muladd(N, A, i, -A[i*W+col]/A[col*W+col], col);
    }
  }

  // back substitution
  for(col=N-1;col>=0;col--)
  {
    double V;
    if (A[W*col+col]==0.0)
    {
      if (A[W*col+N]==0.0) return -1; // we got 0*var=0 
                      else return  0; // we got 0*var=something else 
    }
    V= (A[W*col+N] /= A[W*col+col]);
    A[W*col+col]= 1.0;
    for(i=col-1;i>=0;i--)
    {
      A[i*W+N] -= A[i*W+col]*V;
      A[i*W+col]= 0.0;
    }
  } 
  /** we should now have a diagonal 1-matrix in A, 
      followed by the results, like so:

         1 0 0 | x
         0 1 0 | y
         0 0 1 | z
   **/
  for(i=0;i<N;i++) R[i]= A[i*W+N];
  return 1;
}
It is pretty straightforward, but a little explanation doesn't hurt. A is laid out as seen on the screen. Rows first, with N+1 columns per row. Accessing an element of A is done by
  A[ r*(N+1) + c ]
where r is the row index and c is the column index, based at 0.

R is just a vector of N numbers to contain the results.

The algorithm has two parts: elimination and back-substitution. Let's talk about the first. Our goal is to obtain a matrix of the following form during this phase:

    a  .. .. .. .. .....
    0  b  .. .. .. .....
    0  0  c  .. .. .....
    0  0  0  d  .. .....
    ....................
    ....................
The variable 'col' specifies which column is to be eliminated from the rest of the matrix. It advances from 0 to N-2. Within the loop, we first locate a suitable row:
    for(i=col;i<N;i++) if (A[i*W+col]!=0.0) break;
    if (i==N) return -1; // system is independent of this variable.
    if (i!=col) swaprows(N, A, i, col);
Here, we find a row which doesn't have 0 at column 'col'. We only consider rows which have not been processed yet because these are the only ones whose elements are all zeroes until column 'col'. The ones above have non-zero entries for previous columns. When we find one, we put it at the row which has the same index as 'col'. This way, the diagonal is starting to form.

In the second loop, we do the actual elimination

    for(i=col+1;i<N;i++)
    {
      muladd(N, A, i, -A[i*W+col]/A[col*W+col], col);
    }
Here, we multiply the pivot row with -A[i,col]/A[col,col] and add it to row i. So, the row i gets its element at 'col' zeroed. Since all other columns before 'col' were also zeroed by previous iterations, our algorithm advances.
   A[i] =  A[i]  +   -A[i,col]/A[col,col] * A[col]
 
                  expanded
             vvvvvvvvvvvvvvvvvvvvvvv
     
                                     these will cancel out
                                             /        \
     A[i,col]= A[i,col] +    -A[i,col]/A[col,col] * A[col,col]
   A[i,col+1]= A[i,col+1] +  -A[i,col]/A[col,col] * A[col,col+1]
   A[i,col+2]= A[i,col+2] +  -A[i,col]/A[col,col] * A[col,col+2]
   ...

                these cancel out too
                  /             \
     A[i,col]= A[i,col] +    -A[i,col]
   A[i,col+1]= A[i,col+1] +  -A[i,col]/A[col,col] * A[col,col+1]
   A[i,col+2]= A[i,col+2] +  -A[i,col]/A[col,col] * A[col,col+2]
   ...

     A[i,col]= 0
   A[i,col+1]= A[i,col+1] +  -A[i,col]/A[col,col] * A[col,col+1]
   A[i,col+2]= A[i,col+2] +  -A[i,col]/A[col,col] * A[col,col+2]
   ...
We can now move on to back-substituion. At the end of elimination, we will have the following at the bottom row.
  0 0 ... 0 a | b
This means that the last variable of our system has a very simple equation:
   a*x = b
We now simply compute x as b/a and substitute its value in the rest of the matrix. However, we can't do that if a is 0. There are two cases if this happens. We will have one of the two equations below:
    0 * x  =   0
       or
    0 * x  =   b,  b≠0
In the first case, we have an infinite number of solutions because we could substitute anything for x and the system would still be satisfied.

For the second case, there is no solution that satisfies x. Therefore the whole system has no solution.

    if (A[W*col+col]==0.0)
    {
      if (A[W*col+N]==0.0) return -1; // we got 0*var=0 
                      else return  0; // we got 0*var=something else 
    }
This of course applies to all variables, therefore we do this check at every row. Next we fix up the current row by computing b/a and setting the column value to 1.
    V= (A[W*col+N] /= A[W*col+col]);
    A[W*col+col]= 1.0;
Now, A[col,N] has the solution for variablecol and A[col,col] has 1. Rest of the row are all zeroes, because of the previous stage. We now propagate this value to all the rows above so that this column is zeroed for them.
    for(i=col-1;i>=0;i--)
    {
      A[i*W+N] -= A[i*W+col]*V;
      A[i*W+col]= 0.0;
    }
Here is what happens at these two lines. We have a bunch of equations above the current row, all involving our current variable x. We have found its value to be V. So we substitute it.
  [previous row's equation]

       ... .. +  U.x + ... = Y

   replace   x by V

       ... .. +  U.V + ... = Y

   move to the other side

       ... .. +  0   + ... = Y - U.V
We do this back-substitution starting from the bottom, moving to the top and reach the final form of our matrix:
   1  0  . ..  0  |  A
   0  1  . ..  0  |  B
   ...            ..
   0  0  . ..  1  |  K
A,B .. K is our final solution vector.

Not So Fast

While I was playing with it, I came across this infinite-solution example:
 -3    2  -5   -14  
  2   -3   4    10  
  1    1   1     4   
If you do it by hand, you will realize that the last row becomes all zeroes after the elimination step. However, it did not happen with my implementation. At the end of the row, there were two very small numbers. These were remnants of errors resulting from divisions and subtractions. As you guessed it, yes, a division by 3 was involved.

When this happens, the system fails to recognize that these numbers should actually be zero and continues processing them normally, generating solutions when there are none or infinitely many.

The culprit are the two lines, one in the main algorithm and one in the muladd routine:

[in main]
  muladd(N, A, i, -A[i*W+col]/A[col*W+col], col);

[in muladd]
  for(i=S;i<=N;i++) xD[i] += F * xS[i];
When we divide a floating point number with another, the result isn't always precise. We might lose some lower bits because of finite precision. The result of some divisions are infinite, such as 1/3 in binary or decimal. This causes tiny errors to be accumulated. In the above example, in the call to muladd,
         A[i, col]
   F  =  ---------- * -1
         A[col,col]

in the first iteration of the muladd loop, we get
   A[D, S] = A[D,S] + F * A[S,S]

D is i in the outer function and S is col. so, we compute

                           A[i, col]
   A[i, col] = A[i,col] +  ---------   * -1 * A[col, col]
                           A[col,col]

Normally, A[i,col] should be computed as zero, but F*A[col,col] doesn't give exactly -A[i,col] because F has some missing bits. Therefore, the sum doesn't result in zero.

No Divisions Then

If divisions are a problem, why don't we get rid of them :) We could, but now we risk creating too big integers that won't fit into the mantissa of a double floating point number.

One possible solution is to do the division after the multiplication:

muldivsub( T , S, col):
  T =  T - T[col] * S / S[col]

[in main]
   for(i=col+1;i<N;i++)
      muldivsub( A[i], A[col], col)
However, this doesn't solve the problem either. We could do one more step and move the division to the outermost level:
muldivsub( T , S, col):
  T =  (T*S[col] - T[col] * S ) / S[col]
This will definitely succeed in zeroing the target column, but it still introduces errors to other elements due to the unchecked division. We could eliminate it altogether:
muldivsub( T , S, col):  // no longer div :)
  T =  T*S[col] - T[col] * S
Since multiplying a row with a scalar doesn't change the system, we're allowed to do this. This does work OK, but eventually we might end up with very big numbers that don't fit into the mantissa.

Final Form

In the above muldivsub routine, we could do SOME division to keep the numbers from overflowing the mantissa field of the double-fp numbers. We can choose any non-zero number, multiplying or dividing by this number will not effect the validity of the system as long as we don't lose bits doing it.

How to make sure that we don't lose bits? The answer lies in the structure of a double floating point number. These numbers are just integers with some exponent of 2 tacked along. i.e. a number such as 0.3923 is an integer multiple of a 2s power:

    floating-point = sign * integer_mantissa * 2^integer_exponent
All the division and multiplication is done using the mantissa component. The exponent part determines where the binary fraction dot is. When a multiplication or division overflows, the exponent is adjusted to keep the highest bits in the mantissa.

To make sure that we don't overflow the mantissa while doing a division, we must use a factor of the mantissa. Doing this for all the mantissa values in the row, we arrive at the requirement:

muldivsub( T , S, col):  // no longer div :)
  T =  T*S[col] - T[col] * S
  T /= GCD(mantissa(T))
So, if there is a useful integer that divides all the mantissa values cleanly, then we do it. We need to extract the mantissa first:
int mantd(double f, uint64_t *M)
{
  double x;
  int e;
  uint64_t P;
  if (f<0) f= -f; // I don't believe in 'negative' numbers
  x= frexp(f, &e);
  switch(fpclassify(x))
  {
  case FP_NAN: case FP_INFINITE: return 1;
  case FP_ZERO: *M= 0; return 0;
  }
  P= ldexp(x, DBL_MANT_DIG+1);
  while((P&1)==0) P>>=1;
  *M= P;
  return 0;
}
Note that we're getting rid of 2^n factors within P. This is because we just want the actual bits in the mantissa, not the filler 0s padded at the end to fit it in the double-fp format. If we left those in, we would have them for all small integers. Then these numbers would seem to have a nice GCD, which happens to be 2^n. We would then do some divisions by that number which accomplishes absolutely nothing.

You probably don't need a GCD implementation if you have come so far, but I'll provide one for completeness.

uint64_t gcdq(uint64_t a, uint64_t b)
{
  uint64_t t;
  if (a<b) t= a, a=b, b=t;
  if (b<2) return 1;
  while(b!=0) t= a%b, a= b, b= t;
  return a;
}
With all these in place, here is my final version. A little bit of cleanup is necessary. It'd be especially good if I could put them all into one function.
#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <float.h>
#include <stdlib.h>

uint64_t gcdq(uint64_t a, uint64_t b)
{
  uint64_t t;
  if (a<b) t= a, a=b, b=t;
  if (b<2) return 1;
  while(b!=0) t= a%b, a= b, b= t;
  return a;
}

int mantd(double f, uint64_t *M)
{
  double x; int e;
  uint64_t P;
  if (f<0) f= -f;
  x= frexp(f, &e);
  switch(fpclassify(x))
  {
  case FP_NAN: case FP_INFINITE: return 1;
  case FP_ZERO: *M= 0; return 0;
  }
  P= ldexp(x, DBL_MANT_DIG+1);
  while((P&1)==0) P>>=1;
  *M= P;
  return 0;
}


static void swaprows(int N, double *M, int A, int B)
{
  double *xA, *xB, T;
  int i;
  xA= M + A*(N+1);
  xB= M + B*(N+1);
  for(i=0;i<=N;i++) T= *xA, *xA= *xB, *xB= T, xA++, xB++;
}

static void muldivsub(int N, double *M, int T, double F, double D, int S)
{
  double *xT, *xS;
  int i;

  if (D==0.0 || F==0.0) return ;

  xT= M+ T*(N+1);
  xS= M+ S*(N+1);
  for(i=S;i<=N;i++) xT[i] = (xT[i]*D - F * xS[i]);

  uint64_t gg= 0;

  xT= M+ T*(N+1);
  for(i=S;i<=N;i++)
    if (xT[i]!=0.0) 
    {
      uint64_t man;
      if (mantd(xT[i], &man)) return ;
      if (!gg) gg= man;
          else gg= gcdq(gg, man);
      if (gg<2) return ;
    }

  if (gg<2) return ;
  for(i=S;i<=N;i++) xT[i] /= gg;
}

/**
  A is the augmented matrix representation of the linear
  system. It has N rows and N+1 columns.

  R is the result, a vector of N numbers.

  The return value is 
     1 if there is a solution
     0 if there is no solution
    -1 if there are inifinitely many solutions
 **/
int gausselim(double *R, int N, double *A)
{
  int col;
  int W;
  int i;
  W= N+1; // width of the matrix

  for(col=0;col<N-1;col++)
  {
    for(i=col;i<N;i++) if (A[i*W+col]!=0.0) break;
    if (i==N) return -1; // system is independent of this variable.
    if (i!=col) swaprows(N, A, i, col);
    for(i=col+1;i<N;i++)
    {
      muldivsub(N, A, i, A[i*W+col], A[col*W+col], col);
    }
  }

  // back substitution
  for(col=N-1;col>=0;col--)
  {
    double V;
    if (A[W*col+col]==0.0)
    {
      if (A[W*col+N]==0.0) return -1; // we got 0*var =0 
                      else return  0; // we got 0*var =something else 
    }
    V= (A[W*col+N] /= A[W*col+col]);
    A[W*col+col]= 1.0;
    for(i=col-1;i>=0;i--)
    {
      A[i*W+N] -= A[i*W+col]*V;
      A[i*W+col]= 0.0;
    }
  } 

  /** we should now have a diagonal 1-matrix in A, 
      followed by the results */
  for(i=0;i<N;i++)
    R[i]= A[i*W+N];
  return 1;
}
This should work OK for the purpose it was intended: solving equations involving integer screen coordinates. These are typically below 2K, with at most 11 bits in them. Multiplying 5 of these would give us 55 bits, which is slightly over mantissa bits of a double (52). I'd have to be very unlucky to encounter a situation where all of the 5 eliminations end up generating mutually-prime numbers at the bottom row. For all the other rows, the integers would fit in the mantissa.

If I encounter this, I'll probably make a special function just to process the last row. There I can divide the operation into parts and analyse further.

For the general case, I don't think there is a viable option other than symbolic programming. If the input is not constrained, there is no telling how many bits will be required to handle all numbers precisely. If precision is lost, we're in trouble because we're not doing simple arithmetic, decisions are being made based on the values :)

One Last Word

Yeah, OK. After doing all that work I just realized I could simply use integers in the first place xD. So I converted the first phase to use integers. The results can still be fractional so I kept the output as an array of doubles.
#include <stdint.h>

/**
  M is the augmented matrix representation of the linear system. 
  It has N rows and N+1 columns.

  R is the result, a vector of N numbers.

  The return value is 
     1 if there is a solution
     0 if there is no solution
    -1 if there are inifinitely many solutions

  Upon exit M is in row echelon form.
 **/
int igausselim(double *R, int N, int64_t *M)
{
  int S, T, W, i;
  double V;
  int64_t *xT, *xS, tmp;
  int64_t Tp, Sp, val, gcd;

  W= N+1; // width of the matrix

  for(S=0;S<N-1;S++)
  {
    for(T=S;T<N;T++) if (M[T*W+S]!=0) break;
    if (T==N) return -1; // system is independent of this variable.
    if (T!=S) 
    {
      /** swap rows S and T **/
      for(xT=M + T*W, xS= M + S*W, i=0; i<=N; i++)
         tmp= xT[i], xT[i]= xS[i], xS[i]= tmp;
    }
    xS= M+ S*(N+1);
    Sp= xS[S];
    for(T=S+1;T<N;T++) 
    {
      /** eliminate **/
      xT= M+ T*(N+1);
      Tp= xT[S];
      if (Tp==0) continue; // no need, already zero
      for(i=S;i<=N;i++) xT[i] = xT[i]*Sp - Tp * xS[i];

      /** compute row gcd. xT[S] is already zero, skip it **/
      gcd= 0;
      for(i=S+1;i<=N;i++) if ((val=xT[i]))
      {
        if (val<0) val= -val;
        if (val==1) { gcd=1; break; }
        if (!gcd) { gcd= val; continue; }
        if (gcd<val) tmp= gcd, gcd= val, val= tmp;
        if (val<2) { gcd=1; break; }
        while(val!=0) tmp= gcd%val, gcd=val, val= tmp;
        if (gcd<2) break;
      }

      /** divide by gcd **/
      if (gcd>1) for(i=S+1;i<=N;i++) xT[i] /= gcd;
    }
  }

  if (M[W*(N-1)+(N-1)]==0)
  {
    if (M[W*(N-1)+N]==0) return -1;
                    else return 0;
  }

  /** initialize results **/
  for(i=0;i<N;i++) R[i]= M[i*W+N];

  /** back substitution **/
  for(S=N-1;S>=0;S--)
  {
    V= (R[S] /= M[W*S+S]);
    for(T=S-1;T>=0;T--) R[T] -= M[T*W+S]*V;
  } 

  return 1;
}
Doing so, I realized I was wrong about any row being able to contain a zero for its diagonal element. It's only the last row, which wasn't taken as a pivot during the first phase. All the others were selected to have a non-zero at that position. I fixed that now.

Here is a version without the GCD stuff. This should also work fine if your input is similar to mine. i.e. upto N=5 with numbers below 2K. It's even better than the floating point version because now we have 63 bits to store our integers.

#include <stdint.h>

/**
  M is the augmented matrix representation of the linear system. 
  It has N rows and N+1 columns.

  R is the result, a vector of N numbers.

  The return value is 
     1 if there is a solution
     0 if there is no solution
    -1 if there are inifinitely many solutions

  Upon exit M is in row echelon form.
 **/
int igausselim(double *R, int N, int64_t *M)
{
  int S, T, W, i;
  double V;
  int64_t *xT, *xS, tmp;
  int64_t Tp, Sp;

  W= N+1; // width of the matrix

  for(S=0;S<N-1;S++)
  {
    for(T=S;T<N;T++) if (M[T*W+S]!=0) break;
    if (T==N) return -1; // system is independent of this variable.
    if (T!=S) 
    {
      /** swap rows S and T **/
      for(xT=M + T*W, xS= M + S*W, i=0; i<=N; i++)
         tmp= xT[i], xT[i]= xS[i], xS[i]= tmp;
    }
    xS= M+ S*(N+1);
    Sp= xS[S];
    for(T=S+1;T<N;T++) 
    {
      /** eliminate **/
      xT= M+ T*(N+1);
      Tp= xT[S];
      if (Tp==0) continue; // no need, already zero
      for(i=S;i<=N;i++) xT[i] = xT[i]*Sp - Tp * xS[i];
    }
  }

  if (M[W*(N-1)+(N-1)]==0)
  {
    if (M[W*(N-1)+N]==0) return -1;
                    else return 0;
  }

  /** initialize results **/
  for(i=0;i<N;i++) R[i]= M[i*W+N];

  /** back substitution **/
  for(S=N-1;S>=0;S--)
  {
    V= (R[S] /= M[W*S+S]);
    for(T=S-1;T>=0;T--) R[T] -= M[T*W+S]*V;
  } 

  return 1;
}
I still can't be sure about this because division of two large integers might be less accurate than division of two smaller ones. I don't know the details of how FP-division works..

In any case, here is a package containing some test data and driver code.