Dynamic Programming

Last weekend, I had the opportunity to confront a dynamic programming problem. I always liked dynamic programming algorithms but didn't understand how people came up with them.

My problem was the following: given a piece of text, how do you lay it out in a shape as close as possible to a square? For example, this paragraph would be rendered as:

  My problem was the 
  following: given a piece 
  of text, how do you lay it 
  out in a shape as close as 
  possible to a square? For 
  example, this paragraph 
  would be rendered as:

Here, the maximum width of the rows is approximately equal to the height of the whole paragraph. This is what I wanted.

First thing to do was to separate the text into words, which is pretty straightforward. Then I got the width of each word from the GUI library, since I can't work on length alone. I called the width array W, where W[i] is the width of the word #i, in pixels.

After that, I figured that the following table would be useful. As we are building rows, we add the widths of the words together to find the width of the row. If we tried to do this by trial and error, we would end up adding lots of numbers over and over again. I could store the result of these additions in a table. I don't have to but it would be nice.

Our first table, A is a two dimensional array of integers. The first axis is the number of the word starting a row. The second axis is the number of the word ending the row. The value is the width of the row given the starting and ending words. For example A[3][5] would be the width of a row starting with word #3 and ending with word #5. Here, the word #5 is included in the row. Building this table is pretty straightforward.

  for(i=0;i<N;i++) {
    A[i][i]= W[i];
    for(j=i+1;j<N;j++)
      A[i][j]= A[i][j-1] + width_of_space + W[j];
  }

As I understand it, the first step of creating a dynamic programming algorithm is to decide what you want to compute. In other words, you need to approach the problem in a way that enables you to solve bigger problems using solutions of smaller problems.

My initial instinct was to do a search using the array A. This turned out to require O(2^N) time, but was useful in directing me to a solution. Assume that we're doing a search on the array A. When we have came to the word #k, we have used R rows and have a current maximum width of W. Now we start a row at word #k and use 1 row or 2 rows etc. I realised that when we're at this position, it's sufficient to calculate the case with 1 row only once, the case with 2 rows only once etc. No matter how we came to word #k (using 1 row or 2 rows .. etc), the solution starting from word #k is always the same, it doesn't affect our search from this point on.

So I thought I could cache the results of these computations. I would have a two dimensional array of integers, called M. The first axis would correspond to a word number. So, M[i] would correspond to the situation where word #i starts a row, consistent with our search algorithm. The second axis would be the number of rows used from this point on. M[i][1] would correspond to the situation where a row starts from word #i and the rest of the text is put on the same row. M[i][2] corresponds to the same situtation, but the text scattered over two rows etc. The value stored at M[i][j] would be the minimum width required in order to render the rest of the text with j rows and starting from word #i.

I then realized that the array M was in fact the solution to the problem. M[0] would have the minimum required width for all choices of number of rows. I could select the best one from M[0] and then proceed from there to form a complete solution. So, I set out to compute M without doing any search at all.

So, I tried to find a mathematical model for the elements of M. For the last word, there are no choices at all. Therefore, its value is already known.

  M[N-1][1]= A[N-1][N-1] = W[N-1]

Since we can't divide the last word into two, this is the only valid value in M[N-1]. We observe that for the one-row situation, the first half of the above equation holds true for all words. If you start a row with a word and insist that all following text to be put on the same row, then the minimum width of that row is the same as the addition of widths of all words from the chosen word until the end of the text.

  M[k][1]= A[k][N-1]

Another observation is that, if we start a row from word k, then the maximum number of rows needed (from that point on) is N-k. So,

  M[k][N-k+p] = invalid

for all positive p. Now we can proceed with the actual calculation. Let's assume that we already know M[k][r], the minimum width required for the text starting at word k, using r rows. What can we deduce from here?

We know that the previous row would end at word k-1. It could start at any word from 0 to k-1. We would be working with r+1 rows: we have r rows already starting from word k and would have one more (the previous one). So, we can update the M matrix as follows:

   for(j=k-1;j>=0;j--)
     M[j][r+1]= min( M[j][r+1],  max(A[j][k-1], M[k][r]) );

Here, we update the solution to M[j][r+1] only if the new solution (which has the next row starting at k) is better than the known solution. We fill the matrix with sentinels before this so that every element in it gets a sensible value. The second term is width of the combination of the previous row with the set of rows calculated so far. M[k][r] is the width of the rows calculated. A[j][k-1] is the width of the row starting at word j and ending at word k-1. So, the maximum of these two gives us the width of their combination.

We will do the above loop for all row-choices of word k. So, we wrap it in another loop:

  for(r=1;r<=n-k;r++)
   for(j=k-1;j>=0;j--)
     M[j][r+1]= min( M[j][r+1],  max(A[j][k-1], M[k][r]) );

We need to process all words in this manner. We need another loop to do that. However, this loop shall start from the end. At the start of the algorithm only M[N-1] is fully known. The others contain sentinels and one value at column #1. Therefore this loop needs to go thru the words in backwards order. It will also not process word #0 since there is nothing we can do with the values of M[0].

 for(k=n-1;k>0;k--)
  for(r=1;r<=n-k;r++)
   for(j=k-1;j>=0;j--)
     M[j][r+1]= min( M[j][r+1],  max(A[j][k-1], M[k][r]) );

Here, we could probably exchange the order of the two inner loops but it makes more sense to process word j for given M[k][r]. In any case, the matrix row M[0] holds all the solutions when the code ends.

However, there is one shortcoming. Although we know the minimum widths required, there is no way to find out which selections of row-start words make these happen. To find that out, we need to store our solution when constructing the matrix M. I'll make a new two dimensional array Y. Its axises are the same as M. However, the value holds the number of the word ending the row.

 for(k=n-1;k>0;k--)
  for(r=1;r<=n-k;r++)
   for(j=k-1;j>=0;j--) {
     combined= max(A[j][k-1], M[k][r]);
     if (combined<M[j][r+1]) {
       M[j][r+1]= combined;
       Y[j][r+1]= k-1;
     }
   }

Now we have the optimal width for r rows in M[k][r] and the end of the row first for that solution in Y[k][r]. The start of the row is of course, k.

All that is left is to find the best solution using M[0] and extract the list of row-start words. Since we are aiming for a most-square layout, we will try to minimize the circumference of the resulting layout rectangle. Other aspect ratios can be forced by using a different formula for the circumference. i.e. by multiplying the row_height or M[0][i] with a constant.

  best= M[0][1]+row_height*1;
  nrows= 1;
  for(i=2;i<=n;i++) {
    half_circ= M[0][i]+row_height*i;
    if (half_circ<best) {
      best= half_circ;
      nrows= i;
    }
  }

Once we have selected the optimum row width from M[0], we can proceed with the construction of the rows. The first one is of course at word 0. The resst can easily be extracted from the Y matrix.

  nsol= 0;
  sol[nsol++]= 0;

  while(nrows) {
    sol[nsol]= Y[sol[nsol-1]][nrows]+1;
    nrows--; nsol++;
  }
  return nsol-1;

That's it for my problem. Now I have a couple of observations regarding dynamic programming:

If the solution to a problem can be constructed using all solutions to all subproblems, then it probably can be solved using dynamic programming.
When dealing with a new algorithm, try to make a search first. You may observe some stuff never changes and you could memoize them. This could lead to a dynamic programming solution.
When you're trying to compute something, try to write it down as a formal mathematical model as much as you can. Writing different instantiations of your model on paper for different values makes common patterns very obvious.

Here is the uuencode of my implementation.

begin 644 square.c.gz
M'XL("(JPM$\``W-Q=6%R92YC`.U8ZU/;1A#_C/Z*+9D,EB6(SX9`,4J'M$TG
M&/.!Z4P&7`^CQ]D^$!*1Y#ANPO_>W7M(9^.DF7S*AW@2^[2WC]\^;I?3,Y'%
MZ3SA<%)6B<CW9J^<9S8I%=$ZK1#9E&C.BS;\Q3->A!5/(%I"P;LQ=/98;Z\'
M>09#^A\N@>T#.SSN'.`_Z'98%]HOG&>IR#@PV%[D15+NQ<4V:G/^GHD2)B+E
M\!`6)2\A!&4.1%;E('DAS!*T5,V+K(1JQG%=SM,*.7`'PJ((ETX^T7+E'KS=
MN8=*%(@0J0D/4U*W$-6,6*K\3NI;S,(JRRLH<\D_S9')08M)CB804Y[%7`E)
M'TD$MT0%A9C.JCT'D7,-"@T]H'C%"Q1(4_4`J"N$*,WC.X)QS^_S8HE613R#
M.,^J4&2E$^6HGSS2\LI5(M3.7.5SB-'-VWE9P:3@7$=`&7:,8;0F-_D'7BS1
M`<R7X[Q3T2M0'<?PFJ0AB(J7#V',]^`4\3;/$,^0+59("NYP%,E@_D!2,BJ3
M>197`K,\SU)>RFPL(8_C>2%CA1E)\GF$V7P_SRON/(2BH(1@<O)[+F$!:IF*
M#ZAVC;GT25LFW5O,\I0[]BY&A7^L2+@*[X@KAXA3L6C'9*6@RUA]2$]#BLE;
M&;A9^($8)VDXA4E>K/F!>G@6HID7B2CIUY'[9)/`3C`"<XQ?+B,<)@ED?%'O
M-M':<ZC"O_44.20([7;YD(KJ1M9X2Y'*(G:=3PY0]1/:FP++-X!.'TE:"%/?
M/*'2&TQ4V-<BF5066/R8FC(O?&C?A\4=+WP4"0O40`QR*P`T2OQR(]!49`BG
M6*/'CCZXO:/ZY&XCPBVI?;F,9\BY1;\0&&-]9ZO$:D!2BS9<(/:PY-#YV.D<
M;TUS#.5RR1!D36;'S;IKK7O6>M]:'UCKE];ZT%H?6>M?K?6IM7YMK7^WUG]8
MZS^M]9MFS3K6VL+/+/S,PL\L_,S"SRS\S,+/+/S,PL\L_,S"SRS\S,+/+/RL
MQK\#.PWY\$V=E:Y)RL[V3DT\0&+")R$VW9IVA+1'!P50C>?5B?_V2OB9\A\H
MY>O9W=?910?4\=_OV(-[Z]-:MP`I)SM&'Q[-K._VK)Z!983F51?""OFNFFFZ
MQ^'/YO'C51+;W]0]V*8".]`%=E@76'>MP"1CDLOQ:-74RZ.5FCKZWO9C)>!G
M`?TH!73XE4$C/6Y2K:NF]X6JX?FD*9HN6VU$;&UHF2)IK9:/^[-@?O2"8;VO
M54RO;B[LFYI+]^#E:IWL?W]WJ1&^W-@3-U4Z,U-7Q5@:07-J9.)6<TA(P?]`
MLP)DW-O?MZ/@:/>/Z>HP@9:Y;9`?`.HZ<6X&_*Z^-=`&7F7CAV7+7#U\]:>`
M#^>NVL<+RDC=0\8!V!<4J)]&YV-]J6EH7@#G'B/:(^"7TN!Y&VXFL/+7AJ-.
M^P8W5I%T^OK63#_]1T<"#4!>O-IM%\.<XF6]I>TBJI1G+;J/>5"*?]&"XG3;
MAH6YK@*G\!M5+NF%+POU:^<DIN:2)WVW[V3&;WT_LYS&E#H.):B-U[H[?G,?
MXN7R8TO=`-4%$M3V:6`<TWB0C'"0JV^8A%PACI;`BZ,XR19]X7DNG(X$ABU>
MEW9],-(ZGJ<2#ZE"YGA.E^^;DK\O6QJ"+\U<J)_VN^:"*_Q;TM.8OE"650&2
M?0GA'7ZK6B'.VT!@F=PB[RWQ2KIFOD5FM=AE8X]Y[Y"B"PH!?LA%`@]X":]N
M*KKIK\'[?U@:EU0Q:6T_[R;'L.T+U\;6>8+,L,/S7K+M&Z!:R&S^DVU+RJ.)
MY'VHTAGZ]!WI:J9HAR<1_`81'$-(G4N7P=!OMZ_ZVDE*PTW$R^I&OAQ;3\2*
MI_[`O_3/_&OY1F!(U=+4TP65*UQM(#J;XM-$(-`A&.JL[,K:'HXN,"\C)K,D
ME_2%NC;6@!)F)J62%:X,#1^E]TIR$-#SX%6G/]C==1V3*;EW&;#^)2+:'?0O
M4;':T/N:Y2P8H/C9*[1_AO)0;S9L^+D.9%:&H\%X=#FF3)Z-1RAH<JD_U(2&
MM'7I(5!T'3Y_ANN3FN3:S"L&@$)D!.&ZO[9WU>P-5$#KSZ.SMGJ4M50?RUF>
MEUQ61$L=15D0$?:=N]*J!O,6Z77`.OBI7RH5^:+T84#/KP/$V*$<>+TVID#N
MZ=ZE<MC%'`8JB4X3CPX=9101)Z\5N?;<:%3;M5M:L28\&BB1GAL*^RB+/$^-
M$J<Y3%KT>8*GRE>*9(KPWV(F\.`KDH8P"#"RM3I,YW@D]W7+634T\'3@)<ON
MKCJS^!BIM3ZC6=2TZ'MLV"W=D^5+L^FHB\$=U_UW0.W&44@F:"UI37U@/C'Y
M](8^4_4UQ;HS0U._"I13I-:#![Q9+T12S9H]4:^R1D@/(?LMY%0=;9G*^EVB
M_#6#M&\&LPN*55G:,&9HRA"K^V2^2`3RE"MAV>7UQ%7&Q%@#.5UM/K;&M6%S
MZFO??*U5<=D-OV;1RI_TR14&JZ*&IC^OJ!L:;MDF#>^5Y%UAO/)7IC]8I^_K
M<:N3%DDYJG[[+&M_]3E6[/4AT-E]GN@2IEEE%-O9B%0FK*F&C+KJ*0U/9M1:
M-J/ZJ.LC38!-\Z+#+WX)Z%RYU$"-7H^:.4_QKV(D6B5IQFAM'X?)M9PEIB_4
;.$N$J4JX&:=/L*I^J$]EAP[E?WT?Y9O^&@``
`
end