Bebbo Bebbo - 1 month ago 15
HTML Question

How to calculate the widths of multi-column (colspan) HTML-tables?

I am trying to convert an HTML table into LaTeX code (using longtabu, as it supports custom column width settings) in a Java program I am writing. My code had been running quite stably and the output looked fine until just now. I now have to support the table's colspan feature (I am skipping rowspan for now), and that is where the problem lies. The table that is causing problems looks something like this:

<table>
<tr>
<td width="385" colspan="3">
Content
</td>
<td width="359" colspan="3">
Content
</td>
<td width="151">
Content
</td>
</tr>
<tr>
<td width="24">
Content
</td>
<td width="361" colspan="2">
Content
</td>
<td width="359" colspan="3">
Content
</td>
<td width="151">
Content
</td>
</tr>
<tr>
<td width="24">
Content
</td>
<td width="276">
Content
</td>
<td width="85">
Content
</td>
<td width="198" colspan="2">
Content
</td>
<td width="161">
Content
</td>
<td width="151">
Content
</td>
</tr>
</table>




I traced the problem to the fact that no single table row defines all of the column widths.

As I understand it, I would need a system of linear equations to calculate the widths of the individual columns... am I right, or have I missed something?

What would be the best approach to solving such a system of equations in Java?

Answer

Assuming that the source table is neither over-constrained, under-constrained, nor inconsistently constrained, I would recommend:

  • Define a fact table that lists the known width for each column as it is determined
  • Define a collection of Constraint objects, one for each colspan entry, each specifying the starting column, the column span, and the total width.
  • Make a pass through the entire table definition collecting facts and constraints.
  • Then make a pass through the fact table, and for each column that is not defined, run through all of the constraints and see if there is a constraint over a set of columns for which all of the other columns are defined. Such a constraint will produce a value for the currently considered column.
  • Every time a new column value is discovered, start back at the beginning of the fact table, scan for unknown columns, and for each one scan the entire constraint set again.
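
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name ColumnWidthSolver, the Constraint type, and the UNKNOWN marker are made-up names, plain cells are treated as constraints with a span of 1 instead of being stored separately, and the input is assumed to be consistent (no check for contradicting widths). Instead of restarting after every discovery, it repeats full passes over the constraints until a pass finds nothing new, which should reach the same result.

```java
import java.util.Arrays;
import java.util.List;

class ColumnWidthSolver {
    static final int UNKNOWN = -1;

    /** One table cell: starts at column `start`, covers `span` columns, is `width` wide. */
    static final class Constraint {
        final int start, span, width;
        Constraint(int start, int span, int width) {
            this.start = start;
            this.span = span;
            this.width = width;
        }
    }

    /** Repeats passes over all constraints until no new column width can be derived. */
    static int[] solve(int columnCount, List<Constraint> constraints) {
        int[] widths = new int[columnCount]; // the "fact table"
        Arrays.fill(widths, UNKNOWN);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Constraint c : constraints) {
                int unknownColumn = UNKNOWN;
                int unknownCount = 0;
                int knownSum = 0;
                for (int col = c.start; col < c.start + c.span; col++) {
                    if (widths[col] == UNKNOWN) {
                        unknownColumn = col;
                        unknownCount++;
                    } else {
                        knownSum += widths[col];
                    }
                }
                // Exactly one unknown column left: this constraint determines it.
                if (unknownCount == 1) {
                    widths[unknownColumn] = c.width - knownSum;
                    progress = true;
                }
            }
        }
        return widths;
    }

    /** The cells of the example table from the question, row by row. */
    static List<Constraint> exampleTable() {
        return Arrays.asList(
            new Constraint(0, 3, 385), new Constraint(3, 3, 359), new Constraint(6, 1, 151),
            new Constraint(0, 1, 24), new Constraint(1, 2, 361), new Constraint(3, 3, 359),
            new Constraint(6, 1, 151),
            new Constraint(0, 1, 24), new Constraint(1, 1, 276), new Constraint(2, 1, 85),
            new Constraint(3, 2, 198), new Constraint(5, 1, 161), new Constraint(6, 1, 151));
    }

    public static void main(String[] args) {
        // prints [24, 276, 85, -1, -1, 161, 151]
        System.out.println(Arrays.toString(solve(7, exampleTable())));
    }
}
```

Note that for the example table two columns stay unknown: the fourth and fifth columns only ever appear together (as a sum of 198), so that particular table is under-constrained.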

This is an n-squared (or worse) algorithm, but should be fine as long as the table does not have ten thousand rows or columns. If the table is correctly constrained you will reach a point where all the column widths are defined. The advantage of a brute-force algorithm such as this is that it is relatively easy to debug and should be stable.

If the table is under-constrained, you will reach a point where a full pass finds no new values but some column widths remain uncalculated. To handle this, add another pass: take an arbitrary constraint that involves an uncalculated column (it must then also include at least one other uncalculated column, otherwise the earlier passes would have resolved it) and allocate the remaining space equally across all the uncalculated columns in that constraint. Since the constraint is chosen arbitrarily, you may get a different answer on different runs, but the table is under-constrained, so does it matter?
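
As a rough sketch of that fallback pass for a single constraint (the class name EqualSplitFallback, the method splitEqually, and the UNKNOWN marker are hypothetical, and the way leftover units are handed out is an assumption of this sketch, not something prescribed above):

```java
import java.util.Arrays;

class EqualSplitFallback {
    static final int UNKNOWN = -1;

    /**
     * Spreads the space that a constraint leaves over equally across its unknown columns.
     * The constraint covers columns start .. start + span - 1 and has the given total width.
     */
    static void splitEqually(int[] widths, int start, int span, int total) {
        int remaining = total;
        int unknownCount = 0;
        for (int col = start; col < start + span; col++) {
            if (widths[col] == UNKNOWN) {
                unknownCount++;
            } else {
                remaining -= widths[col];
            }
        }
        if (unknownCount == 0) {
            return; // nothing to fill in
        }
        int share = remaining / unknownCount;
        int leftover = remaining % unknownCount; // handed out one unit at a time, left to right
        for (int col = start; col < start + span; col++) {
            if (widths[col] == UNKNOWN) {
                widths[col] = share + (leftover > 0 ? 1 : 0);
                if (leftover > 0) {
                    leftover--;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Widths known for the example table in the question; columns 3 and 4 (0-based)
        // are only constrained by the colspan="2" cell of width 198.
        int[] widths = {24, 276, 85, UNKNOWN, UNKNOWN, 161, 151};
        splitEqually(widths, 3, 2, 198);
        System.out.println(Arrays.toString(widths)); // prints [24, 276, 85, 99, 99, 161, 151]
    }
}
```

Picking the first suitable constraint in document order, rather than a truly arbitrary one, would make the result reproducible between runs.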

When done, you have a complete fact table with all the column widths, and you can then generate your LaTeX code with all table columns specified.