alexroat - 4 months ago 29

Python Question

I have to **port an algorithm from an Excel sheet to python code** but I have to **reverse engineer the algorithm from the Excel file**.

The Excel sheet is quite complicated, it contains many cells in which there are formulas that refer to other cells (that can also contains a formula or a constant).

My idea is to analyze with a python script the sheet building a sort of table of dependencies between cells, that is:

A1 depends on B4,C5,E7 formula: "=sqrt(B4)+C5*E7"

A2 depends on B5,C6 formula: "=sin(B5)*C6"

...

The xlrd python module allows to read an XLS workbook but at the moment I can access to the

For example, with the following code I can get simply the value of a cell:

`import xlrd`

#open the .xls file

xlsname="test.xls"

book = xlrd.open_workbook(xlsname)

#build a dictionary of the names->sheets of the book

sd={}

for s in book.sheets():

sd[s.name]=s

#obtain Sheet "Foglio 1" from sheet names dictionary

sheet=sd["Foglio 1"]

#print value of the cell J141

print sheet.cell(142,9)

Anyway, It seems to have no way to get the formul from the Cell object returned by the

In documentation they say that it is possible to get a string version of the formula (in english because there is no information about function name translation stored in the Excel file). They speak about formulas (expressions) in the

Could you suggest a code snippet that gets the formula text from a cell?

Answer

[Dis]claimer: I'm the author/maintainer of `xlrd`

.

The documentation references to formula text are about "name" formulas; read the section "Named references, constants, formulas, and macros" near the start of the docs. These formulas are associated sheet-wide or book-wide to a name; they are not associated with individual cells. Examples: `PI`

maps to `=22/7`

, `SALES`

maps to `=Mktng!$A$2:$Z$99`

. The name-formula decompiler was written to support inspection of the simpler and/or commonly found usages of defined names.

Formulas in general are of several kinds: cell, shared, and array (all associated with a cell, directly or indirectly), name, data validation, and conditional formatting.

Decompiling general formulas from bytecode to text is a "work-in-progress", slowly. Note that supposing it were available, you would then need to parse the text formula to extract the cell references. Parsing Excel formulas correctly is not an easy job; as with HTML, using regexes looks easy but doesn't work. It would be better to extract the references directly from the formula bytecode.

Also note that cell-based formulas can refer to names, and name formulas can refer both to cells and to other names. So it would be necessary to extract both cell and name references from both cell-based and name formulas. It may be useful to you to have info on shared formulas available; otherwise having parsed the following:

```
B2 =A2
B3 =A3+B2
B4 =A4+B3
B5 =A5+B4
...
B60 =A60+B59
```

you would need to deduce the similarity between the `B3:B60`

formulas yourself.

In any case, none of the above is likely to be available any time soon -- `xlrd`

priorities lie elsewhere.