Learner Algorithm Learner Algorithm - 2 months ago 12
Python Question

how to print two columns of xls files on screen?

I asked a question here how can I read each line of a xls file with pausing and the answer is right as below

This solution goes row wise . for example if I have 3 values at first row. It prints first cell then second cell then third cell etc until the first row finish and then goes to second row.

What I want is to print all cells with a distance at the same time
for example print the first row then print the second row etc

As an example, if I have in my xls file two rows as follows

row 1 : 1 2 3 (there are three values in three cell)
row 2: 5 6 7

I want it prints it like

value ::: 1 ------> 2------> 3
value ::: 5-------> 6------> 7

import time
import pandas as pd
import os
import xlrd
# at first I try to know how many rows and how many columns I have
workbook = xlrd.open_workbook('myfile.xls')
for sheet in workbook.sheets():
for row in range(sheet.nrows):
for column in range(sheet.ncols):
print "value::: ", sheet.cell(row,column).value
time.sleep(5.5) # pause 5.5 seconds

this shows only one column of xls but if I have two or three columns
it ignore the second and the third etc and only print the first one

how can I amend this to be able to print as many cell as I want in each epoch ?


This is relatively easy to do with pandas.

import time
import pandas as pd
import os

frame = pd.read_excel('myfile.xls')

This will give you a pandas DataFrame with only the first sheet of that excel file.

pandas dataframes have an inbuilt method to iterate over all rows:

for row in frame.iterrows():
    print row

Please also refer to the documentation of pandas.read_excel (this is for pandas version 0.19.1).


Well, not properly testing this has come around to bite me in the bottom. It would appear that iterating over single rows of the DataFrame has unexpected side effects on formatting. To make sure this does not happen, the following code transforms the frame into a string via the to_string method that the DataFrame class features. We do not want to get the row index printed out as well so we set the parameter index to false.

To be able to iterate over this data row-wise, we need to split our string at each newline, this is what the splitlines function does. It is an inbuilt function of python strings. It turns our single string containing the entire data frame into a list of strings, each containing only a single line.

pandas separates columns by two spaces, so finally we replace each occurrence of double spaces with the wanted delimiter via the builtin replace function.

import time
import pandas as pd
import os

frame = pd.read_excel('data.xls')

for row in frame.to_string(index=False).splitlines():
    print 'value ::: ' + row.replace('  ', '------>')


For reference, also see the python manual on built in types, it lists the methods available for strings. These are very helpful in string formating.