Molly Zhao - 1 month ago
Python Question

Python remove 'text' from lists made by xlrd

I used xlrd to read through each cell of three columns to make three lists. Then, I appended the ith element of all three lists to a new list, making a new list of lists.

search_terms = []
for row in range(0, book.nrows):
    search_terms.append([med_name[row], med_school[row], mentor[row]])
print(*search_terms[0:15], sep='\n')
[text:'Andrew Burkeland', 'Weill Cornell Medical College', 'Dave Cutler ']
[text:'Andrew Pence', 'University of Alabama at Birmingham School of Medicine', 'Jack Warran ']


Is there a way to take out the 'text:'? I am passing each list in search_terms to Entrez.egquery to search for results on PubMed, and with 'text:' in the query line I keep getting 0 results.

Answer

Let's assume a simple table stored in a file called 'students.xlsx':

Student     School     Mentor
John Doe    Harvard    Kornberg
Jane Done   Stanford   Pauling

Now open it with xlrd and fetch the first data row:

import xlrd
xl_workbook = xlrd.open_workbook('students.xlsx')
xl_sheet = xl_workbook.sheet_by_index(0)
row = xl_sheet.row(1)

Now let's look at the individual parts

print(row)

[text:'John Doe', text:'Harvard', text:'Kornberg']

print(row[0])

text:'John Doe'

print(row[0].value)

John Doe

The problem is that row[0] is an xlrd cell object, not a string. Printing a cell shows its type prefix ('text:') together with its content, so you have to read the content through the cell's value attribute.
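
Applied to the lists from the question, a minimal sketch could look like the following. The Cell namedtuple is only a hypothetical stand-in for an xlrd cell so that the snippet runs without a workbook, and cell_text is an illustrative helper name, not part of xlrd. The printed output in the question suggests that only the first list still holds cell objects, so the helper accepts both cells and plain strings. It also strips the trailing spaces visible in the mentor column, on the assumption that they are unwanted in a search query.

```python
from collections import namedtuple

# Hypothetical stand-in for an xlrd cell; real cells also expose .value.
Cell = namedtuple("Cell", ["ctype", "value"])


def cell_text(item):
    """Return the plain content of a cell-like object, or the item itself."""
    # Objects without a .value attribute (e.g. plain strings) pass through.
    value = getattr(item, "value", item)
    # Strip stray whitespace such as the trailing space in 'Dave Cutler '.
    return value.strip() if isinstance(value, str) else value


# Sample data mirroring the question: cells in one list, strings in the others.
med_name = [Cell(1, "Andrew Burkeland"), Cell(1, "Andrew Pence")]
med_school = ["Weill Cornell Medical College",
              "University of Alabama at Birmingham School of Medicine"]
mentor = ["Dave Cutler ", "Jack Warran "]

search_terms = [
    [cell_text(name), cell_text(school), cell_text(adviser)]
    for name, school, adviser in zip(med_name, med_school, mentor)
]
print(*search_terms, sep="\n")
```

With real data, med_name, med_school and mentor would be the lists built from the sheet, and the sample definitions would be dropped.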

Now let's do it for all rows (except the header):

raw_data = list()
for row in range(1, xl_sheet.nrows):
    raw_data.append(xl_sheet.row(row))

author_list = list()
for raw in raw_data:
    author_list.append(list())
    for r in raw:
        author_list[-1].append(r.value)
print(author_list)
> [['John Doe', 'Harvard', 'Kornberg'], ['Jane Done', 'Stanford', 'Pauling']]

or, more concisely, as a nested list comprehension:

author_list = [[c.value for c in xl_sheet.row(n)] for n in range(1, xl_sheet.nrows)]
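
xlrd sheets also offer row_values(rowx), which returns the contents of a row directly, so the .value step can be skipped. The sketch below wraps this in a small function. FakeSheet is a hypothetical stand-in that mimics only the two sheet members used here (nrows and row_values), so the example runs without an Excel file. The name sheet_to_lists and its skip_header parameter are illustrative, not part of xlrd.

```python
class FakeSheet:
    """Hypothetical stand-in exposing the two xlrd sheet members used below."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, rowx):
        # Like xlrd, return the plain values of one row as a list.
        return list(self._rows[rowx])


def sheet_to_lists(sheet, skip_header=True):
    """Collect every row of the sheet as a list of plain values."""
    start = 1 if skip_header else 0
    return [sheet.row_values(n) for n in range(start, sheet.nrows)]


sheet = FakeSheet([
    ["Student", "School", "Mentor"],
    ["John Doe", "Harvard", "Kornberg"],
    ["Jane Done", "Stanford", "Pauling"],
])
author_list = sheet_to_lists(sheet)
print(author_list)
```

With a real workbook, passing xl_sheet from above in place of the stand-in should give the same nested list.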