Maryam Pashmi Maryam Pashmi - 5 months ago 53
Python Question

Pandas python, what is the workbook encoding type?

I am new to python and also to the pandas library in Python. The documentation is not well described and they didn't explain it well. I want to save the dataframe as an excel format and in the memory and I have found the following explanation:
[Pandas excel to the memory]

I need an explanation about

workbook
. The value of this variable is encoded, how can I see the real value of this variable? How to decode it? what should be return value for it?

EDITED:

How to pass it into the content of attachment in Mandrill api.
https://mandrillapp.com/api/docs/messages.python.html


This is my attachment part for excel extension :

'attachments': [
{
'content': content,
'name': 'fraud_report.xlsx',
'type': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
}


I am not able to open the excel file and all the time i'am getteing an error from Microsoft excel which said
the file format is not valid!...

Any help can be helpful. Thank you

Answer

For the explanation i paste the example from your link here again:

# Safe import for either Python 2.x or 3.x
try:
    from io import BytesIO
except ImportError:
    from cStringIO import StringIO as BytesIO

bio = BytesIO()

# By setting the 'engine' in the ExcelWriter constructor.
writer = ExcelWriter(bio, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')

# Save the workbook
writer.save()

# Seek to the beginning and read to copy the workbook to a variable in memory
bio.seek(0)
workbook = bio.read()

The method writer.save() saves the data in BytesIO (bio) instead of Excel file. It means, that the variable bio stores the bytes codes of the excel file.

The method bio.seek(0) sets the current position (for reading, writing,...) of bio to 0. So that you can read the data of bio from beginning with the next method bio.read().

The variable workbook store the bytes string of the excel file (or excel workbook). If you read a excel file in byte mode, you will get the same data. Or you can write it in a excel file:

with open("my_excel_file.xlsx", "wb") as f:
   f.write(workbook)

To read data from bio and store in a DataFrame you don't need bio.read():

bio.seek(0)
df = pd.read_excel(bio, "Sheet1", engine="xlrd")

For your question about using of mandrill:

In the example of mandrill you see:

{'attachments': [{'content': 'ZXhhbXBsZSBmaWxl',
                      'name': 'myfile.txt',
                      'type': 'text/plain'}],...

The documentation writes about it too:

content: the content of the attachment as a base64-encoded string

You should encode workbook in base64 and use it for sending

import base64
content = base64.b64encode(workbook)

P/S: workbook and content have the type bytes. May be you need to convert content to str before sending.

{'attachments': [{'content': content.decode('utf-8'),
                          'name': 'myfile.xlsx',
                          'type': 'text/plain'}],...

Add: if file is excel then you should change type to application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Comments