user1718097 - 8 months ago
Python Question

How do I change the data type of a pandas DataFrame column to string with a defined format?

I'm starting to tear my hair out with this - so I hope someone can help. I have a pandas DataFrame that was created from an Excel spreadsheet using openpyxl. The resulting DataFrame looks like:

print image_name_data
     id           image_name
0  1001  1001_mar2014_report
1  1002  1002_mar2014_report
2  1003  1003_mar2014_report

[3 rows x 2 columns]

…with the following datatypes:

print image_name_data.dtypes
id             float64
image_name      object
dtype: object

The issue is that the numbers in the id column are, in fact, identification numbers and I need to treat them as strings. I've tried converting the id column to strings using:

image_name_data['id'] = image_name_data['id'].astype('str')

This seems a bit ugly but it does produce a variable of type 'object' rather than 'float64':

print image_name_data.dtypes
id            object
image_name    object
dtype: object

However, the strings that are created have a decimal point, as shown:

print image_name_data
       id           image_name
0  1001.0  1001_mar2014_report
1  1002.0  1002_mar2014_report
2  1003.0  1003_mar2014_report

[3 rows x 2 columns]
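
For anyone who wants to reproduce this, here is a minimal sketch in Python 3 syntax. The DataFrame is built by hand as a stand-in for the spreadsheet (an assumption, not the original data), with the id column deliberately stored as floats:

```python
import pandas as pd

# Hypothetical stand-in for the data read from Excel; ids are floats on purpose.
image_name_data = pd.DataFrame({
    'id': [1001.0, 1002.0, 1003.0],
    'image_name': [
        '1001_mar2014_report',
        '1002_mar2014_report',
        '1003_mar2014_report',
    ],
})

# Casting the float column straight to str keeps the trailing ".0".
as_text = image_name_data['id'].astype(str)
print(as_text.tolist())
```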

How can I convert a float64 column in a pandas DataFrame to a string with a given format (in this case, for example, '%10.0f')?

Thanks in advance for any help and advice.


I'm unable to reproduce your problem, but have you tried converting the column to an integer first?

image_name_data['id'] = image_name_data['id'].astype(int).astype('str')
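
A small sketch of the integer route in Python 3 syntax, using a hand-built Series as a stand-in for the id column (the data and variable names are assumptions for illustration). One caveat worth checking: if the column contains missing values, which is a common reason for ids to come out of a spreadsheet as float64, the cast to int raises an error.

```python
import pandas as pd

# Hypothetical id column stored as floats.
ids = pd.Series([1001.0, 1002.0, 1003.0])

# Going through int first drops the fractional part before the string cast.
id_strings = ids.astype(int).astype(str)
print(id_strings.tolist())

# A column with a missing value cannot be cast to a plain integer dtype.
with_gap = pd.Series([1001.0, float('nan')])
try:
    with_gap.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)
```

If the cast fails for that reason, the missing rows probably need to be dropped or filled before converting.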

Then, regarding your more general question, you could use map (as in this answer). In your case:

image_name_data['id'] = image_name_data['id'].map('{:.0f}'.format)
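
To illustrate the map approach with both format styles, here is a minimal sketch in Python 3 syntax. The Series is a hand-built stand-in for the id column (an assumption), and the second call uses the '%10.0f' format mentioned in the question, which pads each value to a width of 10:

```python
import pandas as pd

# Hypothetical id column stored as floats.
ids = pd.Series([1001.0, 1002.0, 1003.0])

# str.format style: no decimals, no padding.
plain = ids.map('{:.0f}'.format)

# printf style from the question: no decimals, right-aligned in 10 characters.
padded = ids.map(lambda value: '%10.0f' % value)

print(plain.tolist())
print(padded.tolist())
```

Any format string that works on a single float should work here, since map simply applies the function to each element.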