Joseph M Njuguna - 1 year ago
Python Question

# Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe

I need some help processing data in a pandas DataFrame.
Any help is welcome.

I have OHLCV data in CSV format, which I have loaded into a pandas DataFrame.

How do I convert the values in the volume column from strings like 2.90K to 2900 or 5.2M to 5200000?
The column can contain both a K suffix (thousands) and an M suffix (millions).
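The arithmetic itself is simple: multiply the numeric part by 1,000 for K or 1,000,000 for M. A minimal plain-Python sketch for a single value; the helper name `parse_volume` is hypothetical, and it assumes K and M are the only suffixes that occur:

```python
# Hypothetical lookup table; assumes K (thousands) and M (millions) are the only suffixes
MULTIPLIERS = {"K": 10**3, "M": 10**6}


def parse_volume(value):
    """Convert a value such as '2.90K' or '5.2M' to a float."""
    text = str(value).strip().upper()
    suffix = text[-1:]
    if suffix in MULTIPLIERS:
        # Numeric part times the factor for its suffix
        return float(text[:-1]) * MULTIPLIERS[suffix]
    # No suffix: the value is already a plain number (e.g. 0 or '100')
    return float(text)


print(parse_volume("2.90K"))  # 2900.0
print(parse_volume("5.2M"))   # 5200000.0
```

On a DataFrame column, `df['volume'].map(parse_volume)` would apply it row by row; the vectorized string methods used in the answer below avoid a Python call per row.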

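``````
import pandas as pd

file_path = '/home/fatjoe/UCHM.csv'
# Load the CSV; assumes the first column (Date) is used as the index
df = pd.read_csv(file_path, index_col=0)
df.columns = [
    "closing_price",
    "opening_price",
    "high_price",
    "low_price",
    "volume",
    "change",
]

# Rows are newest first, so shift(-1) gives each row the previous day's close
df['opening_price'] = df['closing_price'].shift(-1)
df = df.replace('-', 0)  # replace '-' placeholders with 0
df = df[:-1]             # the last row has no previous close after the shift
``````

Console output:

``````
Date
2016-09-23          0
2016-09-22      9.60K
2016-09-21     54.20K
2016-09-20    115.30K
2016-09-19     18.90K
2016-09-16    176.10K
2016-09-15     31.60K
2016-09-14     10.00K
2016-09-13      3.20K
``````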
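Because `df.replace('-', 0)` leaves a mix of the integer 0 and suffixed strings in the volume column, one option is to cast everything to string and split each value into its number and an optional suffix with a single regex. A minimal sketch, using the first few values from the console output as a stand-in for `df['volume']` and assuming only K and M suffixes occur:

```python
import pandas as pd

# Stand-in for df['volume']: the first values from the console output above
volume = pd.Series(
    [0, "9.60K", "54.20K", "115.30K", "18.90K"],
    index=pd.DatetimeIndex(
        ["2016-09-23", "2016-09-22", "2016-09-21", "2016-09-20", "2016-09-19"],
        name="Date",
    ),
    name="volume",
)

# Column 0: the number; column 1: an optional K/M suffix ('' when absent)
parts = volume.astype(str).str.extract(r"^([\d.]+)([KM]?)$")

# Hypothetical lookup table; an empty suffix means "no scaling"
factor = parts[1].map({"": 1, "K": 10**3, "M": 10**6})

volume_numeric = parts[0].astype(float) * factor
print(volume_numeric)
```

Values that do not fit the pattern (for example `1,234` with a thousands separator) come out as NaN, so `volume_numeric.isna()` is a quick way to find rows that need a closer look.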

Assuming you have the following DataFrame:

``````
In [30]: df
Out[30]:
         Date      Val
0  2016-09-23      100
1  2016-09-22    9.60M
2  2016-09-21   54.20K
3  2016-09-20  115.30K
4  2016-09-19   18.90K
5  2016-09-16  176.10K
6  2016-09-15   31.60K
7  2016-09-14   10.00K
8  2016-09-13    3.20M
``````
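To reproduce the answer, the sample frame can be built directly. This sketch assumes every `Val` entry is a string, as it would be after reading the CSV, including the unsuffixed `100`:

```python
import pandas as pd

# Rebuild the example frame; Val is assumed to hold strings only
df = pd.DataFrame({
    "Date": ["2016-09-23", "2016-09-22", "2016-09-21", "2016-09-20",
             "2016-09-19", "2016-09-16", "2016-09-15", "2016-09-14",
             "2016-09-13"],
    "Val": ["100", "9.60M", "54.20K", "115.30K", "18.90K",
            "176.10K", "31.60K", "10.00K", "3.20M"],
})
print(df)
```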

You can do it this way:

``````
In [31]: df.Val = (df.Val.replace(r'[KM]+$', '', regex=True).astype(float) *
....:           df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False)
....:             .fillna(1)
....:             .replace(['K','M'], [10**3, 10**6]).astype(int))

In [32]: df
Out[32]:
         Date        Val
0  2016-09-23      100.0
1  2016-09-22  9600000.0
2  2016-09-21    54200.0
3  2016-09-20   115300.0
4  2016-09-19    18900.0
5  2016-09-16   176100.0
6  2016-09-15    31600.0
7  2016-09-14    10000.0
8  2016-09-13  3200000.0
``````
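The result above is float64 (`100.0`, `9600000.0`, ...), whereas the question asks for whole numbers such as 2900. A sketch of the final cast, assuming every parsed value should be an integer; the trailing NaN stands in for a hypothetical value that failed to parse:

```python
import numpy as np
import pandas as pd

# Float results like Out[32], plus a NaN standing in for a value that failed to parse
converted = pd.Series([100.0, 9600000.0, 54200.0, np.nan], name="Val")

# Round first: casting a float to an integer truncates toward zero, so a product
# stored as 54199.99999999999 would otherwise become 54199
rounded = converted.round()

# "Int64" (capital I) is pandas' nullable integer dtype; plain "int64" would
# raise on the NaN, whereas "Int64" stores it as <NA>
as_int = rounded.astype("Int64")
print(as_int)
```

If no value can be missing, `.round().astype("int64")` is enough.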

Explanation:

``````
In [36]: df.Val.replace(r'[KM]+$', '', regex=True).astype(float)
Out[36]:
0    100.0
1      9.6
2     54.2
3    115.3
4     18.9
5    176.1
6     31.6
7     10.0
8      3.2
Name: Val, dtype: float64

In [37]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False)
Out[37]:
0    NaN
1      M
2      K
3      K
4      K
5      K
6      K
7      K
8      M
Name: Val, dtype: object

In [38]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False).fillna(1)
Out[38]:
0    1
1    M
2    K
3    K
4    K
5    K
6    K
7    K
8    M
Name: Val, dtype: object

In [39]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False).fillna(1).replace(['K','M'], [10**3, 10**6]).astype(int)
Out[39]:
0          1
1    1000000
2       1000
3       1000
4       1000
5       1000
6       1000
7       1000
8    1000000
Name: Val, dtype: int32
``````
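The last three steps (extract, `fillna(1)`, `replace`) build the multiplier Series. A variant that looks each suffix up in a dict with `Series.map` and fills the unmatched rows afterwards yields the same multipliers, as floats. A sketch using the sample values above, with the anchored `[KM]+$` pattern for the numeric part:

```python
import pandas as pd

# Sample values from the example DataFrame (assumed to be strings)
val = pd.Series(["100", "9.60M", "54.20K", "115.30K", "18.90K",
                 "176.10K", "31.60K", "10.00K", "3.20M"], name="Val")

# Step 1 (as in In [36]): numeric part with the trailing suffix removed
numbers = val.replace(r"[KM]+$", "", regex=True).astype(float)

# Multiplier: rows without a suffix are NaN after extract and stay NaN after
# the dict lookup, so fillna(1) turns them into a neutral factor
factors = (val.str.extract(r"[\d.]+([KM]+)", expand=False)
              .map({"K": 10**3, "M": 10**6})
              .fillna(1))

# Element-wise product, aligned on the shared index
result = numbers * factors
print(result.tolist())
```

Mapping with a dict keeps every suffix and its factor in one place, which also makes it easy to spot a suffix the table does not cover: it would still be NaN before the `fillna(1)` step.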
