ganga ganga - 2 months ago 7
Python Question

Sorting and arranging a list using pandas

I have an input file as shown below which needs to be arranged in such an order that the key values need to be in ascending order, while the keys which are not present need to be printed in the last.
I am getting the data arranged in the required format but the order is missing.

I have tried using sort() method but it shows "list has no attribute sort".
Please suggest solution and also suggest if any modifications required.

Input file:

3=1388|4=1388|5=IBM|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|362=0.46
3=1388|4=1388|5=IBM|8=157.73|9=100|1021=0|854=p|394=157.73|474=157.749977558|1584=89029|444=20160713|459=93001362639104|461=26142|55=93001362849000|22=89120|361=0.71|981=0|16=1468416601.372|18=1468416601.372|362=0.45
3=1388|4=1388|5=IBM|8=157.69|9=100|1021=600|854=p|394=157.69|474=157.749910415|1584=89129|444=20160713|459=93004178882560|461=27052|55=93004179085000|22=89328|361=0.67|981=1|16=1468416604.1916|18=1468416604.1916|362=0.43


Code i tried:

import pandas as pd
import numpy as np
df = pd.read_csv('inputfile', index_col=None, names=['text'])
s = df.text.str.split('|')
ds = [dict(w.split('=', 1) for w in x) for x in s]
p = pd.DataFrame.from_records(ds)
p1 = p.replace(np.nan,'n/a', regex=True)
st = p1.stack(level=0,dropna=False)
dfs = [g for i,g in st.groupby(level=0)]
#print st
i = 0
while i < len(dfs):
#index of each column
print ('\nindex[%d]'%i)
for (_,k),v in dfs[i].iteritems():
print k,'\t',v
i = i + 1


output getting:

index[0]
1021 1500
1584 88929
16 1468416600.6006
18 1468416600.6006
22 89020
3 1388
361 0.73
362 0.46
388 157.75
394 157.75
4 1388
400 157.75
444 20160713
459 93000546718000
461 7
474 157.75
5 IBM
55 93000552181000
8 157.75
854 n
9 88929
981 0

index[1]
1021 0
1584 89029
16 1468416601.372
18 1468416601.372
22 89120
3 1388
361 0.71
362 0.45
388 n/a
394 157.73
4 1388
400 n/a
444 20160713
459 93001362639104
461 26142
474 157.749977558
5 IBM
55 93001362849000
8 157.73
854 p
9 100
981 0


Expected output:

index[0]
3 1388
4 1388
5 IBM
8 157.75
9 88929
16 1468416600.6006
18 1468416600.6006
22 89020
55 93000552181000
361 0.73
362 0.46
388 157.75
394 157.75
400 157.75
444 20160713
459 93000546718000
461 7
474 157.75
854 n
981 0
1021 1500
1584 88929

index[1]
3 1388
4 1388
5 IBM
8 157.75
9 88929
16 1468416600.6006
18 1468416600.6006
22 89020
55 93000552181000
361 0.73
362 0.46
394 157.75
444 20160713
459 93000546718000
461 7
474 157.75
854 n
981 0
1021 1500
1584 88929
388 n/a
400 n/a

Answer

Replace your ds line with

ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s]

To convert the index to an integer so it will be sorted numerically

btw, you can also simplify your stack, groupby, print to:

for (ix, series) in p1.iterrows():
    print('\nindex[%d]' % ix)
    for tag, value in series.iteritems():
        print(tag, '\t', value)

So the whole script becomes:

df = pd.read_csv('inputfile', index_col=None, names=['text'])
s = df.text.str.split('|')
ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s]
p = pd.DataFrame.from_records(ds)
p1 = p.fillna('n/a')
for (ix, series) in p1.iterrows():
    print('\nindex[%d]' % ix)
    for tag, value in series.iteritems():
        print(tag, '\t', value)
Comments