running man - 3 days ago

How do I loop over the list values of a specific column in pandas?

I have a pandas DataFrame whose first column contains lists. I want to loop over each string in each list and produce one row per string, with the values of the other columns repeated alongside it.

For example:

import pandas as pd

tm = pd.DataFrame({'author': [['author_a1', 'author_a2', 'author_a3'], ['author_b1', 'author_b2'], ['author_c1', 'author_c2']], 'journal': ['journal01', 'journal02', 'journal03'], 'date': pd.date_range('2015-02-03', periods=3)})
tm

                              author       date    journal
0  [author_a1, author_a2, author_a3] 2015-02-03  journal01
1             [author_b1, author_b2] 2015-02-04  journal02
2             [author_c1, author_c2] 2015-02-05  journal03


I want this:

      author       date    journal
0  author_a1 2015-02-03  journal01
1  author_a2 2015-02-03  journal01
2  author_a3 2015-02-03  journal01
3  author_b1 2015-02-04  journal02
4  author_b2 2015-02-04  journal02
5  author_c1 2015-02-05  journal03
6  author_c2 2015-02-05  journal03





I've solved the problem with a rather convoluted loop. Is there a simpler, more efficient way to do this in pandas?

author_use = []
date_use = []
journal_use = []

for i in range(0, len(tm['author'])):
    for m in range(0, len(tm['author'][i])):
        author_use.append(tm['author'][i][m])
        date_use.append(tm['date'][i])
        journal_use.append(tm['journal'][i])

df_author = pd.DataFrame({'author': author_use,
                          'date': date_use,
                          'journal': journal_use})

df_author

Answer

I think you can use numpy.repeat to repeat the values of the other columns by the lengths of the lists (obtained with str.len), and flatten the nested lists with itertools.chain:

import numpy as np
import pandas as pd
from itertools import chain

# number of authors in each row
lens = tm.author.str.len()

df = pd.DataFrame({
        "author": list(chain.from_iterable(tm.author)),
        "date": np.repeat(tm.date.values, lens),
        "journal": np.repeat(tm.journal.values, lens)})

print(df)

      author       date    journal
0  author_a1 2015-02-03  journal01
1  author_a2 2015-02-03  journal01
2  author_a3 2015-02-03  journal01
3  author_b1 2015-02-04  journal02
4  author_b2 2015-02-04  journal02
5  author_c1 2015-02-05  journal03
6  author_c2 2015-02-05  journal03
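To see what each piece of the solution above contributes, here is a small sketch that prints the intermediate values for the example data. It rebuilds `tm` so that it runs on its own; the name `flat` is just for illustration.

```python
from itertools import chain

import numpy as np
import pandas as pd

tm = pd.DataFrame({'author': [['author_a1', 'author_a2', 'author_a3'],
                              ['author_b1', 'author_b2'],
                              ['author_c1', 'author_c2']],
                   'journal': ['journal01', 'journal02', 'journal03'],
                   'date': pd.date_range('2015-02-03', periods=3)})

# str.len() on a column of lists returns the length of each list
lens = tm.author.str.len()
print(lens.tolist())  # [3, 2, 2]

# np.repeat repeats element i of the array lens[i] times
repeated_journals = np.repeat(tm.journal.values, lens).tolist()
print(repeated_journals)
# ['journal01', 'journal01', 'journal01', 'journal02', 'journal02', 'journal03', 'journal03']

# chain.from_iterable concatenates the nested lists into one flat sequence,
# so its length equals the sum of lens and lines up with the repeated values
flat = list(chain.from_iterable(tm.author))
print(flat)
```

Because both `np.repeat` and `chain.from_iterable` walk the rows in the same order, the i-th flattened author always lines up with the i-th repeated date and journal.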

Another numpy solution:

df = pd.DataFrame(np.column_stack((tm[['date', 'journal']].values.repeat(list(map(len, tm.author)), axis=0),
                                   np.hstack(tm.author))),
                  columns=['date', 'journal', 'author'])

print(df)
                  date    journal     author
0  2015-02-03 00:00:00  journal01  author_a1
1  2015-02-03 00:00:00  journal01  author_a2
2  2015-02-03 00:00:00  journal01  author_a3
3  2015-02-04 00:00:00  journal02  author_b1
4  2015-02-04 00:00:00  journal02  author_b2
5  2015-02-05 00:00:00  journal03  author_c1
6  2015-02-05 00:00:00  journal03  author_c2
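If you are on pandas 0.25 or newer, the built-in `DataFrame.explode` does this in a single call. The sketch below assumes such a version; it is an alternative to the numpy answers above, not part of them. `explode` keeps the original row labels (0, 0, 0, 1, 1, 2, 2), so `reset_index(drop=True)` is used here to renumber the rows 0 to 6.

```python
import pandas as pd

tm = pd.DataFrame({'author': [['author_a1', 'author_a2', 'author_a3'],
                              ['author_b1', 'author_b2'],
                              ['author_c1', 'author_c2']],
                   'journal': ['journal01', 'journal02', 'journal03'],
                   'date': pd.date_range('2015-02-03', periods=3)})

# explode() emits one row per list element in 'author';
# the values of the other columns are repeated for each element
df = tm.explode('author').reset_index(drop=True)
print(df)
```

The non-exploded columns keep their original dtypes, so `date` stays a datetime column.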