cwl cwl - 2 months ago 18
Python Question

Python Pandas: how to convert a list of pair mappings to a row-vector format?

I have a 2-column DataFrame, column-1 corresponds to customer, column-2 corresponds to the city this customer has visited. The DataFrame looks like the following:

print(df)

customer visited_city
0 John London
1 Mary Melbourne
2 Steve Paris
3 John New_York
4 Peter New_York
5 Mary London
6 John Melbourne
7 John New_York


I would like to convert the above DataFrame into a row-vector format, such that each row represents a unique user with the row vector indicating the cities visited.

print(wide_format_df)

London Melbourne New_York Paris
John 1.0 1.0 1.0 0.0
Mary 1.0 1.0 0.0 0.0
Steve 0.0 0.0 0.0 1.0
Peter 0.0 0.0 1.0 0.0


Below is the code I used to generate the wide format. It iterates through each user one by one. I was wondering is there any more efficient way to do so?

import pandas as pd
import numpy as np

UNIQUE_CITIESS = np.sort(df['visited_city'].unique())
p = len(UNIQUE_CITIESS)
unique_customers = df['customer'].unique().tolist()

X = []
for customer in unique_customers:
x = np.zeros(p)
city_visited = np.sort(df[df['customer'] == customer]['visited_city'].unique())
visited_idx = np.searchsorted(UNIQUE_CITIESS, city_visited)
x[visited_idx] = 1
X.append(x)
wide_format_df = pd.DataFrame(np.array(X), columns=UNIQUE_CITIESS, index=unique_customers)
wide_format_df

Answer Source

You can use crosstab

pd.crosstab(df.customer, df.visited_city)

You get

visited_city    London  Melbourne   New_York    Paris
customer                
John            1       1           1           0
Mary            1       1           0           0
Peter           0       0           1           0
Steve           0       0           0           1