宋国庆 宋国庆 - 24 days ago 12
Python Question

How to change this dataframe with python in order to use collaborative filtering

Here is my original data:
enter image description here

As you can see.The cust_id column records the consumption record for each ID.And second column means the product name,third is the munber they bought each time.

I want to get this kind of data:
enter image description here

The result data shows each customer bought which product and how many.If they never bought,then the data is None.I think this is Sparse matrix.

I have tried many ways and still can't fix it up.....

Maybe pandas?Numpy?

Answer

There is problem with duplicates, I add last row with same cust_id and prd_id values for demonstrate it.

print (df)
    cust_id  prd_id  prd_number
8       462      40           1
9       462      46           3
10      462      59           1
11      462      63          13
12      462      67           1
13      462      82          12
14      462      88           1
15      462     163           3
16      463      68           1
17      463      90           1
18      463     159           2
16      464      93          11
20      464      94           8
21      464      96           1
22      464     142           4
23      465      50           1
24      465      50           5

Then need groupby by columns cust_id and prd_id with aggreagting some function like mean() or sum(). Last unstack with replacing NaN to 0:

print (df.groupby(['cust_id', 'prd_id'])['prd_number'].sum().unstack(fill_value=0))
prd_id   40   46   50   59   63   67   68   82   88   90   93   94   96   142  \
cust_id                                                                         
462        1    3    0    1   13    1    0   12    1    0    0    0    0    0   
463        0    0    0    0    0    0    1    0    0    1    0    0    0    0   
464        0    0    0    0    0    0    0    0    0    0   11    8    1    4   
465        0    0    6    0    0    0    0    0    0    0    0    0    0    0   

prd_id   159  163  
cust_id            
462        0    3  
463        2    0  
464        0    0  
465        0    0  
Comments