L.Jans - 4 months ago
Python Question

Pandas DataFrame - Adding rows to df based on data in df

Apologies for the vague title. I have been trying, unsuccessfully so far, to find a way to add new rows to a pandas DataFrame based on the contents of some of its columns. I hope an example makes it clear. The data is mock-up data, which hopefully suffices to paint the bigger picture.

So, let's say a car dealer has, among others, the following 7 customers. In the dataframe you can see their customer ID, their gender (because why not), and the country they currently live in. In addition, you can see whether they have bought any of four car brands (and which model) or not (NA). All values in the dataframe are strings, by the way. For example, customer 4 is a female from Russia, and she has bought a Porsche 911 from the dealer.

        Cust-id  Sex  Country  Audi  Ferrari  Porsche  Jaguar
    0   Cu1      F    FR       R8    FF       NA       NA
    1   Cu2      M    US       NA    NA       NA       XF
    2   Cu3      M    UK       RS7   NA       NA       NA
    3   Cu4      F    RU       NA    NA       911      NA
    4   Cu5      M    US       NA    NA       918      Ford
    5   Cu6      F    US       S6    NA       NA       F-type
    6   Cu7      M    UK       A8    NA       MacanS   XE


What I'd like to be able to do is create new rows for those cases where a customer has bought more than one car, with each row specifying only one car and all the other car brand columns saying 'NA' in that row. For the above example, this would result in the following dataframe.

        Cust-id  Sex  Country  Audi  Ferrari  Porsche  Jaguar
    0   Cu1      F    FR       R8    NA       NA       NA
    1   Cu1      F    FR       NA    FF       NA       NA
    2   Cu2      M    US       NA    NA       NA       XF
    3   Cu3      M    UK       RS7   NA       NA       NA
    4   Cu4      F    RU       NA    NA       911      NA
    5   Cu5      M    US       NA    NA       918      NA
    6   Cu5      M    US       NA    NA       NA       Ford
    7   Cu6      F    US       S6    NA       NA       NA
    8   Cu6      F    US       NA    NA       NA       F-type
    9   Cu7      M    UK       A8    NA       NA       NA
    10  Cu7      M    UK       NA    NA       MacanS   NA
    11  Cu7      M    UK       NA    NA       NA       XE


This means that an original row with three cars specified would lead to three new rows, each specifying only one of the cars (with the original row gone). The Cust-id, Sex, and Country values do not change. This is my first time asking a question on the website, so hopefully the formatting is not too bad. I appreciate any help or guidance.
python pandas dataframe

Answer

The way I would approach this is the following:

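  1. Iterate over every car column and keep only the records that have a value in that column

    import pandas as pd

    id_cols = ['Cust-id', 'Sex', 'Country']
    df_dict = {}

    for car in ['Audi', 'Ferrari', 'Porsche', 'Jaguar']:
        # keep rows with a real value; the question stores missing values as the string 'NA'
        non_nulls = df[df[car].notna() & (df[car] != 'NA')]
        # customer columns plus this one car column
        df_dict[car] = non_nulls[id_cols + [car]]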
    
  2. Concatenate the dataframes with pd.concat. Each piece only has its own car column, so pd.concat fills the other car columns with NaN, which puts the nulls in the right places

    final_df = pd.concat(list(df_dict.values()))
    

Something along those lines should work. Did not test my code though, so use your own judgement!
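
For anyone who wants to try it, here is a minimal end-to-end sketch of the two steps above on the mock data from the question. It assumes, as the question states, that missing purchases are stored as the literal string 'NA'. The names `id_cols`, `car_cols`, and `pieces` are my own, and the stable sort, `fillna`, and `reset_index` calls are additions that are not part of the steps above: they regroup the rows per customer and turn the NaN values back into 'NA' strings.

```python
import pandas as pd

# Mock data from the question; missing purchases are the literal string 'NA'.
df = pd.DataFrame(
    [
        ["Cu1", "F", "FR", "R8", "FF", "NA", "NA"],
        ["Cu2", "M", "US", "NA", "NA", "NA", "XF"],
        ["Cu3", "M", "UK", "RS7", "NA", "NA", "NA"],
        ["Cu4", "F", "RU", "NA", "NA", "911", "NA"],
        ["Cu5", "M", "US", "NA", "NA", "918", "Ford"],
        ["Cu6", "F", "US", "S6", "NA", "NA", "F-type"],
        ["Cu7", "M", "UK", "A8", "NA", "MacanS", "XE"],
    ],
    columns=["Cust-id", "Sex", "Country", "Audi", "Ferrari", "Porsche", "Jaguar"],
)

id_cols = ["Cust-id", "Sex", "Country"]
car_cols = ["Audi", "Ferrari", "Porsche", "Jaguar"]

# Step 1: one piece per brand, holding only the customers who bought that brand.
pieces = [df.loc[df[car] != "NA", id_cols + [car]] for car in car_cols]

# Step 2: stack the pieces; brand columns absent from a piece become NaN.
final_df = (
    pd.concat(pieces, sort=False)
    .sort_index(kind="mergesort")  # stable sort: regroup rows by original customer row
    .fillna("NA")                  # back to the question's 'NA' strings
    .reset_index(drop=True)
)[id_cols + car_cols]

print(final_df)
```

Every customer with more than one car, Cu6 included, ends up with one row per car, and within a customer the rows follow the brand order in `car_cols`.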
