A. Tiek - 1 year ago
Python Question

Python - For loop that performs a sum and stores the answer in a new column

I would like to calculate the total age of all persons with the same name. See the example table here:

table with names

This is the code I have written so far, but it is incomplete and doesn't work:

final_df = DataFrame()

for i in [list of names]:
    dummy = sort_df.loc[sort_df['name'] == i]
    total_age = 0

    for j in dummy.age:
        age2 = dummy.age(j)

        total_age = total_age + age2


    final_df['total_age'] = total_age

How do I fix this so that the code iterates over the ages of people with the same name, sums them, and stores the result in a new column?

In the end it should look like this:



Looking at your code, I assume there is a CSV file named input.csv that has already been read into sort_df, with the following columns:

name,age,total age

In this case, there is no need to declare another dummy dataframe. Use this:

import pandas as pd

# Read the data; assumes input.csv has the columns name, age, total age
sort_df = pd.read_csv("input.csv")
final_df = sort_df.copy()

# Use a dictionary to keep track of the running total per name
total_age = {}
for name in sort_df["name"]:
    if name not in total_age:
        total_age[name] = 0

# Add up the ages
for index in range(len(sort_df)):
    person = sort_df.loc[index]
    name = person["name"]
    age = person["age"]
    total_age[name] += age

# Write each name's total into the "total age" column of final_df
for index in range(len(final_df)):
    name = final_df.at[index, "name"]
    final_df.at[index, "total age"] = total_age[name]

print(final_df)

which will give you (in final_df):

      name  age  total age
0  Alfredo   13       40.0
1  Alfredo   12       40.0
2  Alfredo   15       40.0
3     Jaap   12       26.0
4     Jaap   14       26.0
5     Koen   16       16.0
6     Lian   76      169.0
7     Lian   45      169.0
8     Lian   34      169.0
9     Lian   14      169.0
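
As an alternative to the explicit loops, the per-name total can probably be computed in one step with pandas' groupby and transform. The sketch below is not part of the original answer. It builds sort_df inline from the rows shown in the table above instead of reading input.csv, so the inline data is an assumption about what the file contains:

```python
import pandas as pd

# Sample rows copied from the output table above (stand-in for input.csv)
sort_df = pd.DataFrame({
    "name": ["Alfredo", "Alfredo", "Alfredo", "Jaap", "Jaap",
             "Koen", "Lian", "Lian", "Lian", "Lian"],
    "age": [13, 12, 15, 12, 14, 16, 76, 45, 34, 14],
})

# transform("sum") computes the sum of each name group and
# broadcasts it back to every row belonging to that group
final_df = sort_df.copy()
final_df["total age"] = final_df.groupby("name")["age"].transform("sum")

print(final_df)
```

Because the column is computed from integer ages here, the totals come out as integers rather than floats. If one row per name is enough, sort_df.groupby("name")["age"].sum() returns the totals as a Series indexed by name.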