Kasey Kasey - 4 months ago 14
R Question

Proportional Tables by Group

I have a data.frame structured as follows:

location gender 15.19 20.30 31.40 41.64 65.
New York Female 2 41 13 19 1
New York Male 1 23 15 17 2
San Francisco Female 1 27 14 14 3
San Francisco Male 4 24 14 10 1
Mexico City Female 1 40 26 11 3
Mexico City Male 4 23 35 8 3
Paris Female 2 12 10 6 0
Paris Male 1 20 13 11 1


...and need to turn it into a proportion table with each cell representing its proportion of the two rows for its given city. This is one solution but is there an easier way to do it for multiple columns (and just transform them without making new columns?)

EDIT
Correct output would give each cell as a proportion of ALL cells within that city, so that all cells that share the location 'New York' would add up to 1, as would all cells that share location 'San Francisco', etc. i.e. :

location gender 15.19 20.30 31.40 41.64 65.
New York Female .01 .31 .1 .14 .01
New York Male .01 .17 .11 .13 .02

Answer
library("data.table")
selected_cols <- colnames(df)[3:7]
setDT(df)[, prop.table(.SD), by = location, .SDcols = selected_cols]

#         location      X15.19    X20.30     X31.40     X41.64        X65.
# 1:      New York 0.014925373 0.3059701 0.09701493 0.14179104 0.007462687
# 2:      New York 0.007462687 0.1716418 0.11194030 0.12686567 0.014925373
# 3: San Francisco 0.008928571 0.2410714 0.12500000 0.12500000 0.026785714
# 4: San Francisco 0.035714286 0.2142857 0.12500000 0.08928571 0.008928571
# 5:   Mexico City 0.006493506 0.2597403 0.16883117 0.07142857 0.019480519
# 6:   Mexico City 0.025974026 0.1493506 0.22727273 0.05194805 0.019480519
# 7:         Paris 0.026315789 0.1578947 0.13157895 0.07894737 0.000000000
# 8:         Paris 0.013157895 0.2631579 0.17105263 0.14473684 0.013157895

Verify results: Whether each city sum to 1

a1 <- setDT(df)[, prop.table(.SD), by = location, .SDcols = selected_cols]
sum(subset(a1, location == "New York", select = selected_cols))
# [1] 1

Then gender column can be combined with a1

do.call(cbind, list(gender = df$gender, a1))