scotgis scotgis - 4 months ago 25
R Question

Creating pivot table in R

I know this has been asked numerous times but I've scrolled the internet for the last 2 days and can't find the help I need to create a pivot table or summary table like I could do with excel. I'm completely new to R as in a complete illiterate so I find most examples in R packages help files way too complicated.

I have a list of data ("assemblage") that looks like this

Phase class NISP
<chr> <chr> <int>
1 L Aves 11
2 L Fish 128
3 L Mammals 14
4 K Aves 63
5 K Fish 30
6 K Mammals 311
7 J Aves 170
8 J Fish 327
9 J Mammals 740
10 I Aves 45
# ... with 18 more rows


and I've managed to do a summary/pivottable like table using dcast()

pivot <-dcast(assemblage, Phase ~ class, fun.aggregate = sum, value.var = "NISP", margins = TRUE)


with the following result

Phase Aves Fish Indeterminate Mammals (all)
1 A 1 0 0 6 7
2 B 2 0 0 3 5
3 C 58 20 0 255 333
4 E 5 2 0 5 12
5 F 14 0 0 17 31
6 H 121 154 1 784 1060
7 I 45 110 0 149 304
8 J 170 327 0 740 1237
9 K 63 30 0 311 404
10 L 11 128 0 14 153
11 (all) 490 771 1 2284 3546


But what I need now that I can't figure out is:


  • replace (all) with "Total"

  • order columns in following order ("Phase","Mammals","Fish","Aves","Indeterminate","Total")

  • add new columns with percentage of each animal class per phase (percentage of row total).


Answer

The code below should do everything step by step. If anything is unclear, let me know.

# make some data
df = data.frame(Phase = c(1, 1, 2, 2, 3), 
                Fish = floor(rnorm(5, 150)),
                Mammal = floor(rnorm(5, 50)))

df$all = rowSums(df[, 2:3])

# 1 change name
names(df)[which(names(df) == 'all')] = 'Total'

# 2 - reverse Fish and Mammal
idx1 = 2:3 # columns to change
idx2 = 3:2 # new order of columns
df[, idx1] = df[, idx2]
names(df)[idx1] = names(df)[idx2]

# 3 - calculate percentages
idxT = 2:3 # column indices of interest

newColNames = paste('%', names(df)[idxT])

tmp = df[, idxT, drop = FALSE] / matrix(df["Total"], ncol = length(idxT))
colnames(tmp) = newColNames
df = cbind(df, tmp)