Walker Walker - 28 days ago 7
R Question

pivottabler Counting Small Ns and Row Percentages (R)

I was having difficulty with the pivottabler package and was wondering if you could assist.

library(pivottabler)

# perform the aggregation in R code explicitly
trains <- bhmtrains %>%
group_by(TrainCategory, TOC) %>%
summarise(NumberOfTrains=n()) %>%
ungroup()

# display this pre-calculated data
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", # << *** CODE CHANGE (AND BELOW) *** <<
type="value", valueName="NumberOfTrains",
summariseExpression="sum(NumberOfTrains)")
pt$renderPivot()


This produces a great pivot-like table that looks like this

enter image description here

Does anyone know how I can add a percent of row column like this?

enter image description here

I added columns to my dataset for total by TOC and Total by TOC & TrainCategory. I tried to get a percentage calculated from that but

#total calculations
bhmtrains <- bhmtrains %>%
+ group_by(TOC) %>%
+ mutate(TOCCount = n())

bhmtrains <- bhmtrains %>%
+ group_by(TrainCategory) %>%
+ mutate(TrainCategoryCCount = n())

pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", # << *** CODE CHANGE (AND BELOW) *** <<
type="value", valueName="NumberOfTrains",
summariseExpression="sum(NumberOfTrains)")
##my attempt to calculate row percentage
pt$defineCalculation(calculationName="Percent", caption="%",
type="calculation", basedOn=c("TOCCount", "TrainCategoryCCount"),
format="%.1f %%",
calculationExpression="values$TOCCount/values$TrainCategoryCCount*100")
pt$renderPivot()


And I received this error:

rror in if (calc$type == "value") { : argument is of length zero


Can anyone assist?

Answer Source

I am the package author.

The row percentage is slightly more complex since in a given % cell in the body of the pivot table, you need both the number of trains of that category (Express/Ordinary) and the number of all categories. There are a couple of enhancements on the backlog that will help with this. But, in the meantime, the following will work (explanation after the code):

getPercentageOfAllCategories <- function(pivotCalculator, netFilters, format, baseValues, cell) {
  trains <- pivotCalculator$getDataFrame("bhmtrains")
  netFilters$setFilterValues(variableName="TrainCategory", type="ALL", values=NULL, action="replace") 
  filteredTrains <- pivotCalculator$getFilteredDataFrame(trains, netFilters)
  totalTrainsAllCategories <- nrow(filteredTrains)
  percentageOfAllCategories <- baseValues$N / totalTrainsAllCategories * 100
  value <- list()
  value$rawValue <- percentageOfAllCategories
  value$formattedValue <- pivotCalculator$formatValue(percentageOfAllCategories, format=format)
  return(value)
}

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains) 
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="N", summariseExpression="n()")
pt$defineCalculation(calculationName="Percentage", caption="%", format="%.1f %%", basedOn="N",
  type="function", calculationFunction=getPercentageOfAllCategories)
pt$renderPivot()

Result:

results

This works by defining a custom calculation function that is invoked once per % cell in the pivot table. The custom calculation function gets the filters for a given cell (i.e. which TOC and TrainCategory), then overrides the category filter to clear the TrainCategory criteria. The filters are then applied to the data frame, the resulting number of rows counted and the percentage calculated. There is a little bit more information on custom calculation functions in the calculations vignette.