Nick Knauer Nick Knauer - 1 month ago 11
R Question

Create Percentage Table from Dataframe for Multiple Selected Columns

I have a dataframe that looks like this:

COLA COLB COLC COLD COLE
Name1 yes A AB uno
Name2 yes B AC dos
Name3 no C AB tres
Name4 no D AC cuatro


How do I create a proportion dataframe that shows the percentages of each selected column value along with the frequency:

ATTRIBUTE Percentages Frequency
*COLB* *Percentage* *Amount*
yes 50% 2
no 50% 2
*COLC* *Percentage *Amount*
A 25% 1
B 25% 1
C 25% 1
D 25% 1
*COLD* *Percentage* *Amount*
AB 50% 2
AC 50% 2


It doesn't need to look exactly like this but I need it to all be in one dataframe and include only the selected columns mentioned.

Any help would be great, thanks!

Answer

You can do the following:

dat <- data.frame(COLA=paste0("name",1:4),
                  "COLB"=c("yes", "yes", "no", "no"))

require(purrr)
col_to_stat <- function(col){
  tmp <- table(col)
  data.frame(ATTRIBUTE = names(tmp), Percentages = c(tmp/length(col)), Frequency = c(tmp),
             stringsAsFactors = FALSE)
}
map_df(dat, col_to_stat, .id="col")

Which gives you:

   col ATTRIBUTE Percentages Frequency
1 COLA     name1        0.25         1
2 COLA     name2        0.25         1
3 COLA     name3        0.25         1
4 COLA     name4        0.25         1
5 COLB        no        0.50         2
6 COLB       yes        0.50         2

If you want to print percentages instead of decimals have a look at: How to format a number as percentage in R?

P.S.: If you use tibble instead of data.frame you can use the following which is shorter:

tibble(ATTRIBUTE = names(tmp), Percentages = tmp/length(col), Frequency = tmp)