carpiediem carpiediem - 3 months ago 20
R Question

Working with Three-Dimensional Frequency Tables in R

I'm able to generate 3D frequency tables, but I'm having a hard time working with them to get the information I want. What's the best way to approach this?

What I Have



A data frame named ballots with columns
member
,
vote_time
, and
vote
, all strings. The possible values in the vote column are Yes, No, Abstain, Present, and Absent. There is exactly one row for every combination of
member
and
vote_time
values. The vote_times are a bit random in that most dates have no votes, some have one and some have several votes.

What I Want



A data frame with the columns
member
,
vote_date
, and
absent_ratio
. The
absent_ratio
column would show the percentage of ballots entered by a given member on a given date that were Absent instead of Yes or No, etc.

What I've Tried



Adding a
vote_date
column to the original data frame was easy enough.

ballots$vote_date <- as.Date(ballots$vote_time)


I've tried using both the built-in
table
function and the
count
function in the
plyr
package. They are able to give me the number of Absent votes for a given member/date combination, but I'm not sure how to turn this into a ratio.

library(plyr)
daily_vote_count <- count(ballots, c('vote_date', 'name_en'))
daily_count_by_vote <- count(ballots, c('vote_date', 'vote', 'name_en'))
daily_absense_rate <- subset(daily_count_by_vote,
name_en == daily_vote_count$name_en &
vote_date == daily_vote_count$vote_date &
vote == "Absent"
)$freq / daily_vote_count # DOESN'T WORK


The warnings point out that I'm working with objects of different lengths, but I'm unsure of how to change my approach. Thanks for any tips you can give me.

An Equivalent Problem



Since it was pointed out that this is difficult to advise on without any data, here's an equivalent question using
mtcars
. For every combination of cylinder & gear count, what percentage of cars have a fuel efficiency greater than 20 miles per gallon?

With this command, I can count the number of cars in each group.

table(mtcars$mpg>20, mtcars$cyl, mtcars$gear)


What I can't figure out is how to calculate the right percentage. The proportion of
TRUE
+ the proportion of
FALSE
for each
cyl
/
gear
pair should add to 100%. But prop.table() doesn't seem to return what I want.

prop.table(table(mtcars$mpg>20, mtcars$cyl, mtcars$gear)) # No
prop.table(table(mtcars$mpg>20, mtcars$cyl, mtcars$gear),1) # Nope
prop.table(table(mtcars$mpg>20, mtcars$cyl, mtcars$gear),2) # Still wrong


I've tried rearranging the arguments in the
table()
funtion as well, but I haven't stumbled upon the right answer yet.

Answer

You are very close, but you need to sum over 2 margins. I'm re-arranging your example so the "vote" is at the end as in your original question:

> tab <- xtabs(~cyl+gear+I(mpg>20), mtcars)
> prop.table(tab, 1:2)
, , I(mpg > 20) = FALSE

   gear
cyl   3   4   5
  4 0.0 0.0 0.0
  6 0.5 0.5 1.0
  8 1.0     1.0

, , I(mpg > 20) = TRUE

   gear
cyl   3   4   5
  4 1.0 1.0 1.0
  6 0.5 0.5 0.0
  8 0.0     0.0

> prop.table(tab, 1:2)[ , , 2] # Proportion TRUE for each combo
   gear
cyl   3   4 5
  4 1.0 1.0 1
  6 0.5 0.5 0
  8 0.0 NaN 0

All 4 cylinder cars get over 20mpg and no 8 cylinder cars do. To get a data frame:

> as.data.frame.table(prop.table(tab, 1:2)[ , , 2])
  cyl gear Freq
1   4    3  1.0
2   6    3  0.5
3   8    3  0.0
4   4    4  1.0
5   6    4  0.5
6   8    4  NaN
7   4    5  1.0
8   6    5  0.0
9   8    5  0.0