Rose - 2 months ago
R Question

R subsetting dataframe and running function on each subset

I have a data frame that I want to subset by the values in one of its columns, and then I want to run a chi-squared test on each of the new subsets.

I read the question about Subsetting a data frame into multiple data frames based on multiple column values which showed me how to subset a data frame. I used a variant on the code suggested there:

split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE)
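For illustration, here is a minimal sketch of what that call produces, using the sample rows from this post. The name `subsets` is hypothetical, and splitting directly on `SpellingVars$Headword` is assumed to be equivalent to the `interaction(Headword)` form for a single grouping column.

```r
# Toy data copied from the table shown in this post
SpellingVars <- data.frame(
  Headword = c("Knight", "Knight", "Knight", "Sword", "Sword"),
  Variant  = c("Kniht", "Knyhht", "Knyt", "Sword", "Swerd"),
  Freq1    = c(17, 28, 6, 7, 14),
  Freq2    = c(22, 12, 7, 8, 44)
)

# split() returns a named list with one data frame per Headword value
subsets <- split(SpellingVars, SpellingVars$Headword)

class(subsets)   # "list"
names(subsets)   # "Knight" "Sword"
subsets$Sword    # the two rows whose Headword is "Sword"
```

The result is a list rather than a single data frame, so anything applied to it has to work on one element at a time.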


That worked with my data, but what I want to know now is how to reuse those subsets: how do I run a function over each new subset?


The data I have looks like this:

SPELLING VARS DATA SET
Headword Variant Freq1 Freq2
Knight Kniht 17 22
Knight Knyhht 28 12
Knight Knyt 6 7
Sword Sword 7 8
Sword Swerd 14 44


So I'd like a subset for Sword, and one for Knight, and I'd like to run chi squared over each subset. But I'm not sure how to do it.

I've tried to do this myself, but with no success. The code I've been attempting to use is a variant on the answer to the Subsetting question I linked to above:

chisq.test(split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE))


However, this gives the error "(list) object cannot be coerced to type 'double'". I'm at a bit of a loss and I'd appreciate any advice!

Answer

Use lapply to apply a function to each data frame in a list:

SpellingVars <- data.frame(
  Headword = c('Knight', 'Knight', 'Knight', 'Sword', 'Sword'),
  Variant  = c('Kniht', 'Knyhht', 'Knyt', 'Sword', 'Swerd'),
  Freq1    = c(17, 28, 6, 7, 14),
  Freq2    = c(22, 12, 7, 8, 44)
)


sp <- split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE)

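# Pass the two count columns as a matrix so chisq.test() reads them as a contingency table.
# chisq.test(x$Freq1, x$Freq2) would instead coerce both vectors to factors and cross-tabulate them.
lapply(sp, function(x) chisq.test(as.matrix(x[, c('Freq1', 'Freq2')])))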
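lapply returns a list of test objects, one per headword. If you want the results side by side, something like the following sketch may help. It assumes that Freq1 and Freq2 are counts forming a contingency table for each headword, so the two columns are passed to chisq.test as a matrix; the names `tests` and `summary_table` are made up for the example.

```r
# Toy data copied from the question
SpellingVars <- data.frame(
  Headword = c("Knight", "Knight", "Knight", "Sword", "Sword"),
  Variant  = c("Kniht", "Knyhht", "Knyt", "Sword", "Swerd"),
  Freq1    = c(17, 28, 6, 7, 14),
  Freq2    = c(22, 12, 7, 8, 44)
)

sp <- split(SpellingVars, SpellingVars$Headword)

# One chi-squared test per Headword, treating Freq1/Freq2 as a table of counts
tests <- lapply(sp, function(x) {
  chisq.test(as.matrix(x[, c("Freq1", "Freq2")]))
})

# Collect the statistic, degrees of freedom and p-value into one row per Headword
summary_table <- data.frame(
  Headword  = names(tests),
  statistic = vapply(tests, function(res) as.numeric(res$statistic), numeric(1)),
  df        = vapply(tests, function(res) as.numeric(res$parameter), numeric(1)),
  p.value   = vapply(tests, function(res) res$p.value, numeric(1)),
  row.names = NULL
)

print(summary_table)
```

R may warn that the chi-squared approximation could be inaccurate for subsets with small expected counts. With only a few rows per headword, it is worth checking whether that warning applies before relying on the p-values.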