Rose - 3 months ago 13

R Question

I have a data frame that I want to subset by one of the column values, and then I want to run chi squared on each of the new subsets.

I read the question about Subsetting a data frame into multiple data frames based on multiple column values which showed me how to subset a data frame. I used a variant on the code suggested there:

`split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE)`

That worked with my data, but what I now want to know is how to reuse those subsets:

- how do I run a function over each new subset?

The data I have looks like this:

`SPELLING VARS DATA SET`

| Headword | Variant | Freq1 | Freq2 |
|----------|---------|-------|-------|
| Knight   | Kniht   | 17    | 22    |
| Knight   | Knyhht  | 28    | 12    |
| Knight   | Knyt    | 6     | 7     |
| Sword    | Sword   | 7     | 8     |
| Sword    | Swerd   | 14    | 44    |

So I'd like a subset for Sword, and one for Knight, and I'd like to run chi squared over each subset. But I'm not sure how to do it.

I've tried to do this myself, but with no success. The code I've been attempting to use is a variant on the answer to the Subsetting question I linked to above:

`chisq.test(split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE))`

However, this gives the error

`(list) object cannot be coerced to type 'double'`

Answer

`split` returns a list of data frames, so use `lapply` to apply a function to each one:

```
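# Rebuild the example data from the question.
SpellingVars <- data.frame(Headword = c('Knight','Knight','Knight','Sword','Sword'),
                           Variant  = c('Kniht', 'Knyhht', 'Knyt', 'Sword', 'Swerd'),
                           Freq1    = c(17, 28, 6, 7, 14),
                           Freq2    = c(22, 12, 7, 8, 44))
# split() returns a named list with one data frame per Headword.
sp <- split(SpellingVars, SpellingVars$Headword, drop = TRUE)
# Freq1 and Freq2 are counts, so pass them as a contingency table (matrix);
# chisq.test(x$Freq1, x$Freq2) would instead cross-tabulate the raw values.
lapply(sp, function(x) chisq.test(as.matrix(x[, c("Freq1", "Freq2")])))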
```
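
If you only need one number per headword, the results can be collected afterwards. The following is a minimal sketch, assuming that `Freq1` and `Freq2` are counts and that each subset should be tested as a rows-by-two contingency table; the names `tests` and `p_values` are made up for illustration. It also shows why the original call failed: `split` hands back a plain list, which `chisq.test` cannot coerce to numbers.

```r
# Same example data as in the question.
SpellingVars <- data.frame(
  Headword = c("Knight", "Knight", "Knight", "Sword", "Sword"),
  Variant  = c("Kniht", "Knyhht", "Knyt", "Sword", "Swerd"),
  Freq1    = c(17, 28, 6, 7, 14),
  Freq2    = c(22, 12, 7, 8, 44)
)

sp <- split(SpellingVars, SpellingVars$Headword, drop = TRUE)

# split() returns a plain named list of data frames, not a table of counts.
class(sp)   # "list"
names(sp)   # "Knight" "Sword"

# Run the test per headword on the Freq1/Freq2 count table
# (assumption: the two frequency columns form the contingency table).
tests <- lapply(sp, function(x) {
  chisq.test(as.matrix(x[, c("Freq1", "Freq2")]))
})

# Collect one p-value per headword into a named numeric vector.
p_values <- vapply(tests, function(res) res$p.value, numeric(1))
p_values
```

With data this small, R may warn that the chi-squared approximation could be inaccurate for some subsets; `fisher.test` is one possible alternative in that case.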