baha-kev - 2 months ago 11

R Question

These are some newbie questions about statistical programming in R that I haven't been able to find answers to online. *My data frame is named `eitc` in the code below.*

**1)** Once I've loaded a data frame, I'd like to look at summary statistics. I've used these functions:

```
library(foreign)  # provides read.dta()
eitc <- read.dta(file="/Users/Documents/eitc.dta")

summary(eitc)

sapply(eitc, mean, na.rm=TRUE)  # sample mean; swap mean for min, max, etc.
```

How do I find summary statistics on my data frame for only the rows that meet certain conditions? For example, I would like to see summary statistics on all variables when the variable `children` is greater than or equal to 1. The equivalent Stata code is:

```
summarize if children >= 1
mean work if post93==0 & anykids==1
```

I've also been adding a dummy variable for years after 1993 like this:

```
post93.dummy <- as.numeric(eitc$year > 1993)
eitc <- cbind(eitc, post93.dummy)
```

Any help would be awesome! Thank you, Kevin

Answer

Many of your requirements can be handled with `subset`, e.g.:

```
summary(subset(eitc, post93 == 0 & anykids == 1, select=work))
nrow(subset(eitc, post93 == 0 & anykids == 1, select=work)) # for number of obs.
```
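The `sapply` call from the question also works on a subset, and it can return several statistics per column if the function you pass returns a named vector (`sapply` then simplifies the result to a matrix with one column per variable). A minimal sketch using a small made-up stand-in for `eitc`; the values and the helper name `col_stats` are hypothetical:

```r
# Hypothetical stand-in for eitc; values are made up for illustration
eitc <- data.frame(
  work     = c(1, 0, 1, NA, 1),
  children = c(0, 2, 1, 3, 0)
)

# Hypothetical helper: several statistics for one column, as a named vector
col_stats <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    n    = sum(!is.na(x)))   # number of non-missing observations
}

# Statistics for every column, restricted to rows with children >= 1;
# the result has one row per statistic and one column per variable
stats <- sapply(subset(eitc, children >= 1), col_stats)
print(stats)
```

Because `col_stats` uses `na.rm = TRUE`, the `n` row tells you how many non-missing values each statistic is based on.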

The `?subset` documentation has good examples.
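To map the two Stata commands from the question onto R more directly, here is a hedged sketch. It uses a small made-up data frame in place of `eitc` (only the column names come from the question; the values are hypothetical) and shows both `subset` and plain logical indexing with `[`:

```r
# Hypothetical stand-in for eitc; column names follow the question, values are made up
eitc <- data.frame(
  children = c(0, 1, 2, 0, 3, 1),
  post93   = c(0, 0, 1, 1, 0, NA),
  anykids  = c(0, 1, 1, 0, 1, 1),
  work     = c(1, 0, 1, 1, 1, 0)
)

# Stata: summarize if children >= 1
summary(subset(eitc, children >= 1))

# Stata: mean work if post93==0 & anykids==1
kids_pre93 <- subset(eitc, post93 == 0 & anykids == 1)
mean(kids_pre93$work, na.rm = TRUE)
nrow(kids_pre93)  # number of observations used

# Same mean with logical indexing; which() drops rows where the condition is NA,
# matching how subset() treats missing values in the condition
rows <- which(eitc$post93 == 0 & eitc$anykids == 1)
mean(eitc$work[rows], na.rm = TRUE)
```

The `which()` step matters when `post93` has missing values: indexing with a raw logical vector that contains `NA` returns an `NA` element for that row instead of skipping it.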

The `cbind` method of attaching dummy variables is unnecessary. Just do:

```
eitc$post93.dummy <- as.numeric(eitc$year>1993)
```
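`year > 1993` produces a logical vector, and `as.numeric` turns `TRUE`/`FALSE` into 1/0, so the dummy can be assigned straight into the data frame. A minimal sketch with made-up years; the `within` variant and the column name `post93.alt` are only an illustrative alternative, not something from the question:

```r
# Hypothetical data: only the year column from the question is mimicked
eitc <- data.frame(year = c(1991, 1993, 1994, 1996))

# Assign the new column directly; no cbind needed
eitc$post93.dummy <- as.numeric(eitc$year > 1993)

# Equivalent: evaluate the expression inside the data frame to avoid repeating eitc$
eitc <- within(eitc, post93.alt <- as.numeric(year > 1993))

print(eitc)
```

If `year` contains `NA`, the dummy will be `NA` for those rows rather than 0.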