Hoser - 4 months ago 52

R Question

I have a dataset called

`spam`

I plan on running some linear regression on this dataset in the future, but I'd like to do some pre-processing beforehand and standardize the columns to have zero mean and unit variance.

I've been told the best way to go about this is with R, so I'd like to ask

Answer

I have to assume you meant to say that you wanted a mean of 0 and a standard deviation of 1. If your data is in a dataframe and all the columns are numeric you can simply call the `scale`

function on the data to do what you want.

```
dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)
# check that we get mean of 0 and sd of 1
colMeans(scaled.dat) # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
```

Using built in functions is classy. Like this cat: