hwq729 - 6 months ago 51

R Question

I used the code below to generate a data frame in R:

```
set.seed(456)
data_5 <- data.frame(id = factor(rep(c("A", "B", "C"), each = 214)),
                     people = c(floor(runif(214, min = 10, max = 800)),
                                floor(runif(214, min = 20, max = 810)),
                                floor(runif(214, min = 30, max = 820))))
```

Q1: I want to add a new column that shows each value in column `people` divided by the total of the category it belongs to (that is, each value in category A is divided by the sum of category A, and likewise for categories B and C).

Q2: I want to add a new column that displays the mean of categories A, B and C for each step (214 steps in total). I know this will produce a column of 214 values repeated 3 times, but I guess that won't affect plotting.

Q3: I want to calculate the cumulative value of the Q1 column for categories A, B and C separately.

I have tried to get these results by generating each column separately and combining them, but I am looking for a better way to do it.

Cheers

Answer

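You could use the `ave` function as below (I hope I understood your question correctly). Using the syntax of the first line, the world is pretty much your oyster: you can pass any function you like as `FUN`, and it will be applied separately to each `id` category. Because `ave` returns a vector the same length as its input, in the original row order, the result can be assigned straight back as a new column. You may also want to check out the `aggregate`, `by`, and `tapply` functions for applying functions to different categories; those typically return one result per category rather than one per row.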
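To illustrate that difference, here is a small sketch on a made-up five-row data frame (the names `df`, `share` and `group_totals` are purely illustrative, not from the question): `ave` gives one value per row, while `tapply` and `aggregate` give one value per category.

```r
# Made-up toy data: category A has 2 rows, category B has 3 rows
df <- data.frame(id = factor(c("A", "A", "B", "B", "B")),
                 people = c(10, 30, 5, 5, 10))

# ave(): one result per row, in the original row order,
# so it can be stored directly as a new column
df$share <- ave(df$people, df$id, FUN = function(x) x / sum(x))
print(df$share)

# tapply() and aggregate(): one result per category instead
group_totals <- tapply(df$people, df$id, sum)
print(group_totals)
print(aggregate(people ~ id, data = df, FUN = sum))
```

Here `df$share` has five entries (0.25, 0.75, 0.25, 0.25, 0.5), whereas `group_totals` has only two (A = 40, B = 20).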

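```
# Q1: each value divided by the total of its category
data_5$perc <- ave(data_5$people, data_5$id, FUN = function(x) x / sum(x))
# Q2: mean of each category, repeated on every row of that category
data_5$mean <- ave(data_5$people, data_5$id, FUN = mean)
# Q3: running total of the Q1 shares within each category (ends at 1)
data_5$cumperc <- ave(data_5$perc, data_5$id, FUN = cumsum)
```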
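Q2 could also be read as asking for the mean across A, B and C at each of the 214 steps, which matches the expectation of "214 values repeated 3 times". The sketch below assumes that reading and reuses the question's data; the `step`, `step_mean` and `run_mean` columns are hypothetical names. It also shows a base-R running mean per category (the quantity `dplyr::cummean` computes), in case a cumulative mean rather than a cumulative share is wanted.

```r
set.seed(456)
data_5 <- data.frame(id = factor(rep(c("A", "B", "C"), each = 214)),
                     people = c(floor(runif(214, min = 10, max = 800)),
                                floor(runif(214, min = 20, max = 810)),
                                floor(runif(214, min = 30, max = 820))))

# Hypothetical helper column: position (1..214) of each row within its category
data_5$step <- ave(seq_along(data_5$id), data_5$id, FUN = seq_along)

# Alternative reading of Q2: mean of A, B and C at the same step,
# giving 214 distinct values, each repeated once per category
data_5$step_mean <- ave(data_5$people, data_5$step, FUN = mean)

# Base-R running mean within each category, no extra packages needed
data_5$run_mean <- ave(data_5$people, data_5$id,
                       FUN = function(x) cumsum(x) / seq_along(x))
```

Switching the grouping argument of `ave` from `id` to `step` is the only change needed to go from per-category means to per-step means.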