user2543622 - 1 year ago 71

R Question

I have data as below.

`A=c(rep("x",3),rep("Y",2),rep("Z",3))`

B=c(0,1,0,1,1,0,0,0)

new=data.frame(A,B)

I want to create one column -

`modified_zip`

- Look at count of unique values in column A. example - X=3, Y=2, Z=3
- for each unique value in column A, find sum of column B. for example when column A has x, column B sum is 1. when column A has z, column B sum is 0
- Divide calculation 1 by calculation 2 and find %. X=1/3,Y=2/2,Z=0/3
- create a new column that will have below values

• if calculation 1 is above 100 and calculation 3 is above 65% thenwill have value`modified_zip`

`65%above100`

• if calculation 1 is above 100 and calculation 3 is less than 35% thenwill have value`modified_zip`

`35%above100`

• if calculation 1 is above 100 and calculation 3 is between 35% to 65 thenwill have value`modified_zip`

`otherabove100`

• if calculation 1 is between 50 to 100 and calculation 3 is above 65% thenwill have value`modified_zip`

`65%between50and100`

• if calculation 1 is between 50 to 100 and calculation 3 is below 35% thenwill have value`modified_zip`

`35%between50and100`

• if calculation 1 is between 50 to 100 and calculation 3 is between 35% and 65% thenwill have value`modified_zip`

`otherbetween50and100`

• if calculation 1 is between 10 to 50 and calculation 3 is above 65% thenwill have value`modified_zip`

`65%between10and50`

• if calculation 1 is between 10 to 50 and calculation 3 is below 35% thenwill have value`modified_zip`

`35%between10and50`

• if calculation 1 is between 10 to 50 and calculation 3 is between 35% and 65% thenwill have value`modified_zip`

`otherbetween10and50`

• if calculation 1 is below 10 thenwill have value`modified_zip`

`smallnumber`

I tried to use

`table`

Answer Source

```
#Look at count of unique values in column
library(sqldf)
sqldf("select A, count(A) from new group by A")
#for each unique value in column A, find sum of column B.
sqldf("select A, count(A), sum(B) as sumB from new group by A")
# Divide calculation 1 by calculation 2
new1 <- sqldf("select A, count(A), sum(B) as sumB from new group by A")
new1$calc <- new1$`count(A)`/new1$sumB
new1$calc[is.infinite(new1$calc)] <- 0
new1$calc <- new1$calc*100
```

You have a very long list of rules there and almost none of them apply to your example because your highest Calculation 1 is 3, but when you know how to do 1 you know how to do all of them, so I will give you 1 example:

```
#create a new column that will have below values
new1$modified_zip <- NA
new1$modified_zip[new1$`count(A)` > 100 & new1$calc > 65] <- "65%above100"
```