user2543622 - 1 year ago 76
R Question

# R table data and create additional columns

I have data as below.

``````A=c(rep("x",3),rep("Y",2),rep("Z",3))

B=c(0,1,0,1,1,0,0,0)

new=data.frame(A,B)
``````

I want to create one column -
`modified_zip`
as below

1. Look at count of unique values in column A. example - X=3, Y=2, Z=3

2. for each unique value in column A, find sum of column B. for example when column A has x, column B sum is 1. when column A has z, column B sum is 0

3. Divide calculation 1 by calculation 2 and find %. X=1/3,Y=2/2,Z=0/3

4. create a new column that will have below values

• if calculation 1 is above 100 and calculation 3 is above 65% then
`modified_zip`
will have value
`65%above100`

• if calculation 1 is above 100 and calculation 3 is less than 35% then
`modified_zip`
will have value
`35%above100`

• if calculation 1 is above 100 and calculation 3 is between 35% to 65 then
`modified_zip`
will have value
`otherabove100`

• if calculation 1 is between 50 to 100 and calculation 3 is above 65% then
`modified_zip`
will have value
`65%between50and100`

• if calculation 1 is between 50 to 100 and calculation 3 is below 35% then
`modified_zip`
will have value
`35%between50and100`

• if calculation 1 is between 50 to 100 and calculation 3 is between 35% and 65% then
`modified_zip`
will have value
`otherbetween50and100`

• if calculation 1 is between 10 to 50 and calculation 3 is above 65% then
`modified_zip`
will have value
`65%between10and50`

• if calculation 1 is between 10 to 50 and calculation 3 is below 35% then
`modified_zip`
will have value
`35%between10and50`

• if calculation 1 is between 10 to 50 and calculation 3 is between 35% and 65% then
`modified_zip`
will have value
`otherbetween10and50`

• if calculation 1 is below 10 then
`modified_zip`
will have value
`smallnumber`

I tried to use
`table`
command but dont know how to work with % and counts, both

``````#Look at count of unique values in column
library(sqldf)
sqldf("select A, count(A) from new group by A")

#for each unique value in column A, find sum of column B.
sqldf("select A, count(A), sum(B) as sumB from new group by A")

# Divide calculation 1 by calculation 2
new1      <- sqldf("select A, count(A), sum(B) as sumB from new group by A")
new1\$calc <- new1\$`count(A)`/new1\$sumB
new1\$calc[is.infinite(new1\$calc)] <- 0
new1\$calc <- new1\$calc*100
``````

You have a very long list of rules there and almost none of them apply to your example because your highest Calculation 1 is 3, but when you know how to do 1 you know how to do all of them, so I will give you 1 example:

``````#create a new column that will have below values
new1\$modified_zip <- NA
new1\$modified_zip[new1\$`count(A)` > 100 & new1\$calc > 65] <- "65%above100"
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download