user2543622 user2543622 - 1 month ago 17
R Question

R table data and create additional columns

I have data as below.

A=c(rep("x",3),rep("Y",2),rep("Z",3))

B=c(0,1,0,1,1,0,0,0)

new=data.frame(A,B)


I want to create one column -
modified_zip
as below


  1. Look at count of unique values in column A. example - X=3, Y=2, Z=3

  2. for each unique value in column A, find sum of column B. for example when column A has x, column B sum is 1. when column A has z, column B sum is 0

  3. Divide calculation 1 by calculation 2 and find %. X=1/3,Y=2/2,Z=0/3

  4. create a new column that will have below values

    • if calculation 1 is above 100 and calculation 3 is above 65% then
    modified_zip
    will have value
    65%above100


    • if calculation 1 is above 100 and calculation 3 is less than 35% then
    modified_zip
    will have value
    35%above100


    • if calculation 1 is above 100 and calculation 3 is between 35% to 65 then
    modified_zip
    will have value
    otherabove100


    • if calculation 1 is between 50 to 100 and calculation 3 is above 65% then
    modified_zip
    will have value
    65%between50and100


    • if calculation 1 is between 50 to 100 and calculation 3 is below 35% then
    modified_zip
    will have value
    35%between50and100


    • if calculation 1 is between 50 to 100 and calculation 3 is between 35% and 65% then
    modified_zip
    will have value
    otherbetween50and100


    • if calculation 1 is between 10 to 50 and calculation 3 is above 65% then
    modified_zip
    will have value
    65%between10and50


    • if calculation 1 is between 10 to 50 and calculation 3 is below 35% then
    modified_zip
    will have value
    35%between10and50


    • if calculation 1 is between 10 to 50 and calculation 3 is between 35% and 65% then
    modified_zip
    will have value
    otherbetween10and50


    • if calculation 1 is below 10 then
    modified_zip
    will have value
    smallnumber



I tried to use
table
command but dont know how to work with % and counts, both

Answer
#Look at count of unique values in column 
library(sqldf)
sqldf("select A, count(A) from new group by A")

#for each unique value in column A, find sum of column B. 
sqldf("select A, count(A), sum(B) as sumB from new group by A")

# Divide calculation 1 by calculation 2
new1      <- sqldf("select A, count(A), sum(B) as sumB from new group by A")
new1$calc <- new1$`count(A)`/new1$sumB
new1$calc[is.infinite(new1$calc)] <- 0
new1$calc <- new1$calc*100

You have a very long list of rules there and almost none of them apply to your example because your highest Calculation 1 is 3, but when you know how to do 1 you know how to do all of them, so I will give you 1 example:

#create a new column that will have below values
new1$modified_zip <- NA
new1$modified_zip[new1$`count(A)` > 100 & new1$calc > 65] <- "65%above100"
Comments