Natalia P - 1 year ago
R Question

R: Melting and Merging Data

This is an example of my dataset:

ID = c(1, 2, 3, 4)
Allegation = c("A::B::C::V", "A::C", "A::D", "D::E::D")
Disposition = c("Open::Closed::Open", "Closed::Closed", "Open::Open", "Closed::Open")
df <- data.frame(ID,Allegation, Disposition)

ID Allegation Disposition
1  A::B::C::V Open::Closed::Open
2  A::C       Closed::Closed
3  A::D       Open::Open
4  D::E::D    Closed::Open


I want the following results:

ID Allegation Disposition        Allegation_detail Disposition_detail
1  A::B::C::V Open::Closed::Open A                 Open
1  A::B::C::V Open::Closed::Open B                 Closed
1  A::B::C::V Open::Closed::Open C                 Open
1  A::B::C::V Open::Closed::Open V                 NA
2  A::C       Closed::Closed     A                 Closed


I have tried melting the data and then merging it, but I am not getting the desired output.

This is my approach so far:

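# str_count() is from stringr; melt() further down is assumed to be reshape2::melt
library(stringr)
library(reshape2)

# Count the allegations in each row
df$num_allegations <- str_count(as.character(df$Allegation), "::") + 1

# Find the maximum number of allegations in any row
max(df$num_allegations)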

#Expanding allegations
df$Allegation1 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 1)
df$Allegation2 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 2)
df$Allegation3 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 3)
df$Allegation4 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 4)

#Expanding Disposition
df$Disposition1 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 1)
df$Disposition2 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 2)
df$Disposition3 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 3)
df$Disposition4 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 4)

#melting data
dfmelt1 <- melt(df[,c(1:8)], id=c("ID", "Allegation", "Disposition", "num_allegations"))
dfmelt2 <- melt(df[,c(1,2,3,4,9,10,11,12)], id=c("ID", "Allegation", "Disposition", "num_allegations"))
colnames(dfmelt2) <- c("ID" ,"Allegation" ,"Disposition","num_allegations", "variable2",
"value2")


But when I merge the data, I get this result, which is not what I want:

merge(dfmelt1, dfmelt2, by = c("ID", "Allegation", "Disposition", "num_allegations"))

ID Allegation Disposition        num_allegations variable    value variable2    value2
1  A::B::C::V Open::Closed::Open 4               Allegation1 A     Disposition1 Open
1  A::B::C::V Open::Closed::Open 4               Allegation1 A     Disposition2 Closed
1  A::B::C::V Open::Closed::Open 4               Allegation1 A     Disposition3 Open
1  A::B::C::V Open::Closed::Open 4               Allegation1 A     Disposition4 <NA>
1  A::B::C::V Open::Closed::Open 4               Allegation2 B     Disposition1 Open
1  A::B::C::V Open::Closed::Open 4               Allegation2 B     Disposition2 Closed
1  A::B::C::V Open::Closed::Open 4               Allegation2 B     Disposition3 Open
1  A::B::C::V Open::Closed::Open 4               Allegation2 B     Disposition4 <NA>
1  A::B::C::V Open::Closed::Open 4               Allegation3 C     Disposition1 Open
1  A::B::C::V Open::Closed::Open 4               Allegation3 C     Disposition2 Closed
1  A::B::C::V Open::Closed::Open 4               Allegation3 C     Disposition3 Open
1  A::B::C::V Open::Closed::Open 4               Allegation3 C     Disposition4 <NA>
1  A::B::C::V Open::Closed::Open 4               Allegation4 V     Disposition1 Open
1  A::B::C::V Open::Closed::Open 4               Allegation4 V     Disposition2 Closed
1  A::B::C::V Open::Closed::Open 4               Allegation4 V     Disposition3 Open
1  A::B::C::V Open::Closed::Open 4               Allegation4 V     Disposition4 <NA>
2  A::C       Closed::Closed     2               Allegation1 A     Disposition1 Closed


How can I merge so that Disposition1 is paired only with Allegation1, Disposition2 only with Allegation2, and so on?

Thanks

Answer
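
On why the merge in the question multiplies rows: both melted tables share only ID, Allegation, Disposition and num_allegations as keys, so within one ID every allegation row matches every disposition row. If you want to keep a merge-based approach, one option is to carry the item position as an extra join key. The following is a minimal base R sketch of that idea, not a drop-in fix for the melted tables; `to_long`, `pos`, `alleg_long`, `dispo_long` and `merged_df` are illustrative names that do not appear in the question.

```r
# Sample data from the question
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Allegation = c("A::B::C::V", "A::C", "A::D", "D::E::D"),
  Disposition = c("Open::Closed::Open", "Closed::Closed", "Open::Open", "Closed::Open"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per (ID, position) for a "::"-delimited column
to_long <- function(ids, x, value_name) {
  parts <- strsplit(as.character(x), "::", fixed = TRUE)
  out <- data.frame(
    ID  = rep(ids, lengths(parts)),
    pos = sequence(lengths(parts)),  # 1, 2, ... within each original row
    stringsAsFactors = FALSE
  )
  out[[value_name]] <- unlist(parts)
  out
}

alleg_long <- to_long(df$ID, df$Allegation, "Allegation_detail")
dispo_long <- to_long(df$ID, df$Disposition, "Disposition_detail")

# Joining on ID *and* pos pairs item 1 with item 1, item 2 with item 2, ...
# all.x = TRUE keeps allegations that have no disposition at that position (NA).
details <- merge(alleg_long, dispo_long, by = c("ID", "pos"), all.x = TRUE)

# Attach the original columns and restore the row order
merged_df <- merge(df, details, by = "ID")
merged_df <- merged_df[order(merged_df$ID, merged_df$pos), ]
merged_df
```

With the melted tables from the question, the same key could presumably be derived from the digits at the end of `variable` and `variable2` and added to the `by` argument of `merge()`.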

Here is an idea that avoids melting and merging altogether:

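# assumes df is the original three-column data frame from the question
# number of allegation items per row, used to repeat each row
# (note: '\\w+' counts runs of word characters, so items must not contain spaces or punctuation)
ind <- stringr::str_count(as.character(df$Allegation), '\\w+')
new_df <- df[rep(row.names(df), ind), ]
# column matrix of allegation details, with row names V1, V2, ... for each original row
# (lapply rather than sapply, so the result stays a list even when every row has the same number of items)
v1 <- do.call(rbind, lapply(strsplit(as.character(df$Allegation), '::'),
                            function(i) t(as.data.frame(t(i)))))
# same for the disposition details
v2 <- do.call(rbind, lapply(strsplit(as.character(df$Disposition), '::'),
                            function(i) t(as.data.frame(t(i)))))
# line dispositions up with allegations via the de-duplicated row names; unmatched positions become NA
v2 <- v2[match(make.unique(rownames(v1)), make.unique(rownames(v2)))]

#construct final data frame
final_df <- data.frame(new_df, Allegation_detail = v1, Disposition_detail = v2,
                       stringsAsFactors = FALSE, row.names = NULL)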

final_df
#    ID Allegation        Disposition Allegation_detail Disposition_detail
#1    1 A::B::C::V Open::Closed::Open                 A               Open
#2    1 A::B::C::V Open::Closed::Open                 B             Closed
#3    1 A::B::C::V Open::Closed::Open                 C               Open
#4    1 A::B::C::V Open::Closed::Open                 V               <NA>
#5    2       A::C     Closed::Closed                 A             Closed
#6    2       A::C     Closed::Closed                 C             Closed
#7    3       A::D         Open::Open                 A               Open
#8    3       A::D         Open::Open                 D               Open
#9    4    D::E::D       Closed::Open                 D             Closed
#10   4    D::E::D       Closed::Open                 E               Open
#11   4    D::E::D       Closed::Open                 D               <NA>
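
One caveat about the code above: the `make.unique()` matching pairs items by how often a position label has already appeared, so it can mis-pair rows when an earlier row has fewer dispositions than allegations at a position that a later row fills. For example, allegation counts 2, 3, 3 with disposition counts 2, 2, 3 would give the second row the third row's third disposition. A more defensive variant pads both split vectors to the same length within each row, so pairing depends only on position within that row. This is a minimal sketch under that assumption; `pad`, `n` and `padded_df` are illustrative names.

```r
# Sample data from the question
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Allegation = c("A::B::C::V", "A::C", "A::D", "D::E::D"),
  Disposition = c("Open::Closed::Open", "Closed::Closed", "Open::Open", "Closed::Open"),
  stringsAsFactors = FALSE
)

alleg <- strsplit(as.character(df$Allegation), "::", fixed = TRUE)
dispo <- strsplit(as.character(df$Disposition), "::", fixed = TRUE)

# Rows needed per ID: the longer of the two item lists
n <- pmax(lengths(alleg), lengths(dispo))

# Hypothetical helper: extend a vector to length k, filling with NA
pad <- function(x, k) {
  length(x) <- k
  x
}

padded_df <- data.frame(
  df[rep(seq_len(nrow(df)), n), ],                  # repeat each original row n times
  Allegation_detail  = unlist(Map(pad, alleg, n)),  # padded allegations, in row order
  Disposition_detail = unlist(Map(pad, dispo, n)),  # padded dispositions, in row order
  stringsAsFactors = FALSE,
  row.names = NULL
)
padded_df
```

For the sample data this should reproduce the 11 rows shown above. Unlike the original, it would also keep a disposition that has no matching allegation, with `NA` in `Allegation_detail`.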