R question by Kindermann

Grouping and summarizing

For research purposes I need to process data from a CSV table. The table looks like this:

Frame Nr. 0 frame_type I_frame
Frame Nr. 1 frame_type P_frame
Frame Nr. 2 frame_type P_frame
Frame Nr. 3 frame_type B_frame
Frame Nr. 4 frame_type P_frame
Frame Nr. 5 frame_type P_frame
Frame Nr. 6 frame_type B_frame
Frame Nr. 7 frame_type P_frame
Frame Nr. 8 frame_type P_frame
Frame Nr. 9 frame_type I_frame
Frame Nr. 10 frame_type P_frame
Frame Nr. 11 frame_type P_frame
Frame Nr. 12 frame_type P_frame
Frame Nr. 13 frame_type I_frame
Frame Nr. 14 frame_type P_frame
Frame Nr. 15 frame_type P_frame
Frame Nr. 16 frame_type B_frame
Frame Nr. 17 frame_type P_frame
Frame Nr. 18 frame_type P_frame
Frame Nr. 19 frame_type P_frame
Frame Nr. 20 frame_type P_frame
Frame Nr. 21 frame_type I_frame
Frame Nr. 22 frame_type P_frame
Frame Nr. 23 frame_type P_frame
Frame Nr. 24 frame_type P_frame
Frame Nr. 25 frame_type I_frame
...
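If each line of the file really contains the whitespace-separated fields shown above, the listing can be loaded with base R's read.table(). That layout is an assumption: a true comma-separated file would be read with read.csv() instead. The column names label, nr and n_frame are hypothetical; frame_class matches the column name used in the answer.

```r
# Hypothetical input: the first four lines of the listing, one frame per line
lines <- c("Frame Nr. 0 frame_type I_frame",
           "Frame Nr. 1 frame_type P_frame",
           "Frame Nr. 2 frame_type P_frame",
           "Frame Nr. 3 frame_type B_frame")

# Assumed layout per line: "Frame" "Nr." <number> "frame_type" <frame class>
df <- read.table(text = lines, stringsAsFactors = FALSE,
                 col.names = c("label", "nr", "n_frame", "frame_type", "frame_class"))

# Keep only the columns that carry information
df <- df[, c("n_frame", "frame_type", "frame_class")]
df
```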


I want R to first group the frames so that each group starts at an I_frame and ends just before the next I_frame, and then count the P-frames and B-frames within each group. For this example, my R program should deliver the following result:

I2PB2PB2P I3P I2PB4P I3P ...
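The codes in this expected result appear to be run-length encodings of each group, with the count left out when a run has length 1: frames 0 to 8 are I, P, P, B, P, P, B, P, P, which reads as I, 2P, B, 2P, B, 2P. A minimal check of that reading for the first group, using base R's rle() on single-letter frame types (the letter abbreviation is an assumption for brevity):

```r
# First group from the listing (frames 0 to 8), one letter per frame
group <- c("I", "P", "P", "B", "P", "P", "B", "P", "P")

runs <- rle(group)
runs$lengths  # 1 2 1 2 1 2
runs$values   # "I" "P" "B" "P" "B" "P"

# Write each run as <count><letter>, leaving out the count when it is 1
code <- paste0(ifelse(runs$lengths > 1, runs$lengths, ""), runs$values, collapse = "")
code  # "I2PB2PB2P"
```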


Is there a way in R to do that?

Answer

Assuming that your data is in a data.frame named "df" and your "frame classes" are in a column named "frame_class", as in the code below, this should work:

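# Example data: one row per frame, with the frame type in column "frame_class"
df = data.frame(n_frame = 1:13, frame_type = "frame_type",
                frame_class = c("I_frame", "P_frame", "P_frame", "B_frame", "P_frame", "P_frame",
                                "B_frame", "I_frame", "B_frame", "P_frame", "I_frame", "P_frame", "I_frame"),
                stringsAsFactors = FALSE)

# Positions of the I-frames; each pair of consecutive positions bounds one group
where_i = which(df$frame_class == "I_frame")
num_i = length(where_i)
out_codes = list()

# seq_len() avoids looping over 1:0 when there are fewer than two I-frames
for (ind_i in seq_len(max(num_i - 1, 0))) {
  start = where_i[ind_i]
  end = where_i[ind_i + 1]
  sub_data = df$frame_class[start:end]   # both bounding I-frames are included
  count_B = sum(sub_data == "B_frame")
  count_P = sum(sub_data == "P_frame")
  # Build a code such as "I4P2B", leaving out any type whose count is zero
  out_codes[[ind_i]] = paste0("I", ifelse(count_P > 0, paste0(count_P, "P"), ""),
                                   ifelse(count_B > 0, paste0(count_B, "B"), ""))
}
out_codes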

gives:

[[1]]
[1] "I4P2B"

[[2]]
[1] "I1P1B"

[[3]]
[1] "I1P"

Note that this is really quick and dirty: at the very least you should add checks that the series always starts and ends with an I_frame (frames before the first or after the last I_frame are silently ignored by the loop), but it should point you in the right direction.
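The checks mentioned above could look something like this minimal sketch. check_frames() is a hypothetical helper, not an existing function, and the set of accepted frame types (I_frame, P_frame, B_frame) is an assumption.

```r
# Hypothetical helper: stop with an error unless the series is bounded by I-frames
check_frames <- function(frame_class) {
  frame_class <- as.character(frame_class)   # also accepts a factor column
  known <- c("I_frame", "P_frame", "B_frame")
  if (!all(frame_class %in% known))
    stop("unknown frame types: ", paste(setdiff(frame_class, known), collapse = ", "))
  if (sum(frame_class == "I_frame") < 2)
    stop("need at least two I_frames to form a group")
  if (frame_class[1] != "I_frame")
    stop("series does not start with an I_frame")
  if (frame_class[length(frame_class)] != "I_frame")
    stop("series does not end with an I_frame")
  invisible(TRUE)
}
```

Calling check_frames(df$frame_class) before the loop passes silently for the example data, since it starts and ends with an I_frame.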

Also note that this loop-based approach could be slow for large datasets.
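The loop above reports total counts per group (for example I4P2B), whereas the expected result in the question lists consecutive runs (I2PB2PB2P). The following is a minimal base-R sketch that produces the question's format without looping over groups by hand: cumsum() numbers the groups and rle() encodes each one. It takes frames 0 to 24 from the question as input; no claim is made about its speed compared with the loop.

```r
# Frame types of frames 0 to 24 from the question's listing
frame_class <- c("I_frame", "P_frame", "P_frame", "B_frame", "P_frame", "P_frame",
                 "B_frame", "P_frame", "P_frame", "I_frame", "P_frame", "P_frame",
                 "P_frame", "I_frame", "P_frame", "P_frame", "B_frame", "P_frame",
                 "P_frame", "P_frame", "P_frame", "I_frame", "P_frame", "P_frame",
                 "P_frame")

# One letter per frame: "I_frame" -> "I", "P_frame" -> "P", "B_frame" -> "B"
letter <- substr(frame_class, 1, 1)

# The group number goes up by one at every I-frame, so each group runs from an
# I-frame up to (but not including) the next one. Frames before the first
# I-frame, if any, would end up in a separate group 0.
group <- cumsum(letter == "I")

# Run-length encode one group; a run of length 1 is written without its count
encode <- function(x) {
  r <- rle(x)
  paste0(ifelse(r$lengths > 1, r$lengths, ""), r$values, collapse = "")
}

codes <- unname(vapply(split(letter, group), encode, character(1)))
paste(codes, collapse = " ")  # "I2PB2PB2P I3P I2PB4P I3P"
```

Unlike the loop, this version does not need a closing I_frame: the last group simply runs to the end of the data.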

Lorenzo
