Apache11 Apache11 - 5 months ago 10
SQL Question

section wise add up data

hello as the title suggest i need to sum section wise data --->

can.id status qid marks
001 section 1 question 1 112 3
001 section 1 question 2 117 3
001 section 1 question 3 116 3
001 section 2 question 1 115 3
001 section 2 question 2 114 -1
001 section 2 question 3 111 3
001 section 3 question 1 112 -1
001 section 3 question 2 116 3
002 section 1 question 1 114 3
002 section 1 question 2 111 3
002 section 2 question 2 111 -1
002 section 3 question 1 111 -1


i want to display sum of marks for each can.id for every section, help is appreciated....

Answer

In R, we can use dplyr. We extract (from tidyr), substring from 'status' to create 'section', then grouped by 'can.id' and 'section', get the sum of 'marks'.

library(dplyr)
library(tidyr)
df1 %>% 
  extract(status, into = "section", "(.*\\d+)\\s+[[:alpha:]].*") %>%
  group_by(can.id, section) %>% 
  summarise(SumMarks = sum(marks))
#  can.id   section SumMarks
#   <int>     <chr>    <int>
#1      1 section 1        9
#2      1 section 2        5
#3      1 section 3        2
#4      2 section 1        6
#5      2 section 2       -1
#6      2 section 3       -1

Or using data.table

library(data.table)
setDT(df1)[,.(SumMarks = sum(marks)), .(can.id, 
                 section = sub("\\s+[[:alpha:]].*", "", status))]

data

 df1 <- structure(list(can.id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
 2L, 2L, 2L), status = c("section 1 question 1", "section 1 question 2", 
 "section 1 question 3", "section 2 question 1", "section 2 question 2", 
 "section 2 question 3", "section 3 question 1", "section 3 question 2", 
 "section 1 question 1", "section 1 question 2", "section 2 question 2", 
 "section 3 question 1"), qid = c(112L, 117L, 116L, 115L, 114L, 
  111L, 112L, 116L, 114L, 111L, 111L, 111L), marks = c(3L, 3L, 
 3L, 3L, -1L, 3L, -1L, 3L, 3L, 3L, -1L, -1L)), .Names = c("can.id", 
 "status", "qid", "marks"), class = "data.frame",
  row.names = c(NA, -12L))
Comments