Alex Alex - 2 months ago 22
R Question

Math function using multiple matching criteria

I'm new to this but I'm pretty sure this question hasn't been answered, or I'm just not good at searching....

I would like to subtract the values in multiple rows from a particular row based on matching columns and values. My actual data will be a large matrix with >5000 columns, eaching needing to be subtracted by a blank value that matches the a value in a factor column.

Here is an example data table:

c1 c2 c3 c4 c5
r1 A 1 2 3 aa
r2 B 2 3 4 bb
r3 C 3 4 5 aa
r4 D 4 1 6 bb
r5 Blank 2 3 4 aa
r6 Blank 3 4 5 bb


I would like to subtract the c2,c3,and c4 values of c1 ="Blank" row from A,B,and C using the c5 factor to define which Blank values are used (aa or bb). I would like the "Blank" values to be subtracted from all rows sharing c5 info.
(i know this is confusing to describe)

So the results would look like this:

c1 c2 c3 c4 c5
r1 A -1 -1 -1 aa
r2 B -1 -1 -1 bb
r3 C 1 1 1 aa
r4 D 1 -3 1 bb


I've seen the ddply function work for doing something like this with a single column, but I wasn't able to expand that to perform this task for multiple columns. I'm a noob though...

Thank you for your help!

Answer

This is not tested for all possible cases, but should give you an idea:

df <- read.table(text = 
"c1  c2  c3  c4  c5
r1  A   1   2   3   aa
r2  B   2   3   4   bb
r3  C   3   4   5   aa
r4  D   4   1   6   bb
r5  Blank   2   3   4   aa
r6  Blank   3   4   5   bb", header = T)

library(data.table)
# separate dataset into two
dt <- data.table(df, key = "c5")
dt.blank <- dt[c1 == "Blank"]
dt <- dt[c1 != "Blank"]

# merge into resulting dataset 
dt.res <- dt[dt.blank]

# update each column
columns.count <- ncol(dt)
for(i in 2:(columns.count-1)) {
  dt.res[[i]] <- dt.res[[i]] - dt.res[[i + columns.count]]
}

# > dt.res
#   c1 c2 c3 c4 c5  i.c1 i.c2 i.c3 i.c4
# 1:  A -1 -1 -1 aa Blank    2    3    4
# 2:  C  1  1  1 aa Blank    2    3    4
# 3:  B -1 -1 -1 bb Blank    3    4    5
# 4:  D  1 -3  1 bb Blank    3    4    5