Ell Ell, 3 months ago
R Question

How to include a level of error in matching/comparing data frames in R

I am new to R and am trying my best (and so far, so good), but I have hit a problem. I have two data frames, one with theoretical values and the other with experimental values, and they are not the same length. I would like to compare the two data frames to find matching values between them. As these are theoretical vs. experimental values, I need to include a level of error when matching, say ±0.5 from the theoretical value. This is where I am having my problem: I don't know how to include this error.

The data frames are quite large, but below is an example of what I have tried.

Theory <- c("195.0882",
"196.0852",
"196.0916",
"300.1600",
"288.1752",
"289.1786",
"290.1819",
"393.2077",
"394.2111")

Experi <- c("195.0312",
"196.0340",
"196.1251",
"288.1856",
"289.1786",
"290.1819")


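# Theory and Experi hold character strings, so all comparisons are exact text matches
T <- data.frame(Theory)
E <- data.frame(Experi)
# no shared column names, so merge() returns every Theory/Experi combination
M1 <- merge.default(T, E)
# match() only finds identical values
M2 <- match(Theory, Experi)
M2
# [1] NA NA NA NA NA 5 6 NA NA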


Both merge and match require exact matches and leave no room for error, and the compare package doesn't seem to help either.
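The missing ingredient is replacing exact equality with a tolerance test. A minimal base-R sketch on two of the values above, assuming the strings are first converted to numbers (the variable names are illustrative only):

```r
# Two values from the question, converted from text to numbers
theory_val <- as.numeric("195.0882")
experi_val <- as.numeric("195.0312")
tolerance <- 0.5

# Exact comparison (what match() and merge() effectively do): no match
exact_match <- theory_val == experi_val
# Tolerance comparison: a match if the values differ by at most 0.5
tolerant_match <- abs(theory_val - experi_val) <= tolerance

print(exact_match)     # FALSE
print(tolerant_match)  # TRUE
```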

Answer

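We can use the data.table::foverlaps function, which joins two tables on overlapping ranges. First we need to prepare the data by creating a range (value ± tolerance) for each Theory value; each Experi value becomes a zero-width range whose start and end are the same.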
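The range trick rests on a simple equivalence. Writing $T$ for a theoretical value, $E$ for an experimental value and $\delta$ for the tolerance (0.5 here), a zero-width range at $E$ overlaps the closed range around $T$ exactly when the two values differ by at most $\delta$:

```latex
E \in [\,T - \delta,\; T + \delta\,]
\iff T - \delta \le E \le T + \delta
\iff |E - T| \le \delta
```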

library(data.table)

# set tolerance for merge
tolerance <- 0.5

# Theory data: each value becomes the range [value - tolerance, value + tolerance]
dt_T <- data.table(
  Theory = as.numeric(Theory),
  Start = as.numeric(Theory) - tolerance,
  End = as.numeric(Theory) + tolerance,
  key = c("Start", "End"))

# Experi data: Start and End are equal, so each value is a zero-width range
dt_E <- data.table(
  Experi = as.numeric(Experi),
  Start = as.numeric(Experi),
  End = as.numeric(Experi),
  key = c("Start", "End"))

# overlap join: pair each Experi value with every Theory range that contains it
foverlaps(dt_E, dt_T)
#      Theory    Start      End   Experi  i.Start    i.End
# 1: 195.0882 194.5882 195.5882 195.0312 195.0312 195.0312
# 2: 196.0852 195.5852 196.5852 196.0340 196.0340 196.0340
# 3: 196.0916 195.5916 196.5916 196.0340 196.0340 196.0340
# 4: 196.0852 195.5852 196.5852 196.1251 196.1251 196.1251
# 5: 196.0916 195.5916 196.5916 196.1251 196.1251 196.1251
# 6: 288.1752 287.6752 288.6752 288.1856 288.1856 288.1856
# 7: 289.1786 288.6786 289.6786 289.1786 289.1786 289.1786
# 8: 290.1819 289.6819 290.6819 290.1819 290.1819 290.1819
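For comparison, the same tolerance match can be sketched in base R without data.table by computing all pairwise differences with outer(). This is a swapped-in technique, not part of the answer above: it builds a length(Experi) × length(Theory) matrix, so it is only practical while that matrix fits comfortably in memory, which is one reason to prefer foverlaps for large data. The variable names are illustrative.

```r
# Same data as in the question, converted from character to numeric
theory <- as.numeric(c("195.0882", "196.0852", "196.0916", "300.1600",
                       "288.1752", "289.1786", "290.1819", "393.2077",
                       "394.2111"))
experi <- as.numeric(c("195.0312", "196.0340", "196.1251",
                       "288.1856", "289.1786", "290.1819"))
tolerance <- 0.5

# Logical matrix: rows = experimental values, columns = theoretical values;
# TRUE where the two values differ by at most `tolerance`
within_tol <- abs(outer(experi, theory, "-")) <= tolerance

# Row/column indices of every TRUE cell give the matching pairs
idx <- which(within_tol, arr.ind = TRUE)
matches <- data.frame(
  Theory = theory[idx[, "col"]],
  Experi = experi[idx[, "row"]]
)

# Order by experimental value, then theoretical value
matches <- matches[order(matches$Experi, matches$Theory), ]
rownames(matches) <- NULL
print(matches)
```

On the example data this should yield the same eight Theory/Experi pairs as the foverlaps output above, including both theoretical candidates for 196.0340 and 196.1251.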