Michelle Fournet - 1 month ago
R Question

Search for value within a range of values in two separate vectors

This is my first time posting to Stack Exchange, so my apologies, as I'm certain I will make a few mistakes. I am trying to assess false detections in a dataset.

I have one data frame with "true" detections:

truth=
ID Start Stop SNR
1 213466 213468 10.08
2 32238 32240 10.28
3 218934 218936 12.02
4 222774 222776 11.4
5 68137 68139 10.99


And another data frame with a list of times that represent possible 'real' detections:


possible=
ID Times
1 32239.76
2 32241.14
3 68138.72
4 111233.93
5 128395.28
6 146180.31
7 188433.35
8 198714.7


I am trying to see whether each value in my 'possible' data frame lies between the start and stop values of any row in "truth". If so, I'd like to create a third column in 'possible' called "between" and a column in the "truth" data frame called "match". For every value from 'possible' that falls between, I'd like a 1, otherwise a 0. For every row in "truth" that finds a match, I'd like a 1, otherwise a 0.

Neither ID nor SNR is important. I'm not looking to match on ID. Instead I want to run through the data frame entirely. Output should look something like:


ID Times Between
1 32239.76 0
2 32241.14 1
3 68138.72 0
4 111233.93 0
5 128395.28 0
6 146180.31 1
7 188433.35 0
8 198714.7 0


Alternatively, knowing whether any of my 'possible' time values fall within 2 seconds of the start or end times would also do the trick (also with 1/0 outputs).

(Thanks for the feedback on the original post)

Thanks in advance for your patience with me as I navigate this system.

Answer

I think this can be conceptualised as a rolling join in data.table. Take this simplified example:

truth
#   id start stop
#1:  1     1    5
#2:  2     7   10
#3:  3    12   15
#4:  4    17   20
#5:  5    22   26

possible
#   id times
#1:  1     3
#2:  2    11
#3:  3    13
#4:  4    28

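library(data.table)

# Convert both data frames to data.tables by reference
setDT(truth)
setDT(possible)
# Stack start/stop into one "times" column, then roll each possible time back
# to the latest boundary at or before it: a "start" means inside an interval
melt(truth, measure.vars=c("start","stop"), value.name="times")[
    possible, on="times", roll=TRUE
    ][, .(id=i.id, truthid=id, times, status=factor(variable, labels=c("in","out")))]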

#   id truthid times status
#1:  1       1     3     in
#2:  2       2    11    out
#3:  3       3    13     in
#4:  4       5    28    out
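
If the 1/0 columns from the question are wanted rather than in/out labels, one possible alternative (base R, not the rolling join above) is to compare every interval against every time with outer(). This is only a minimal sketch on the same simplified data; the name hits is made up for illustration, and both interval ends are assumed to be inclusive.

```r
# Simplified data from the example above
truth <- data.frame(id = 1:5,
                    start = c(1, 7, 12, 17, 22),
                    stop  = c(5, 10, 15, 20, 26))
possible <- data.frame(id = 1:4, times = c(3, 11, 13, 28))

# hits[i, j] is TRUE when possible$times[j] lies in [start[i], stop[i]]
# (assumption: both ends of the interval are inclusive)
hits <- outer(truth$start, possible$times, "<=") &
        outer(truth$stop,  possible$times, ">=")

# A time is "between" if it falls inside at least one interval;
# an interval has a "match" if at least one time falls inside it
possible$between <- as.integer(colSums(hits) > 0)
truth$match      <- as.integer(rowSums(hits) > 0)

possible$between
# [1] 1 0 1 0
truth$match
# [1] 1 0 1 0 0
```

This builds a full intervals-by-times matrix, so it is probably best suited to modest data sizes; for large tables the join approach should scale better.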

The source datasets were:

truth <- read.table(text="id start stop
1 1 5
2 7 10
3 12 15
4 17 20
5 22 26", header=TRUE)

possible <- read.table(text="id times
1 3
2 11
3 13
4 28", header=TRUE)
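
For the "within 2 seconds" variant mentioned in the question, a rough base R sketch on the question's own data could look like the following. It assumes that "within 2 seconds of start or end" can be treated as widening each interval by a tolerance on both sides; tol and near are hypothetical names, and this is not the rolling-join approach.

```r
# Data as posted in the question
truth <- data.frame(
  ID    = 1:5,
  Start = c(213466, 32238, 218934, 222774, 68137),
  Stop  = c(213468, 32240, 218936, 222776, 68139),
  SNR   = c(10.08, 10.28, 12.02, 11.4, 10.99)
)
possible <- data.frame(
  ID    = 1:8,
  Times = c(32239.76, 32241.14, 68138.72, 111233.93,
            128395.28, 146180.31, 188433.35, 198714.7)
)

tol <- 2  # tolerance in seconds (assumed interpretation of "within 2 seconds")

# near[i, j] is TRUE when Times[j] lies in [Start[i] - tol, Stop[i] + tol]
near <- outer(truth$Start - tol, possible$Times, "<=") &
        outer(truth$Stop + tol,  possible$Times, ">=")

possible$Between <- as.integer(colSums(near) > 0)
truth$Match      <- as.integer(rowSums(near) > 0)

possible$Between
# [1] 1 1 1 0 0 0 0 0
truth$Match
# [1] 0 1 0 0 1
```

Setting tol to 0 reduces this to a plain "between start and stop" check with inclusive ends.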