user3209752 - 6 months ago 44

SQL Question

Put succinctly, if a query tells me A overlaps B then I don't need it to also tell me that B also overlaps A as they overlap each other.

So I am trying to use a self join in sql to select just 'DISTINCT' overlaps.

To illustrate, here is a simple SQL fiddle that I wrote to show inclusive overlap selection (http://sqlfiddle.com/#!9/7af84f/1)

In detail...

Assume I have a table of name (char), d1 (int), d2 (int) , the schema of which is below. Here d1 and d2 represent the start and end of some interval that might overlap with another interval in the same table,.

`CREATE TABLE test (`

letter char ,

d1 int ,

d2 int

) ;

Given this table I fill it with some values

`INSERT INTO test (letter,d1,d2)`

VALUES

('A', 2, 10), -- overlaps C and D

('B', 12, 20), -- overlaps E

('C', 5, 10), -- overlaps A and D

('D', 1, 8), -- overlaps A and C

('E', 13, 15), -- overlaps B

('F', 25, 30); -- doesn't overlap anything

and run the following query that uses a self join to correctly find the rows where d1 and d2 in one row has an inclusive overlap with d1 and d2 in other rows.

`-- selects all records that overlap in the range d1 - d2 inclusive`

-- (excluding the implicit overlap between a record and itself)

-- The results are sorted by letter followed by d1

SELECT

basetable.letter as test_letter,

basetable.d1,

basetable.d2,

overlaptable.letter as overlap_letter,

overlaptable.d1 as overlap_d1,

overlaptable.d2 as overlap_d2

FROM

test as basetable,

test as overlaptable

WHERE

-- there is an inclusive overlap

basetable.d1 <= overlaptable.d2 and basetable.d2 >= overlaptable.d1

AND

-- the row being checked is not itsself

basetable.letter <> overlaptable.letter

AND

basetable.d1 <> overlaptable.d1

AND

basetable.d2 <> overlaptable.d2

ORDER BY

basetable.letter,

basetable.d1

That correctly gives me the following, showing all 6 versions of overlaps eg left hand column indicates that A overlaps C and another row shows that C overlaps A (note the sqlfiddle doesn't seem to understand field aliases so my column headers are different)

`test_letter d1 d2 overlap_letter overlap_d1 overlap_d2`

A 2 10 D 1 8

B 12 20 E 13 15

C 5 10 D 1 8

D 1 8 A 2 10

D 1 8 C 5 10

E 13 15 B 12 20

My question is this:

ie this result...

`test_letter d1 d2 overlap_letter overlap_d1 overlap_d2`

A 2 10 D 1 8

A 2 10 C 5 10

B 12 20 E 13 15

C 5 10 D 1 8

eg:

a result that just shows records for A, B and C in the left hand column according to the following reasoning

- A(2,10) overlaps with D(1,8) and C(5,10) and {SHOW THESE TWO ROWS}
- B(12,20) overlaps with E(13,15) {SHOW THIS ROW}
- C(5,10) overlaps with D(1,8) {SHOW THIS ROW but don't show the A(1,10) overlap as row 2 already shows that A and C overlap}
- D(1,8) {DON'T SHOW anything new as we already know about A(1,10) and C(5,10)}
- E(13,15) {DON'T SHOW anything new as we already know about B(12,20) }
- F(25,30) {DON'T SHOW anything as there are no overlaps}

Answer

You can just change to an inequality. And, you should also use `JOIN`

:

```
SELECT basetable.letter as test_letter, basetable.d1, basetable.d2,
overlaptable.letter as overlap_letter, overlaptable.d1 as overlap_d1, overlaptable.d2 as overlap_d2
FROM test basetable JOIN
test overlaptable
ON basetable.d1 <= overlaptable.d2 AND
basetable.d2 >= overlaptable.d1
WHERE basetable.letter < overlaptable.letter -- This is the change
ORDER BY basetable.letter, basetable.d1;
```