
Speeding up SELECT query response in large PostgreSQL Database (250 million rows)

Using PostgreSQL, I have a very large table of ~250 million rows composed of the following 4 attributes:

[table schema image: the columns include word1, word2, and sim_value]

I am trying to select the sim_value attribute based on the given word1 and word2 using the following SELECT query:

(Assume I have two words, X and Y, where X or Y can appear in either word1 or word2.)

SELECT sim_value
FROM public.sim_values_english
WHERE (word1 = 'X' OR word2 = 'X')
  AND (word1 = 'Y' OR word2 = 'Y')
  AND word1 != word2;


This takes on average 1-1.5 minutes to return the sim_value, which is far too long. Storing the entire table in memory is very costly, since it weighs more than 10 GB.

How can I speed up this query? What are your suggestions?

P.S.: word1 and word2 are never identical, so if one row has word1 = X and word2 = Y, then there does not exist another row where word1 is Y and word2 is X!

NOTE: I have searched for similar topics, but none of them addressed this exact issue. Thank you for understanding.

Thank you

Answer

Firstly, if you don't have it already, I would ensure that the following index exists:

CREATE INDEX ON sim_values_english (word1, word2);
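On a table with ~250 million rows the index build itself will take some time and, by default, blocks writes while it runs. If the table needs to stay writable during the build, a concurrent build is one option; this is only a sketch, and the index name below is just an example:

-- Builds the index without taking a write lock on the table.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- and the build is slower than a regular CREATE INDEX.
CREATE INDEX CONCURRENTLY sim_values_word1_word2_idx
    ON sim_values_english (word1, word2);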

Then I would try the following query:

SELECT sim_value
FROM sim_values_english
WHERE word1='X' AND word2='Y'
UNION ALL
SELECT sim_value
FROM sim_values_english
WHERE word1='Y' AND word2='X';
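Each branch of the UNION ALL can then be answered with a direct lookup on the (word1, word2) index instead of a scan over all 250 million rows, and UNION ALL (unlike UNION) skips the duplicate-elimination step, which is safe here because you state that a word pair only ever appears in one orientation. To check that the planner really uses the index, you can inspect the plan; the literals 'X' and 'Y' below are placeholders for your two words:

-- Both branches should show an Index Scan on the (word1, word2) index
-- rather than a sequential scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sim_value FROM sim_values_english WHERE word1 = 'X' AND word2 = 'Y'
UNION ALL
SELECT sim_value FROM sim_values_english WHERE word1 = 'Y' AND word2 = 'X';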