Deepak Bala Deepak Bala - 1 year ago 80
SQL Question

PostgreSQL does not use a partial index

I have a table in PostgreSQL 9.2 that has a

column. Let's call this
. The values in this column are fairly unique (may contain 5-6 duplicates at the most). The table has ~5 million rows. About half these rows contain a
value for
. When I execute the following query I expect 1-5 rows. In most cases (>80%) I only expect 1 row.


explain analyze SELECT col1,col2.. colN
FROM table
WHERE text_col = 'my_value';

index exists on
. This index is never used by the query planner and I am not sure why. This is the output of the query.


Seq Scan on two (cost=0.000..459573.080 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
Filter: (victor = 'foxtrot'::text)
Rows Removed by Filter: 4077384

I added another partial index to try to filter out those values that were not null, but that did not help (with or without
. I do not need
considering no
conditions are expressed in my queries, but they also match equality).

ON table
USING btree
(text_col COLLATE pg_catalog."default" text_pattern_ops)

Disabling sequence scans using
set enable_seqscan = off;
makes the planner still pick the
over an
. In summary...

  1. The number of rows returned by this query is small.

  2. Given that the non-null rows are fairly unique, an index scan over the text should be faster.

  3. Vacuuming and analyzing the table did not help the optimizer pick the index.

My questions

  1. Why does the database pick the sequence scan over the index scan?

  2. When a table has a text column whose equality condition should be checked, are there any best practices I can adhere to?

  3. How do I reduce the time taken for this query?

[Edit - More information]

  1. The index scan is picked up on my local database that houses about 10% of the data that is available in production.

Answer Source

A partial index is a good idea to exclude half the rows of the table which you obviously do not need. Simpler:

CREATE INDEX name_idx ON table (text_col)

Be sure to run ANALYZE table after creating the index. (Autovacuum does that automatically after some time if you don't do it manually, but if you test right after creation, your test will fail.)

Then, to convince the query planner that a particular partial index can be used, repeat the WHERE condition in the query - even if it seems completely redundant:

SELECT col1,col2, .. colN
FROM   table 
WHERE  text_col = 'my_value'
AND   text_col IS NOT NULL;  -- repeat condition


Per documentation:

However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, parameterized query clauses do not work with a partial index.

As for parameterized queries: again, add the (redundant) predicate of the partial index as an additional, constant WHERE condition, and it works just fine.

An important update in Postgres 9.6 largely improves chances for index-only scans (which can make queries cheaper and the query planner will more readily chose such query plans). Related:

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download