CorneliaS - 2 months ago

Optimizing an SQL query on a table of 10 million rows: never-ending query

I have two tables:

CREATE TABLE routing
(
    id integer NOT NULL,
    link_geom geometry,
    source integer,
    target integer,
    traveltime_min double precision,
    CONSTRAINT routing_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);

CREATE INDEX routing_id_idx
ON routing
USING btree
(id);

CREATE INDEX routing_link_geom_gidx
ON routing
USING gist
(link_geom);

CREATE INDEX routing_source_idx
ON routing
USING btree
(source);

CREATE INDEX routing_target_idx
ON routing
USING btree
(target);


and

CREATE TABLE test
(
    link_id character varying,
    link_geom geometry,
    id integer NOT NULL,
    -- .. (some more attributes here)
    traveltime_min double precision,
    CONSTRAINT id PRIMARY KEY (id),
    CONSTRAINT test_link_id_key UNIQUE (link_id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE test
    OWNER TO postgres;


and I am trying to run the following query:

UPDATE routing
SET traveltime_min = t2.traveltime_min
FROM test t2
WHERE t2.id = routing.id;


Both tables have close to 10 million rows. The problem is that this query never finishes. Here is what EXPLAIN shows:

Update on routing  (cost=601725.94..1804772.15 rows=9712264 width=208)
  ->  Hash Join  (cost=601725.94..1804772.15 rows=9712264 width=208)
        Hash Cond: (routing.id = t2.id)
        ->  Seq Scan on routing  (cost=0.00..366200.23 rows=9798223 width=194)
        ->  Hash  (cost=423414.64..423414.64 rows=9712264 width=18)
              ->  Seq Scan on test t2  (cost=0.00..423414.64 rows=9712264 width=18)


I cannot understand what could cause such a slow response.
Could the problem come from the server settings? I am using the default PostgreSQL 9.3 settings.

Answer

The plan itself is fine: an UPDATE that touches every row is served best by sequential scans and a hash join, and the indexes cannot help there. What hurts is maintaining them: every updated row produces a new row version, and every index on routing has to get an entry for it.

So drop all indexes on routing before you run the UPDATE and create them again afterwards. That will bring a huge improvement.
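A minimal sketch of that workflow, using the index names from the question (the primary key index stays, since dropping it would mean dropping the constraint):

DROP INDEX routing_id_idx;
DROP INDEX routing_link_geom_gidx;
DROP INDEX routing_source_idx;
DROP INDEX routing_target_idx;

UPDATE routing
SET traveltime_min = t2.traveltime_min
FROM test t2
WHERE t2.id = routing.id;

-- recreate the indexes exactly as they were defined
CREATE INDEX routing_id_idx ON routing USING btree (id);
CREATE INDEX routing_link_geom_gidx ON routing USING gist (link_geom);
CREATE INDEX routing_source_idx ON routing USING btree (source);
CREATE INDEX routing_target_idx ON routing USING btree (target);

As an aside, routing_id_idx duplicates the primary key index, so you could also leave it out for good.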

Set work_mem high in the session where you run the UPDATE; that will help with the hash join.
Set shared_buffers to ¼ of the available memory, but not more than 1GB.
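For example (both values are illustrations, not recommendations; note that on 9.3, shared_buffers can only be changed in postgresql.conf and requires a server restart):

SET work_mem = '512MB';   -- in the session that runs the UPDATE

shared_buffers = 1GB      # in postgresql.conf, followed by a restart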