Querying a large Postgres correlation table
I have a large Postgres table (150 GB+) that stores a large correlation matrix between two variables, val1 and val2. For example:
 val1 | val2 | distance
------+------+----------
    0 |    1 |       10
    0 |    2 |       21
    0 |    3 |       13
    1 |    2 |       65
    1 |    3 |       43
    2 |    3 |       56
The pair (val1, val2) is the composite primary key for the table. When I run the query below, it executes in under 35 ms.
SELECT * FROM sliding_window_distances WHERE (val1 = 10000)
But when I search using val2, the query never completes and times out.
SELECT * FROM sliding_window_distances WHERE (val2 = 10000)
Ideally I want to run the query below, so that I get all records for a specific value (10000 in my example), whichever column it appears in.
SELECT * FROM sliding_window_distances WHERE (val1 = 10000) OR (val2 = 10000)
I'm not sure how to speed up the query.
Author: kPow989, posted December 27, 2017
Before anything else, you may need to clean up dead rows and refresh the planner statistics, since a bloated table or stale statistics can contribute to slow plans and timeouts.
VACUUM ANALYZE sliding_window_distances;
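To judge whether the vacuum is actually needed, the standard statistics view pg_stat_user_tables can be checked before and after. This is only a rough sketch; the tuple counts it reports are estimates, not exact figures:

```sql
-- Estimated live/dead row counts and the last (auto)vacuum and (auto)analyze times.
-- A large n_dead_tup relative to n_live_tup suggests the table is bloated.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'sliding_window_distances';
```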
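You should also add a secondary index to the table. The index behind the composite primary key (val1, val2) is ordered by val1 first, so it serves lookups on val1 efficiently but not lookups on val2 alone. An index on val2 can dramatically speed up those queries.
To create an index without locking out writes to the table: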
CREATE INDEX CONCURRENTLY windows_dist_index ON sliding_window_distances (val2);
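With one index leading on val1 (the primary key) and another on val2, the planner can in principle answer the OR query by combining two index scans, which shows up as a BitmapOr node in the plan. A minimal sketch for checking this, assuming the val2 index has finished building. Whether the planner actually picks this plan depends on its statistics and settings, and the UNION ALL form is only an alternative to try if it does not:

```sql
-- EXPLAIN ANALYZE really executes the query, so run it once the index exists.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM sliding_window_distances
WHERE val1 = 10000 OR val2 = 10000;

-- Alternative: two separate index lookups glued together.
-- The second branch excludes val1 = 10000 so that a (10000, 10000) row,
-- if one exists, is not returned twice.
SELECT * FROM sliding_window_distances WHERE val1 = 10000
UNION ALL
SELECT * FROM sliding_window_distances WHERE val2 = 10000 AND val1 <> 10000;
```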
You may also define an additional UNIQUE constraint, which is backed by an index with val2 as the leading column:
ALTER TABLE sliding_window_distances ADD UNIQUE (val2, val1);
Author: gokcand, posted December 27, 2017