Querying a large Postgres correlation table

postgresql

I have a large Postgres table (150 GB+) that stores a large correlation matrix between two variables, val1 and val2. For example:

 val1 | val2 | distance
------+------+----------
    0 |    1 |       10
    0 |    2 |       21
    0 |    3 |       13
    1 |    2 |       65
    1 |    3 |       43
    2 |    3 |       56

The pair (val1, val2) is the composite primary key for the table. The query below, which filters on val1, executes in under 35 ms:

SELECT *
FROM sliding_window_distances
WHERE (val1 = 10000)

But when I search by val2 instead, the query never completes and times out:

SELECT *
FROM sliding_window_distances
WHERE (val2 = 10000)

Ideally I want to run the query below, so that I get all records for a specific value (10000 in my example):

SELECT *
FROM sliding_window_distances
WHERE (val1 = 10000)
OR (val2 = 10000)

I'm not sure how to speed up the query.

Author: kPow989 · Posted: December 27, 2017

Answer 1 (accepted)

You may first need to clean out dead rows and refresh the planner statistics before doing anything else with a table that is causing timeouts.

First, run:

VACUUM ANALYZE sliding_window_distances;  
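To judge whether the VACUUM ANALYZE above was needed, or whether it did anything, the statistics view pg_stat_user_tables can be queried. This is only a rough sketch; how many dead rows count as "too many" depends on the workload.

```sql
-- Show live/dead row estimates and when the table was last vacuumed or analyzed.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'sliding_window_distances';
```

A large n_dead_tup relative to n_live_tup, or empty last_analyze and last_autoanalyze columns, would suggest the table had not been maintained recently.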

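You should also add a secondary index to the table. The primary key index on (val1, val2) has val1 as its leading column, so it serves lookups on val1 well but cannot be used efficiently for a filter on val2 alone; that query falls back to scanning the whole table. An index that starts with val2 speeds it up dramatically.

To create an index without locking out writes to the table: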

CREATE INDEX CONCURRENTLY windows_dist_index ON sliding_window_distances (val2);
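Once the index on val2 has been built, EXPLAIN can confirm that the planner actually uses it. The sketch below checks the val2 lookup and the combined OR query from the question, and shows a UNION ALL rewrite as a fallback in case the planner does not combine the two indexes on its own. The exact plan output depends on the PostgreSQL version and table statistics, so none is shown here.

```sql
-- The val2 lookup should now show an index or bitmap scan
-- on windows_dist_index instead of a sequential scan.
EXPLAIN
SELECT *
FROM sliding_window_distances
WHERE val2 = 10000;

-- The combined query: the planner may merge the primary key index
-- and the val2 index with a BitmapOr.
EXPLAIN
SELECT *
FROM sliding_window_distances
WHERE val1 = 10000
   OR val2 = 10000;

-- Alternative rewrite: each branch uses its own index.
-- The extra val1 <> 10000 condition keeps a row with
-- val1 = val2 = 10000 from being returned twice.
SELECT * FROM sliding_window_distances WHERE val1 = 10000
UNION ALL
SELECT * FROM sliding_window_distances WHERE val2 = 10000 AND val1 <> 10000;
```

Note that CREATE INDEX CONCURRENTLY cannot be run inside a transaction block, so it has to be issued as a standalone statement.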

You may also define an additional UNIQUE constraint on (val2, val1), which is backed by an index whose leading column is val2:

ALTER TABLE sliding_window_distances ADD UNIQUE (val2, val1);
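A plain ALTER TABLE ... ADD UNIQUE builds its index while holding a lock that blocks writes, which can take a long time on a table of this size. As a possible alternative, the unique index can be built concurrently first and then attached as a constraint. The index name below is hypothetical, chosen only for illustration.

```sql
-- Build the unique index without blocking writes
-- (hypothetical index name).
CREATE UNIQUE INDEX CONCURRENTLY sliding_window_val2_val1_idx
    ON sliding_window_distances (val2, val1);

-- Attach the finished index as a UNIQUE constraint.
-- With no constraint name given, the constraint takes the index's name.
ALTER TABLE sliding_window_distances
    ADD UNIQUE USING INDEX sliding_window_val2_val1_idx;
```

Since an index on (val2, val1) can also serve lookups on val2 alone, a separate single-column index on val2 may be unnecessary if this one is created.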

PostgreSQL Documentation on Indexes

Author: gokcand · Posted: December 27, 2017