Time-series in PostgreSQL with heavy query requirement

sql postgresql

321 views


Reputation: 31

I'm looking to implement a time-series DB. I've gone through various database options, but since I'm not that knowledgeable in this area, I've opted to stick with PostgreSQL, which I'm somewhat familiar with from using it with Django (especially because of the ORM).

The idea is to store time-series data with 4 columns (indexed on every column except price).

timestamp | id | item | price

I'm looking at adding these every minute; roughly 1,500 datapoints are bulk-inserted each minute. After a month I no longer need minute-level resolution; one datapoint per day (at 00:00) should suffice.
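At this scale (1,500 rows a minute is a little over 2 million rows a day), a plain table with a composite index should hold up well. A minimal sketch, assuming the timestamp is stored as an epoch integer as in the sample data below; the table and column names (`price_history`, `ts`) are illustrative, with `ts` used instead of `timestamp` to avoid clashing with the SQL type name:

```sql
-- Illustrative schema; ts holds an epoch timestamp as in the sample data.
CREATE TABLE price_history (
    ts    bigint  NOT NULL,
    id    bigint  NOT NULL,
    item  text    NOT NULL,
    price numeric NOT NULL
);

-- One composite index serves the typical filter: item plus a time range.
CREATE INDEX price_history_item_ts_idx ON price_history (item, ts);

-- Bulk insert: one multi-row INSERT (or COPY) per minute.
INSERT INTO price_history (ts, id, item, price) VALUES
    (1514391000, 1, 'foo', 10),
    (1514391000, 2, 'bar', 42);
```

For 1,500 rows a minute, a single multi-row `INSERT` or `COPY` is cheap; indexing every column separately, as proposed, is likely unnecessary and would slow inserts.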

Am I correct in thinking that PostgreSQL should do just fine for this? The data will be served by a backend and needs to be quite low in latency (300 ms round trips).

My main question is whether PostgreSQL can return the data efficiently when given requirements such as a range of items, a start and end timestamp, and the interval (step) at which the data is requested, without having to return everything and filter it manually.
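For the item-plus-window part, a plain range query is exactly what a B-tree index is good at; a sketch, assuming a hypothetical `price_history` table with an epoch-integer `ts` column and an index on `(item, ts)`:

```sql
-- Range of items within a time window; satisfied by an (item, ts) index,
-- with no need to pull everything back and filter client-side.
SELECT ts, item, price
FROM price_history
WHERE item IN ('foo', 'bar')
  AND ts >= 1514391000
  AND ts <  1514392000
ORDER BY item, ts;
```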

If my table contains a single item with the following data:

timestamp  | id | item | price
1514391000 | 01 | foo  | 10
1514391100 | 02 | foo  | 20
1514391200 | 03 | foo  | 30
.......... | .. | ...  | ..
1514392000 | 11 | foo  | 20
1514393000 | 21 | foo  | 20

I would like to be able to request start: 1514391000, end: 1514392000 and step: 200. I would then expect to receive six results back (at 1000, 1200, 1400, 1600, 1800 and 2000). Is this possible with PostgreSQL in an efficient manner?
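If the inserted timestamps are aligned as you propose, one way to sketch this (assuming a hypothetical `price_history` table with an epoch-integer `ts` column) is a simple modulo filter over the indexed range:

```sql
-- Every step-th point in the window; BETWEEN is inclusive, so this
-- matches the six aligned timestamps 1514391000 .. 1514392000.
SELECT ts, item, price
FROM price_history
WHERE item = 'foo'
  AND ts BETWEEN 1514391000 AND 1514392000
  AND (ts - 1514391000) % 200 = 0;
```

If the stored timestamps cannot be guaranteed to align to the step, `generate_series` can produce the desired grid instead, with a lateral join fetching the nearest row for each grid point.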

The only thing I can think of is to make sure, when inserting my time series, that the values are rounded up to the nearest minute; then I know exactly which timestamps to filter for without needing to search the database.

I'm also wondering whether it's possible to search for the 'nearest timestamp' for a given item, in the same scenario. All of this seems solvable by clever timestamp entry, but I'm not sure that's the way to go.
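Nearest-timestamp lookup does not actually require clever entry; a common sketch is to take the closest row on each side of the target and keep the nearer one (again assuming a hypothetical `price_history` table indexed on `(item, ts)`, so each branch is a one-row index scan):

```sql
-- Nearest stored row to target 1514391550 for item 'foo'.
SELECT ts, item, price
FROM (
    (SELECT ts, item, price FROM price_history
     WHERE item = 'foo' AND ts <= 1514391550
     ORDER BY ts DESC LIMIT 1)
    UNION ALL
    (SELECT ts, item, price FROM price_history
     WHERE item = 'foo' AND ts >= 1514391550
     ORDER BY ts LIMIT 1)
) candidates
ORDER BY abs(ts - 1514391550)
LIMIT 1;
```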

Asked by sof2er on 27 December 2017

Answers (1)


Reputation: 865034

I would suggest having a timestamp-start and a timestamp-end column. Then you can readily find the matching row.
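As a sketch of that idea (the table and column names here are my own, not from the answer): each summarized row carries the interval it covers, so a point lookup becomes a simple range predicate:

```sql
-- Find the archived row whose [ts_start, ts_end) interval covers a point.
SELECT item, price
FROM price_history_archive
WHERE item = 'foo'
  AND ts_start <= 1514391550
  AND ts_end   >  1514391550;
```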

I am thinking of a two-table solution, one for the more recent data and one for the older data.

You should also partition your most recent table, perhaps by day. This will allow you to manage older data more effectively -- dropping data one day (or week or month) at a time.
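With declarative partitioning (PostgreSQL 10 and later), that might look like the following; the table names and epoch bounds are illustrative:

```sql
-- Parent table partitioned by day on an epoch timestamp.
CREATE TABLE price_history_recent (
    ts    bigint  NOT NULL,
    id    bigint  NOT NULL,
    item  text    NOT NULL,
    price numeric NOT NULL
) PARTITION BY RANGE (ts);

-- One partition per day, e.g. 2017-12-27 UTC.
CREATE TABLE price_history_20171227
    PARTITION OF price_history_recent
    FOR VALUES FROM (1514332800) TO (1514419200);

-- Retiring a day of data later becomes a cheap metadata operation:
-- DROP TABLE price_history_20171227;
```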

Then, each day (or week or month), summarize the older data into the records you want to archive. You can drop the partition from the newer data.
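A sketch of that daily roll-up under assumed table names, keeping one row per item (the first at or after midnight), with `DISTINCT ON` doing the per-item pick:

```sql
-- Summarize a finished day into the archive: earliest row per item,
-- since ORDER BY item, ts makes DISTINCT ON keep the first ts per item.
INSERT INTO price_history_archive (ts, item, price)
SELECT DISTINCT ON (item) ts, item, price
FROM price_history_20171227
ORDER BY item, ts;
```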

You can either swap in the archive partitions or use a view to combine them.
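The view variant could be sketched as follows (names assumed):

```sql
-- One queryable surface over recent (minute-level) and archived (daily) data.
CREATE VIEW price_history_all AS
SELECT ts, item, price FROM price_history_recent
UNION ALL
SELECT ts, item, price FROM price_history_archive;
```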

Answered by Gordon Linoff on 27.12.2017 05:01