Time-series in PostgreSQL with heavy query requirement
I'm looking to implement a time-series database. I've gone through various options, but since I'm not that knowledgeable I've opted to stick with PostgreSQL, as I'm somewhat familiar with it through Django (especially its ORM).
The idea is to store time series for data with 4 columns (indexed by all but price).
timestamp | id | item | price
I'm looking at adding these every minute: roughly 1,500 datapoints are bulk-inserted per minute. After a month I no longer need minute granularity; one datapoint per day (at 00:00) suffices.
Am I correct in thinking that PostgreSQL should do just fine for this? This will be served by a backend and needs to be quite low in latency (300 ms roundtrips).
My main question is whether PostgreSQL can return the data efficiently when given a set of items, a start and end timestamp, and the interval (step) at which the data is requested, without returning everything and filtering it in application code.
If my table contains a single item with the following data:
timestamp  | id | item | price
1514391000 | 01 | foo  | 10
1514391100 | 02 | foo  | 20
1514391200 | 03 | foo  | 30
.......... | .. | ...  | ..
1514392000 | 11 | foo  | 20
1514393000 | 21 | foo  | 20
I would like to be able to request:

start: 1514391000,
end: 1514392000 and
step: 200

I would then expect to receive 6 results back, at the timestamps ending in 1000, 1200, 1400, 1600, 1800 and 2000. Is this possible with PostgreSQL in an efficient manner?
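It is: `generate_series` can produce exactly the requested timestamps, and a join against them touches only the matching rows. A minimal sketch, assuming a hypothetical table named `prices` with an epoch-seconds `bigint` timestamp column and an index on `(item, "timestamp")`:

```sql
-- Generate the requested timestamps (start, start+step, ..., end) and
-- join them against the data; with the (item, "timestamp") index each
-- lookup is an index probe rather than a table scan.
SELECT p."timestamp", p.id, p.item, p.price
FROM generate_series(1514391000, 1514392000, 200) AS g(ts)
JOIN prices p
  ON p."timestamp" = g.ts
 AND p.item = 'foo'
ORDER BY p."timestamp";
```

This only works if the stored timestamps land exactly on the generated steps, which is why aligning values on insert (as discussed below in the question) matters.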
The only thing I can think of is to round the timestamps to the nearest minute when inserting my time series; then I know exactly which timestamps to filter for, without needing to search the database.
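That rounding can happen in the `INSERT` itself with integer arithmetic on the epoch value. A sketch, again assuming the hypothetical `prices` table with epoch seconds:

```sql
-- Truncate the epoch timestamp to the start of its minute on the way in,
-- so every stored value is predictable: 1514391037 becomes 1514391000.
INSERT INTO prices ("timestamp", id, item, price)
VALUES ((1514391037 / 60) * 60, 1, 'foo', 10);
```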
I'm also wondering if it's possible to search for the 'nearest timestamp' for a given item, in the same scenario. All of this seems solvable by clever timestamp entry, but I'm not sure if that's the way to go. — asked by sof2er, 27 December 2017
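For what it's worth, a nearest-timestamp lookup is possible without clever entry: take the closest row at or below the target and the closest row at or above it, then keep whichever is nearer. Both sub-selects can walk an `(item, "timestamp")` index. A sketch with the same hypothetical table:

```sql
-- Nearest stored timestamp to 1514391150 for item 'foo': one candidate
-- from each side of the target, then pick the closer of the two.
SELECT *
FROM (
  (SELECT * FROM prices
   WHERE item = 'foo' AND "timestamp" <= 1514391150
   ORDER BY "timestamp" DESC LIMIT 1)
  UNION ALL
  (SELECT * FROM prices
   WHERE item = 'foo' AND "timestamp" >= 1514391150
   ORDER BY "timestamp" ASC LIMIT 1)
) AS candidates
ORDER BY abs("timestamp" - 1514391150)
LIMIT 1;
```

The wrapping subquery is needed because PostgreSQL does not allow an arbitrary expression in the `ORDER BY` applied directly to a `UNION`.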
I would suggest having a timestamp-start and a timestamp-end column. Then you can readily find the matching row.
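With such columns, "the row in effect at time T" becomes a simple range predicate. A sketch, assuming hypothetical `ts_start`/`ts_end` epoch columns on the table:

```sql
-- Find the row whose validity interval covers the target moment:
SELECT id, item, price
FROM prices
WHERE item = 'foo'
  AND ts_start <= 1514391150
  AND ts_end   >  1514391150;
```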
I am thinking of a two-table solution, one for the more recent data and one for the older data.
You should also partition your most recent table, perhaps by day. This will allow you to manage older data more effectively -- dropping data one day (or week or month) at a time.
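With PostgreSQL 10+ this can be declarative range partitioning (older versions build the same layout with inheritance and triggers). A sketch with hypothetical names and epoch-second bounds:

```sql
-- Parent table for recent data, partitioned by day on the epoch timestamp:
CREATE TABLE prices_recent (
    "timestamp" bigint  NOT NULL,
    id          integer NOT NULL,
    item        text    NOT NULL,
    price       numeric NOT NULL
) PARTITION BY RANGE ("timestamp");

-- One partition per day (bounds here are 2017-12-27 00:00 to 2017-12-28 00:00 UTC):
CREATE TABLE prices_2017_12_27 PARTITION OF prices_recent
    FOR VALUES FROM (1514332800) TO (1514419200);

-- Retiring a whole day later is a cheap metadata operation, not a DELETE:
DROP TABLE prices_2017_12_27;
```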
Then, each day (or week or month), summarize the older data into the records you want to archive. You can drop the partition from the newer data.
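Since the question only needs the 00:00 datapoint per day after a month, the summarization step can simply copy that one row per item into the archive table before the day's partition is dropped. A sketch with hypothetical table names:

```sql
-- Keep only the midnight row for each item for 2017-12-27:
INSERT INTO prices_archive ("timestamp", id, item, price)
SELECT "timestamp", id, item, price
FROM prices_recent
WHERE "timestamp" = 1514332800;  -- 2017-12-27 00:00:00 UTC
```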
You can either swap in the archive partitions or use a view to combine them. — answered by Gordon Linoff, 27 Dec 2017 05:01
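The combining view is a plain `UNION ALL` over both tiers, so queries don't need to know where a given timestamp lives. A sketch, assuming the hypothetical `prices_recent` and `prices_archive` tables:

```sql
CREATE VIEW prices_all AS
SELECT "timestamp", id, item, price FROM prices_recent
UNION ALL
SELECT "timestamp", id, item, price FROM prices_archive;
```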