Page 1 of 1

Opinions re: Schema Design for storing time series data

Posted: Thu Sep 25, 2014 2:34 pm
by dcosgrov
I am looking to get some advice on what type(s) of schema designs have been effective for storing time series telemetry/sensor data in Vertica. I am working on a project where there is a need to store multiple metric values for a given entity at a single point in time, then, continually add new "polled" data on a fixed interval. I see at least a couple of different options for schema design:

option 1:
entity | time | metric name | metric value
A now cpu util 70
A now memory util 90
...
A now+5min cpu util 75
A now+5min memory util 85


option 2:
entity | time | cpu util | memory util | ...
A now 70 90
A now+5min 75 85

I would like to go with more of a key-value design (option 1) but am concerned about significant record growth and combining values from different metrics that might not compress well. Anyway, I'd be interested in hearing about other options beyond what I have above and schema designs that have been particularly effective for storing time series data in Vertica.

Thanks in advance,

Dave

Re: Opinions re: Schema Design for storing time series data

Posted: Thu Sep 25, 2014 4:18 pm
by NorbertKrupa
Your first option should be fine with proper partitioning and a proper timestamp. Instead of having a metric name, you could also use a metric lookup table.

The second option is not really normalized.

Re: Opinions re: Schema Design for storing time series data

Posted: Fri Sep 26, 2014 7:21 pm
by dcosgrov
Thanks a lot for the feedback.