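TSQL 10-Minute Quick Start
This guide uses data-center performance monitoring as its scenario and, with sample data produced by a generator tool, shows how to run time-series queries in TSQL.
Sample data
The sample data comes from the data generator included in a time-series benchmark tool (https://github.com/influxdata/influxdb-comparisons). After installing and building the tool, use the two steps below to generate the data and load it into the TSDB engine.
Generate data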
cd influxdb-comparisons/cmd
bulk_data_gen/bulk_data_gen --seed=123 --use-case=devops --scale-var=10 --format=opentsdb --timestamp-start="2019-03-01T00:00:00Z" --timestamp-end="2019-03-01T00:10:00Z" > tsdb_devops_sf10_10m_seed123.json
Part of the sample data is shown below:
{"metric":"redis.evicted_keys","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2951}
{"metric":"redis.keyspace_hits","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2945}
{"metric":"redis.keyspace_misses","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2944}
{"metric":"redis.instantaneous_ops_per_sec","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":65}
{"metric":"redis.instantaneous_input_kbps","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":58}
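To make the record layout concrete, here is a minimal Python sketch that parses one of the generated lines. It assumes the `timestamp` field is in milliseconds since the Unix epoch, which is consistent with the 2019-03-01 range passed to the generator. The helper name `parse_point` is hypothetical, and the tag set is trimmed for brevity.

```python
import json
from datetime import datetime, timezone

# One record taken from the sample output above, with most tags removed.
SAMPLE_LINE = (
    '{"metric":"redis.evicted_keys","timestamp":1551398990000,'
    '"tags":{"hostname":"host_9","datacenter":"us-east-1b"},"value":2951}'
)

def parse_point(line: str) -> dict:
    """Parse one JSON line into a flat record (hypothetical helper)."""
    raw = json.loads(line)
    # Assumption: the timestamp is expressed in milliseconds since the epoch.
    ts = datetime.fromtimestamp(raw["timestamp"] / 1000, tz=timezone.utc)
    # Flatten the tags next to metric, timestamp and value, similar to how
    # the queries in this guide address tag keys as columns.
    return {"metric": raw["metric"], "timestamp": ts, "value": raw["value"], **raw["tags"]}

point = parse_point(SAMPLE_LINE)
print(point["metric"], point["timestamp"].isoformat(), point["hostname"], point["value"])
# prints: redis.evicted_keys 2019-03-01T00:09:50+00:00 host_9 2951
```

The decoded time falls inside the ten-minute window requested with --timestamp-start and --timestamp-end.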
Load data
cat tsdb_devops_sf10_10m_seed123.json | bulk_load_opentsdb/bulk_load_opentsdb --urls=http://your_tsdb_host:port_num -workers=5
Queries
Time-range query: return all columns of a metric within a time range, including the value, the timestamp, and the values of its tag keys.
select *
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
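Time-range query with selected columns: return only the specified columns of a metric within a time range, here the value, the timestamp, and the values of a few specific tag keys.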
select `value`, `timestamp`, hostname, datacenter
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
Time range plus tag key condition: return the value and timestamp of a metric within a time range, restricted to rows whose hostname matches an IN list.
select `value`, `timestamp`, hostname, datacenter
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
hostname in ('host_0', 'host_2', 'host_4')
Results ordered by timestamp: return the value, the timestamp, and the tag key values of a metric within a time range, sorted by timestamp.
select *
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
order by `timestamp`
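Value filter with a mathematical expression: return the value, the timestamp, and the tag key values of a metric within a time range, keeping only rows whose value has a square root greater than 1.5.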
select *
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
sqrt(`value`) > 1.5
Grouped aggregation: group by hostname and datacenter, and compute the maximum, minimum, and average of each group.
select
hostname,
datacenter,
max(`value`) as maxV,
min(`value`) as minV,
avg(`value`) as avgV
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
group by hostname, datacenter
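Grouped aggregation over time windows: group by hostname and datacenter, further split each group into 2-minute intervals, and compute the maximum, minimum, and average.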
select
hostname,
datacenter,
tumble(`timestamp`, interval '2' minute) as ts,
max(`value`) as maxV,
min(`value`) as minV,
avg(`value`) as avgV
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
group by hostname, datacenter, ts
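The windowed grouping above can be pictured with a small Python sketch. It assumes that tumble() assigns each row to one fixed, non-overlapping window aligned to the Unix epoch and labels the window by its start time; the guide does not state the labelling, so treat that as an assumption. The rows are made-up illustration data, and `window_start` and `tumble_aggregate` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
T0 = datetime(2019, 3, 1, tzinfo=timezone.utc)

def window_start(ts: datetime, size: timedelta) -> datetime:
    """Floor ts to the start of its fixed-size window (assumed epoch-aligned)."""
    return ts - (ts - EPOCH) % size

def tumble_aggregate(rows, size):
    """rows: iterable of (hostname, datacenter, timestamp, value)."""
    groups = defaultdict(list)
    for hostname, datacenter, ts, value in rows:
        # Same grouping key as: group by hostname, datacenter, ts
        groups[(hostname, datacenter, window_start(ts, size))].append(value)
    return {
        key: {"maxV": max(vals), "minV": min(vals), "avgV": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

# Illustrative rows only, not taken from the generated data set.
ROWS = [
    ("host_0", "us-east-1b", T0 + timedelta(seconds=10), 10.0),
    ("host_0", "us-east-1b", T0 + timedelta(seconds=70), 30.0),
    ("host_0", "us-east-1b", T0 + timedelta(seconds=130), 50.0),
    ("host_1", "us-east-1b", T0 + timedelta(seconds=20), 5.0),
]

result = tumble_aggregate(ROWS, timedelta(minutes=2))
for key in sorted(result):
    print(key, result[key])
```

With 2-minute windows, the first two host_0 rows share the window starting at 00:00:00, while the third lands in the window starting at 00:02:00.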
Grouped aggregation over time windows with a computed expression: evaluate max(value) - min(value) + 0.5 * avg(value) over the aggregates of each group.
select
hostname,
datacenter,
tumble(`timestamp`, interval '2' minute) as ts,
max(`value`) - min(`value`) + 0.5* avg(`value`) as compV
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
group by hostname, datacenter, ts
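Compute, for each host, the difference between the values recorded at adjacent timestamps.
The example below uses the window function lag(). lag() operates on a window frame partitioned by hostname and ordered by timestamp, and returns the value of the record that precedes the current one within the same window. Subtracting that value from the current one gives the difference between adjacent timestamps on each host.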
select hostname, `timestamp`, `value`,
`value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
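As a rough illustration of what the lag() query computes, the following Python sketch reproduces the partition-and-order logic on made-up rows. It assumes standard SQL behavior, in which lag() yields NULL for the first row of each partition, so the first diff per host is None; whether TSQL behaves identically is an assumption. `lag_diff` is a hypothetical helper and the timestamps are plain integers for brevity.

```python
from itertools import groupby
from operator import itemgetter

def lag_diff(rows):
    """rows: iterable of (hostname, timestamp, value); appends diff to each row."""
    out = []
    # partition by hostname order by timestamp
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, partition in groupby(ordered, key=itemgetter(0)):
        prev = None  # lag() has no previous row at the start of a partition
        for hostname, ts, value in partition:
            diff = None if prev is None else value - prev
            out.append((hostname, ts, value, diff))
            prev = value
    return out

# Illustrative rows only, deliberately interleaved across hosts.
ROWS = [
    ("host_0", 0, 10.0),
    ("host_1", 0, 5.0),
    ("host_0", 10, 12.5),
    ("host_1", 10, 4.0),
    ("host_0", 20, 70.0),
]

diffs = lag_diff(ROWS)
for row in diffs:
    print(row)
```

Each host is processed independently, so a value on host_1 never affects a diff on host_0.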
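Compute, for each host, the difference between the values recorded at adjacent timestamps, and reset the difference to 0 when it exceeds an outlier threshold.
The example below places the previous query in a subquery and applies a case expression in the outer query: when the difference exceeds 50.0 (treated as an outlier), it is reset to 0.0.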
select hostname, `timestamp`, `value`,
case when diff > 50.0 then 0.0
else diff
end
from (
select hostname, `timestamp`, `value`,
`value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
);
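The case expression has two edge cases worth spelling out, sketched here in Python under standard SQL comparison rules (an assumption about TSQL). A NULL diff, as on the first row of each host, does not satisfy diff > 50.0 and therefore passes through unchanged. A large negative diff is also left as is, because only the positive side is tested. `reset_outlier` and `reset_outlier_abs` are hypothetical helpers; the second is a variant, not something the query above does.

```python
def reset_outlier(diff, threshold=50.0):
    """Mirror of: case when diff > 50.0 then 0.0 else diff end."""
    if diff is not None and diff > threshold:
        return 0.0
    return diff  # includes None (NULL) and large negative values

def reset_outlier_abs(diff, threshold=50.0):
    """Variant that also resets large drops, i.e. tests abs(diff)."""
    if diff is not None and abs(diff) > threshold:
        return 0.0
    return diff

samples = [None, 2.5, 57.5, -60.0]
one_sided = [reset_outlier(d) for d in samples]
two_sided = [reset_outlier_abs(d) for d in samples]
print(one_sided)
print(two_sided)
```

If drops should count as outliers too, the SQL condition would need to test the magnitude of diff rather than diff itself.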
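Compute the maximum value recorded within each minute on each host, and the difference between the maximums of adjacent minutes.
Unlike the earlier difference calculations, this query takes the difference between per-minute maximums. The subquery computes the maximum for each minute, and the outer query uses the window function lag() to compute the difference between the maximums of adjacent minutes.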
select hostname, ts, maxValue,
maxValue - lag(maxValue) over(partition by hostname order by ts) as diff
from (
select hostname,
tumble(`timestamp`, interval '1' minute) ts, max(`value`) maxValue
from tsdb.`cpu.usage_system`
where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
group by hostname, ts)
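The two-stage structure of this query (aggregate first, then difference) can be sketched in Python as follows. Timestamps are epoch seconds, the rows are made-up, and `per_minute_max_diff` is a hypothetical helper. The sketch assumes that a minute with no rows produces no output row, as a plain group by would, so lag() compares each minute with the previous minute that has data, which is not always the previous calendar minute.

```python
from collections import defaultdict

def per_minute_max_diff(rows):
    """rows: iterable of (hostname, epoch_seconds, value)."""
    # Inner query: max(value) per (hostname, 1-minute tumbling window).
    maxima = defaultdict(lambda: float("-inf"))
    for hostname, ts, value in rows:
        minute = ts - ts % 60  # start of the minute the row falls into
        key = (hostname, minute)
        maxima[key] = max(maxima[key], value)
    # Outer query: lag(maxValue) over (partition by hostname order by ts).
    out, prev = [], {}
    for (hostname, minute), max_value in sorted(maxima.items()):
        last = prev.get(hostname)
        diff = None if last is None else max_value - last
        out.append((hostname, minute, max_value, diff))
        prev[hostname] = max_value
    return out

# Illustrative rows only; note that minute 120 has no data for host_0.
ROWS = [
    ("host_0", 5, 10.0),
    ("host_0", 50, 20.0),
    ("host_0", 65, 15.0),
    ("host_0", 185, 40.0),
]

minute_diffs = per_minute_max_diff(ROWS)
for row in minute_diffs:
    print(row)
```

In this data the last diff compares minute 180 with minute 60, because the empty minute in between yields no row to compare against.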