Apache IoTDB Tutorial Series 2: Basic SQL Operations

Today we introduce the commonly used SQL statements, covering creating, deleting, updating, and querying both metadata and data. The SQL in this article is based on version 0.10.0, a major release that will be published soon!

The body is about 11,018 characters (mostly SQL statements and console output), with an estimated reading time of 10 minutes.

At present, IoTDB has two main interfaces: SQL and a native (NoSQL) API. Today I will introduce the SQL interface. To follow along, you can download the 0.10.0 pre-release binary from the link below and try the statements as you read:

Binary version download link:

https://pan.baidu.com/s/1KWnEIIE0Duwr9TZVugib6w  

Password: dmrg

You can also compile the source code:

git clone https://github.com/apache/incubator-iotdb.git
cd incubator-iotdb
git fetch origin rel/0.10:rel/0.10
git checkout rel/0.10
mvn clean package -pl distribution -am -DskipTests
# Binary release package location:
distribution/target/apache-iotdb-0.10.0-SNAPSHOT-incubating-bin.zip
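
After unpacking the binary package, you can start the server and the CLI with the scripts in the sbin directory. The following is a minimal sketch: the script names and the default root/root account follow common IoTDB conventions and may differ in your distribution.

# Start the server from the unpacked distribution directory
sbin/start-server.sh

# In another terminal, start the CLI and connect with the default account
sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root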

Okay, let's start!

DDL (Data Definition Language)

Reference documents:

http://iotdb.apache.org/UserGuide/Master/Operation%20Manual/DDL%20Data%20Definition%20Language.html

Storage group operations

# Create a storage group
IoTDB> set storage group to root.turbine


# Show storage groups
IoTDB> SHOW STORAGE GROUP
+-------------+
|storage group|
+-------------+
| root.turbine|
+-------------+


# Delete a storage group
IoTDB> delete storage group root.turbine

Create time series

create timeseries root.turbine.d1.s1(temperature1) with datatype=FLOAT, encoding=GORILLA, compression=SNAPPY tags(unit=degree, owner=user1) attributes(description=mysensor1, location=BeiJing)
create timeseries root.turbine.d1.s2(temperature2) with datatype=FLOAT, encoding=GORILLA, compression=SNAPPY tags(unit=degree, owner=user1) attributes(description=mysensor2, location=TianJin)
create timeseries root.turbine.d2.s1(temperature1) with datatype=FLOAT, encoding=GORILLA, compression=SNAPPY tags(unit=degree, owner=user2) attributes(description=mysensor3, location=HeBei)

The time series registered above are visualized in the figure below (hand drawn; there is currently no built-in visualization feature).

To make time series more convenient to use in practice, in addition to basic information such as the path and encoding, we added three concepts: alias, tag, and attribute. The total size of the tags and attributes of a time series is limited by the tag_attribute_total_size parameter in the configuration file.

Alias: an alternative name for a measurement. It can be used for reads and writes just like the measurement name, and it is optional.

Tag: a key=value pair through which time series can be looked up in reverse, for example by unit or owner. Tags reside in memory. Currently only one tag condition can be given per query, and it can be either an exact match or a fuzzy (contains) match.

Attribute: also a key=value pair, but it can only be retrieved by the time series path (it cannot be used for reverse lookup), for example description or location. If reverse lookup is not needed, it is recommended to store the information as an attribute.
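
As a quick illustration of the alias behavior described above (a minimal sketch that relies on the alias temperature1 defined in the create statements), the alias can be used in place of the measurement name in a query:

# Query by alias instead of by the measurement name s1
IoTDB> select temperature1 from root.turbine.d1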

# Upsert alias, tags, and attributes
ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(unit=Degree, owner=me) ATTRIBUTES(description=ha, newAttr=v1)
# Delete a time series
delete timeseries root.turbine.d2.s1

Query time series metadata by path and tag

# Show all time series
IoTDB> show timeseries
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+
|        timeseries|       alias|storage group|dataType|encoding|compression|description|location|owner|  unit|
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+
|root.turbine.d1.s1|temperature1| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor1| BeiJing|user1|degree|
|root.turbine.d1.s2|temperature2| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor2| TianJin|user1|degree|
|root.turbine.d2.s1|temperature1| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor3|   HeBei|user2|degree|
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+


# Show time series under the prefix path root.turbine.d1
# Exact tag query: series whose owner is user1
IoTDB> show timeseries root.turbine.d1
IoTDB> show timeseries root.turbine where owner=user1
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+
|        timeseries|       alias|storage group|dataType|encoding|compression|description|location|owner|  unit|
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+
|root.turbine.d1.s1|temperature1| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor1| BeiJing|user1|degree|
|root.turbine.d1.s2|temperature2| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor2| TianJin|user1|degree|
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+


# Fuzzy tag query: series whose owner value contains 'user'
IoTDB> show timeseries where owner contains 'user'
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+
|        timeseries|       alias|storage group|dataType|encoding|compression|description|location|owner|  unit|
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+
|root.turbine.d1.s1|temperature1| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor1| BeiJing|user1|degree|
|root.turbine.d1.s2|temperature2| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor2| TianJin|user1|degree|
|root.turbine.d2.s1|temperature1| root.turbine|   FLOAT| GORILLA|     SNAPPY|  mysensor3|   HeBei|user2|degree|
+------------------+------------+-------------+--------+--------+-----------+-----------+--------+-----+------+

View the child nodes of a path

IoTDB> show child paths root.turbine
+---------------+
|    child paths|
+---------------+
|root.turbine.d1|
|root.turbine.d2|
+---------------+


Count the number of time series

# Count all time series
IoTDB> count timeseries
+-----+
|count|
+-----+
|    3|
+-----+


# Count time series grouped by level (root is level 0)
IoTDB> count timeseries group by level=2
+---------------+-----+
|         column|count|
+---------------+-----+
|root.turbine.d1|    2|
|root.turbine.d2|    1|
+---------------+-----+

Query all devices

That is, it queries the paths of the second-to-last level nodes (one level above the measurements).

IoTDB> show devices
+---------------+
|        devices|
+---------------+
|root.turbine.d1|
|root.turbine.d2|
+---------------+

DML (Data Manipulation Language)

Reference documents:

http://iotdb.apache.org/UserGuide/Master/Operation%20Manual/DML%20Data%20Manipulation%20Language.html

Data write

Each insert statement writes the values of one device, one timestamp, and multiple measurements at a time.

insert into root.turbine.d1(timestamp,s1,s2) values(1,1,2);
insert into root.turbine.d1(timestamp,s1,s2) values(2,1,2);
insert into root.turbine.d1(timestamp,s1,s2) values(3,1,2);
insert into root.turbine.d1(timestamp,s1,s2) values(4,1,2);
insert into root.turbine.d1(timestamp,s1,s2) values(5,1,2);
insert into root.turbine.d1(timestamp,s1,s2) values(6,1,2);
insert into root.turbine.d1(timestamp,s1,s2) values(10,1,2);

Data deletion

Currently, only data before a given point in time can be deleted; deleting data in an arbitrary time range will be supported later.

delete from root.turbine.d2.s1 where time <= 10

Raw data query

Next come various queries; the most commonly used is the raw data query.

IoTDB> select s1, s2 from root.turbine.d1
+-----------------------------+------------------+------------------+
|                         Time|root.turbine.d1.s1|root.turbine.d1.s2|
+-----------------------------+------------------+------------------+
|1970-01-01T08:00:00.001+08:00|               1.0|               2.0|
|1970-01-01T08:00:00.002+08:00|               1.0|               2.0|
|1970-01-01T08:00:00.003+08:00|               1.0|               2.0|
|1970-01-01T08:00:00.004+08:00|               1.0|               2.0|
|1970-01-01T08:00:00.005+08:00|               1.0|               2.0|
|1970-01-01T08:00:00.006+08:00|               1.0|               2.0|
|1970-01-01T08:00:00.010+08:00|               1.0|               2.0|
+-----------------------------+------------------+------------------+

Single-point query with value filling

Timestamps of sensor data often have small deviations, so a query at an exact timestamp can easily return no data. You can use the previous or linear methods to fill in the missing value.

IoTDB> select s1 from root.turbine.d1 where time = 8
+----+------------------+
|Time|root.turbine.d1.s1|
+----+------------------+
+----+------------------+


# Fill with the most recent previous value
IoTDB> select s1 from root.turbine.d1 where time = 8 fill(float[previous])
+-----------------------------+------------------+
|                         Time|root.turbine.d1.s1|
+-----------------------------+------------------+
|1970-01-01T08:00:00.008+08:00|               1.0|
+-----------------------------+------------------+


# To limit how far back the fill may look (no value is filled beyond this range), add another parameter with a time unit
IoTDB> select s1 from root.turbine.d1 where time = 8 fill(float[previous,1ms])
+-----------------------------+------------------+
|                         Time|root.turbine.d1.s1|
+-----------------------------+------------------+
|1970-01-01T08:00:00.008+08:00|              null|
+-----------------------------+------------------+

Latest data query

To support real-time visualization of the latest data, we built a dedicated latest-data-point query. Prefix the query with the select last keyword; the rest of the syntax is the same as a raw data query, except that predicate filtering cannot be added.

IoTDB> select last * from root
+-----------------------------+------------------+-----+
|                         Time|        timeseries|value|
+-----------------------------+------------------+-----+
|1970-01-01T08:00:00.010+08:00|root.turbine.d1.s1|  1.0|
|1970-01-01T08:00:00.010+08:00|root.turbine.d1.s2|  2.0|
+-----------------------------+------------------+-----+

Aggregate query

Aggregate queries compute an aggregated value per time series: each time series is currently treated as an independent sequence and aggregated on its own. The next version will add the ability to aggregate across all series under one path.

IoTDB> select count(*) from root where time <= 10
+-------------------------+-------------------------+-------------------------+
|count(root.turbine.d1.s1)|count(root.turbine.d1.s2)|count(root.turbine.d2.s1)|
+-------------------------+-------------------------+-------------------------+
|                        7|                        7|                        0|
+-------------------------+-------------------------+-------------------------+

Down-sampling aggregation query (0.10.0)

The down-sampling aggregation syntax in 0.10 differs from 0.9. Let's start with the 0.10.0 syntax, using an example: query the average value of a series between 9:00 and 12:00 on each day of May this year. The result should look like this:

May 1st, 9:00-12:00: aggregated value

May 2nd, 9:00-12:00: aggregated value

...

May 31st, 9:00-12:00: aggregated value

To express such a flexible query, a sliding window is needed. The window starts at 9:00 on May 1st and is 3 hours long; it slides forward 24 hours at a time until May 31st, and an average is computed within each window.

Therefore, we designed three parameters:

(1) The start and end range of the sliding window, a left-closed right-open interval: May 1st to May 31st

(2) The length of the sliding window: 3 hours

(3) The sliding step: 24 hours

The statement is as follows (I haven't written that much data, so the result is currently empty):

select avg(s1) from root.turbine.d1 group by([2020-05-01T09:00:00, 2020-05-31T12:00:00), 3h, 24h)

Here is a simpler example: query the daily average for May.

In this example, the length of the sliding window is equal to the sliding step length, so the third parameter can be omitted:

select avg(s1) from root.turbine.d1 group by([2020-05-01T00:00:00, 2020-06-01T00:00:00), 1d)

Down-sampling with value filling (0.10.0)

This is a new query feature in 0.10.0 built on the group by query: with the last_value aggregation function it effectively becomes a down-sampling function, and if a time interval has no value, the previous aggregated value can be used to fill it.

# Normal down-sampling: intervals with no data are filled with null
IoTDB> select last_value(s1) from root.turbine.d1 group by([1,10), 2ms)
+-----------------------------+------------------------------+
|                         Time|last_value(root.turbine.d1.s1)|
+-----------------------------+------------------------------+
|1970-01-01T08:00:00.001+08:00|                           1.0|
|1970-01-01T08:00:00.003+08:00|                           1.0|
|1970-01-01T08:00:00.005+08:00|                           1.0|
|1970-01-01T08:00:00.007+08:00|                          null|
|1970-01-01T08:00:00.009+08:00|                          null|
+-----------------------------+------------------------------+


# Down-sampling: if an interval has no value, fill it with the previous aggregated value (fill function: previous)
IoTDB> select last_value(s1) from root.turbine.d1 group by([1,10), 2ms) fill(float[previous])
+-----------------------------+------------------------------+
|                         Time|last_value(root.turbine.d1.s1)|
+-----------------------------+------------------------------+
|1970-01-01T08:00:00.001+08:00|                           1.0|
|1970-01-01T08:00:00.003+08:00|                           1.0|
|1970-01-01T08:00:00.005+08:00|                           1.0|
|1970-01-01T08:00:00.007+08:00|                           1.0|
|1970-01-01T08:00:00.009+08:00|                           1.0|
+-----------------------------+------------------------------+

In addition, another fill method is supported: previousuntillast. It fills with the previous value only up to the time of the latest data point; for example, the latest point here has timestamp 10, so the windows at 13 and 15 are no longer filled.

IoTDB> select last_value(s1) from root.turbine.d1 group by((1,15], 2ms) fill(float[previousuntillast])
+-----------------------------+------------------------------+
|                         Time|last_value(root.turbine.d1.s1)|
+-----------------------------+------------------------------+
|1970-01-01T08:00:00.003+08:00|                           1.0|
|1970-01-01T08:00:00.005+08:00|                           1.0|
|1970-01-01T08:00:00.007+08:00|                           1.0|
|1970-01-01T08:00:00.009+08:00|                           1.0|
|1970-01-01T08:00:00.011+08:00|                           1.0|
|1970-01-01T08:00:00.013+08:00|                          null|
|1970-01-01T08:00:00.015+08:00|                          null|
+-----------------------------+------------------------------+

You may have noticed that the interval in this statement is left-open and right-closed, and the timestamps in the result set are the closed (right) endpoints of each window. In this way, down-sampling with value filling is achieved through the group by ... fill statement.

Down-sampling aggregation query (0.9.x)

The down-sampling aggregation syntax of the older 0.9 version differs from 0.10. It has three parameters:

(1) Segment interval: the time axis is divided into segments of this length.

(2) Division origin: the point from which segmentation starts; any segment boundary can be chosen as the origin. The default is 1970-01-01 00:00:00, i.e., timestamp 0.

(3) The display range of the result set.

Once the first two parameters are fixed, the segmentation of the time axis is determined; the third parameter then specifies which segments appear in the result set.

For example, query the daily average for May:

select avg(s1) from root.turbine.d1 group by (1d, 2020-05-01 00:00:00, [2020-05-01 00:00:00, 2020-05-31 23:59:59]);

Align query by device

From the examples above, you can see that the default result table of an IoTDB query has the structure [time, series 1, series 2, ..., series n]: all series are aligned by time, and if a series has no value at a given time point, null is filled in. When doing value filtering, filtering on this table structure is quite strict.

To keep devices from interfering with each other in a query, we also support aligning the query by time and device. The table structure becomes [time, device ID, measurement 1, measurement 2, ..., measurement n], similar to a relational table. Just add align by device at the end of the query statement:

IoTDB> select * from root align by device
+-----------------------------+---------------+---+---+
|                         Time|         Device| s1| s2|
+-----------------------------+---------------+---+---+
|1970-01-01T08:00:00.001+08:00|root.turbine.d1|1.0|2.0|
|1970-01-01T08:00:00.002+08:00|root.turbine.d1|1.0|2.0|
|1970-01-01T08:00:00.003+08:00|root.turbine.d1|1.0|2.0|
|1970-01-01T08:00:00.004+08:00|root.turbine.d1|1.0|2.0|
|1970-01-01T08:00:00.005+08:00|root.turbine.d1|1.0|2.0|
|1970-01-01T08:00:00.006+08:00|root.turbine.d1|1.0|2.0|
|1970-01-01T08:00:00.010+08:00|root.turbine.d1|1.0|2.0|
+-----------------------------+---------------+---+---+

Summary

That covers today's basic operations. For the full SQL syntax, please refer to the official website. You can paste the SQL in this article into the CLI and play with it yourself~ Happy weekend, everyone!

Follows, tips, and shares are welcome!
