Apache IoTDB Series Tutorial-7: Time Series Data File Format TsFile

The big data ecosystem has many file formats, such as Parquet, ORC, and Avro, all of which can handle nested data. These formats generally require a pre-defined schema, and in the columnar ones (Parquet and ORC) data is written row by row, organized by attribute, and stored column by column. However, they do not fit the management requirements of time series data well. For example, in many time series scenarios each series is written independently and timestamps are not aligned across series; query results also need to be sorted by timestamp. TsFile (Time series File) is the file format we designed for time series scenarios. This article introduces how to use it, focusing on version 0.10.


Use cases

Because the file format is lightweight, it is well suited to packaging and compressing data at the edge. The edge can be the device itself, an industrial PC, or a factory-level system. Data generated on a device can be persisted to a file at any time. The device here might be, for example, a wind turbine with multiple measuring points, such as a wind speed sensor and a temperature sensor; the data collected by each sensor is one time series. Lenovo's IoT platform has been using TsFile to store time series data since 2017.

Therefore, the target scenario of TsFile is to manage the time series data of one or more devices.

Device-measuring point model

Device (DeviceId): a concept similar to a table.

Measuring point (MeasurementId): A device can have multiple measuring points, similar to the concept of a column in a table.

Time series path (Path): a time series is identified by a device and a measuring point, i.e. Path(deviceId, measurementId).

Measuring point description (MeasurementSchema): each time series has one description, which includes its data type, encoding method, and compression method.

Each time series has two columns: time column and value column.
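
To make the model concrete, here is a minimal plain-Java sketch of it. It does not use the TsFile library: the class names SeriesModelSketch, SchemaInfo, and Series, the "deviceId.measurementId" path string, and the use of plain strings for data type, encoding, and compression are all assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the device-measuring point concepts; NOT the TsFile API.
class SeriesModelSketch {

    // Stand-in for the description of one series: data type, encoding, compression.
    static final class SchemaInfo {
        final String dataType;
        final String encoding;
        final String compression;

        SchemaInfo(String dataType, String encoding, String compression) {
            this.dataType = dataType;
            this.encoding = encoding;
            this.compression = compression;
        }
    }

    // One time series: a time column and a value column that grow together.
    static final class Series {
        final SchemaInfo schema;
        final List<Long> times = new ArrayList<>();
        final List<Object> values = new ArrayList<>();

        Series(SchemaInfo schema) {
            this.schema = schema;
        }

        void append(long time, Object value) {
            times.add(time);
            values.add(value);
        }
    }

    // Series keyed by path, written here as "deviceId.measurementId".
    private final Map<String, Series> seriesByPath = new LinkedHashMap<>();

    static String path(String deviceId, String measurementId) {
        return deviceId + "." + measurementId;
    }

    Series register(String deviceId, String measurementId, SchemaInfo schema) {
        Series series = new Series(schema);
        seriesByPath.put(path(deviceId, measurementId), series);
        return series;
    }

    Series get(String deviceId, String measurementId) {
        return seriesByPath.get(path(deviceId, measurementId));
    }

    int seriesCount() {
        return seriesByPath.size();
    }

    public static void main(String[] args) {
        SeriesModelSketch model = new SeriesModelSketch();
        SchemaInfo info = new SchemaInfo("DOUBLE", "GORILLA", "SNAPPY");
        // One device with two measuring points gives two independent series.
        model.register("device_1", "wind_speed", info).append(1L, 5.2);
        model.register("device_1", "temperature", info).append(1L, 20.1);
        System.out.println(model.seriesCount() + " series registered");
    }
}
```

The point of the sketch is only that a device groups measuring points, and each (device, measuring point) pair owns its own time column and value column.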

I have been drawing diagrams lately, and the model looks roughly like this: different devices can have different measuring points.

Registering metadata

The first step in using TsFile is to register metadata.

Registering a time series: Path + MeasurementSchema

Any time series can be registered this way: provide a Path and a MeasurementSchema.

String path = "test.tsfile";
File f = FSFactoryProducer.getFSFactory().getFile(path);
TsFileWriter tsFileWriter = new TsFileWriter(f);
// Register one time series: Path(device, measuring point) plus its MeasurementSchema
tsFileWriter.registerTimeseries(
    new Path("device_1", "measurement_1"),
    new MeasurementSchema("measurement_1", TSDataType.INT64, TSEncoding.RLE));

Before 0.10, all devices shared one table of measuring points, so measuring points with the same name had to have the same MeasurementSchema (this is the origin of the IoTDB restriction that same-named measuring points under one storage group must have the same type). Since 0.10, each time series is truly independent and does not interfere with the others.

Registering devices by template: device template + device

Registering series one by one is tedious, so TsFile also provides device templates. Each template defines a set of MeasurementSchema objects, for example for 10 measuring points. When a device is associated with the template, 10 series are registered automatically.

First build the device template, then register it.

Map<String, MeasurementSchema> template = new HashMap<>();
template.put("measurement_1", new MeasurementSchema("measurement_1", TSDataType.INT64, TSEncoding.RLE));
template.put("measurement_2", new MeasurementSchema("measurement_2", TSDataType.DOUBLE, TSEncoding.GORILLA));
tsFileWriter.registerTemplate("template_1", template);

Next, register the device and associate it to the template by the template name:

tsFileWriter.registerDevice("device_1", "template_1");
tsFileWriter.registerDevice("device_2", "template_1");

This registers 2 devices, each with 2 measuring points.

Registering one template and writing data directly

This is a further simplification. When only one device template is registered, data can be written directly without registering any device. During writing, if the series being written has not been registered yet, the writer looks up the MeasurementSchema with the same name in the template and registers it. This carries on the tradition of versions before 0.9, in which a TsFile could only register one template, after which data could be written.
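
The lookup described above can be sketched in plain Java as follows. This is not TsFile's actual code: the class LazyTemplateSketch and its methods are hypothetical, and a data type string stands in for a full MeasurementSchema.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "register one template, then just write"; NOT TsFile's implementation.
class LazyTemplateSketch {

    // The single template: measuring point name -> data type (stand-in for MeasurementSchema).
    private final Map<String, String> template = new HashMap<>();

    // Series registered so far: "device.measurement" -> data type.
    private final Map<String, String> registered = new HashMap<>();

    void registerTemplate(Map<String, String> measurements) {
        template.putAll(measurements);
    }

    // On each write, an unregistered series is registered from the same-named template entry.
    void write(String device, String measurement, long time, Object value) {
        String path = device + "." + measurement;
        if (!registered.containsKey(path)) {
            String type = template.get(measurement);
            if (type == null) {
                throw new IllegalArgumentException("no schema in template for " + measurement);
            }
            registered.put(path, type);
        }
        // A real writer would now encode (time, value) into the buffer of this series.
    }

    boolean isRegistered(String device, String measurement) {
        return registered.containsKey(device + "." + measurement);
    }

    public static void main(String[] args) {
        LazyTemplateSketch writer = new LazyTemplateSketch();
        Map<String, String> measurements = new HashMap<>();
        measurements.put("measurement_1", "INT64");
        writer.registerTemplate(measurements);
        // device_1 was never registered explicitly; the first write registers its series.
        writer.write("device_1", "measurement_1", 1L, 100L);
        System.out.println(writer.isRegistered("device_1", "measurement_1"));
    }
}
```

In this sketch a measuring point that the template does not know is rejected with an exception; that choice is an assumption of the example.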

Writing data

TsFile writing has one restriction: each column must be written in increasing time order; otherwise correctness is not guaranteed.
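
Since the ordering is the caller's responsibility, one option is to check it before handing points to the writer. The sketch below is a hypothetical helper, not part of TsFile: the class TimeOrderGuard is invented for illustration, and it assumes that a timestamp equal to the previous one should also be rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical caller-side guard that keeps every series in increasing time order.
class TimeOrderGuard {

    // Last accepted timestamp per series path.
    private final Map<String, Long> lastTime = new HashMap<>();

    // Returns true and remembers the timestamp only if it is greater than the previous one.
    boolean accept(String path, long time) {
        Long last = lastTime.get(path);
        if (last != null && time <= last) {
            return false;
        }
        lastTime.put(path, time);
        return true;
    }

    public static void main(String[] args) {
        TimeOrderGuard guard = new TimeOrderGuard();
        long[] incoming = {1L, 2L, 2L, 5L, 4L};
        for (long time : incoming) {
            String verdict = guard.accept("device_1.measurement_1", time) ? "write" : "skip";
            System.out.println(time + " -> " + verdict);
        }
    }
}
```

Each series is tracked separately, so an old timestamp on one series does not block writes to another.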

Writing one row of data for a device: TSRecord

A TSRecord holds one device, one timestamp, and the values of multiple measuring points, similar to a row in a table.

Writing a batch of data for a device: Tablet

Tablet appears again here: this structure is shared by TsFile and the IoTDB Session. It represents the values of multiple measuring points at multiple timestamps for one device, similar to a sub-table. This sub-table cannot contain null values.

This write interface is also fast, reaching write speeds of tens of millions of points per second.
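
To show the shape of such a batch, here is a minimal plain-Java sketch of a columnar sub-table for one device. It is not the real Tablet class: TabletSketch, its fields, and the restriction to double values are assumptions for illustration. One addRow call plays the role of a single row record, and because every row must supply every measuring point, the batch has no nulls.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical columnar batch for one device; NOT the TsFile Tablet class.
class TabletSketch {
    final String device;
    final List<String> measurements;
    final long[] timestamps;   // one entry per row
    final double[][] values;   // values[column][row]
    int rowCount = 0;

    TabletSketch(String device, List<String> measurements, int maxRows) {
        this.device = device;
        this.measurements = measurements;
        this.timestamps = new long[maxRows];
        this.values = new double[measurements.size()][maxRows];
    }

    // Adds one complete row: a timestamp plus one value per measuring point.
    void addRow(long time, double... rowValues) {
        if (rowValues.length != measurements.size()) {
            throw new IllegalArgumentException("a row must supply every measuring point");
        }
        if (rowCount == timestamps.length) {
            throw new IllegalStateException("tablet is full");
        }
        timestamps[rowCount] = time;
        for (int column = 0; column < rowValues.length; column++) {
            values[column][rowCount] = rowValues[column];
        }
        rowCount++;
    }

    // Number of data points held: rows times measuring points.
    int pointCount() {
        return rowCount * measurements.size();
    }

    public static void main(String[] args) {
        TabletSketch tablet = new TabletSketch("device_1", Arrays.asList("wind_speed", "temperature"), 1024);
        tablet.addRow(1L, 5.2, 20.1);
        tablet.addRow(2L, 5.4, 20.3);
        System.out.println(tablet.rowCount + " rows, " + tablet.pointCount() + " points");
    }
}
```

Storing each measuring point as one contiguous array is what makes a batch like this cheap to hand over in a single call, compared with building one row object per timestamp.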

Reading data

The query interface takes a batch of paths and an expression (for time filtering and value filtering), which correspond to the select and where clauses respectively.

When querying, TsFile's default table structure is a wide table: time, d1.m1, d1.m2, d2.m1, d2.m2. By default, the given query paths are aligned by time and then filtered by the conditions.
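
The alignment into a wide table can be sketched in plain Java as below. This is an illustration of the idea, not TsFile's query engine: the class AlignByTimeSketch and its query method are hypothetical, only a time filter is shown, every requested path is assumed to exist, and a series with no value at a given timestamp is represented by null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.LongPredicate;

// Hypothetical sketch of aligning several series by time into wide rows.
class AlignByTimeSketch {

    // Each returned row is [time, value of paths[0], value of paths[1], ...].
    static List<Object[]> query(Map<String, TreeMap<Long, Double>> seriesByPath,
                                List<String> paths,
                                LongPredicate timeFilter) {
        // The union of all timestamps, sorted, gives the time column.
        TreeSet<Long> times = new TreeSet<>();
        for (String path : paths) {
            times.addAll(seriesByPath.get(path).keySet());
        }
        List<Object[]> rows = new ArrayList<>();
        for (long time : times) {
            if (!timeFilter.test(time)) {
                continue;
            }
            Object[] row = new Object[paths.size() + 1];
            row[0] = time;
            for (int i = 0; i < paths.size(); i++) {
                row[i + 1] = seriesByPath.get(paths.get(i)).get(time); // null if absent
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<Long, Double> d1m1 = new TreeMap<>();
        d1m1.put(1L, 1.0);
        d1m1.put(3L, 3.0);
        TreeMap<Long, Double> d2m1 = new TreeMap<>();
        d2m1.put(2L, 20.0);
        d2m1.put(3L, 30.0);
        Map<String, TreeMap<Long, Double>> data = new HashMap<>();
        data.put("d1.m1", d1m1);
        data.put("d2.m1", d2m1);
        // Roughly "select d1.m1, d2.m1 where time >= 2" in the terms used above.
        for (Object[] row : query(data, Arrays.asList("d1.m1", "d2.m1"), time -> time >= 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```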

Resources

Sample code:

https://github.com/apache/incubator-iotdb/blob/master/example/tsfile/

Documentation:

http://iotdb.apache.org/zh/UserGuide/V0.10.x/Client/Programming%20-%20TsFile%20API.html

Summary

Today we introduced the data model, metadata registration, and the write and read processes of the time series file format TsFile. That's all. Please give the project a star!

https://github.com/apache/incubator-iotdb/tree/master


Origin blog.csdn.net/qiaojialin/article/details/107587660