Apache IoTDB Series Tutorial-6: Performance Optimization (0.8-0.10)


Today's content covers modeling optimization and read/write performance optimization, along with some of the simple principles behind them. It mainly applies to versions 0.8-0.10.

The text is 3754 words, and the estimated reading time is 10 minutes.

Modeling guide

About storage groups

Currently, each storage group is a relatively independent engine, and the read-write lock is at the storage group level. Therefore, increasing the number of storage groups from 1 to 10 can increase read and write speed by roughly 8x. For a single IoTDB instance, it is recommended to set the number of storage groups roughly equal to the number of CPU cores: the more storage groups, the higher the degree of parallelism. We plan to push the lock granularity down to the device level in the future.

Devices

The concept of a device is not declared in any SQL statement; during server-side processing, the second-to-last layer of the path is treated as the device by default, which makes this concept easy to overlook. Here is what the device affects.

(1) Sequential and out-of-order data are distinguished at device granularity. For example, suppose a device writes data with timestamps 1-10 into memory (no matter which measurement points are written, the timestamps are tracked per device) and this buffer is flushed to disk. Any subsequent writes to that device with timestamp <= 10 will then be cached as out-of-order data and flushed separately.

(2) The time-range index is built at device granularity. For each TsFile, an index at device granularity is built in memory. If all devices are active, with N TsFiles and D devices there will be N*D index entries; with millions of devices, this index memory becomes overwhelming. We will change this in the next one or two versions.

Next, how to design the model to control the number of devices. In practice, applications with a simple device-sensor hierarchy, where the sensor layer sits directly under the device, are generally modeled correctly. The tricky part is a multi-layer structure below the device.

For example, suppose one device has 10 sensors (s1, s2, ..., s10), and each sensor collects 10 time series (f1, f2, ..., f10). It is tempting to model this as root.xxx.device.s1.f1. But once you do, the device is no longer what you thought: the actual device becomes root.xxx.device.s1, and the actual number of devices is 10 times what you expected.

What to do? If there are not many sub-devices under a device, this modeling is fine, as long as you know how many devices are actually in the system, so that there is no miscommunication and troubleshooting is easier later.

If there are many sub-devices, you can flatten the layers below the device into a single layer, e.g. root.xxx.device.s1_f1. Since '.' is the path separator, s1_f1 becomes one level, and the actual device is root.xxx.device.
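To make the effect concrete, here is a small sketch (not IoTDB code; the path root.xxx.device.s1.f1 and the helper names are illustrative) of how the server derives the device from a path, and how flattening the lower layers with underscores restores the intended device:

```python
def device_of(path: str) -> str:
    """The device is everything up to the last path segment;
    the last segment is the measurement."""
    return path.rsplit(".", 1)[0]

def flatten_below(path: str, device_depth: int) -> str:
    """Merge all segments below the intended device into one
    underscore-joined measurement, so the server-derived device
    matches the intended one."""
    parts = path.split(".")
    return ".".join(parts[:device_depth] + ["_".join(parts[device_depth:])])

# Naive path: the server sees root.xxx.device.s1 as the device.
print(device_of("root.xxx.device.s1.f1"))            # root.xxx.device.s1

# Flattened path: the device is back to root.xxx.device.
flat = flatten_below("root.xxx.device.s1.f1", 3)     # root.xxx.device.s1_f1
print(device_of(flat))                               # root.xxx.device
```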

Measurement definition

A measurement is the measurement point at the last layer of the path. If a measurement is of INT32 or INT64 type and its value stays the same most of the time, RLE encoding is a good choice: it greatly saves disk space, and flushing becomes faster as well. For compression, simply enabling SNAPPY is fine.
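To see why RLE suits mostly constant integer series, here is a toy run-length encoder (a simplified sketch, not IoTDB's actual codec) showing how long runs of identical values collapse:

```python
def rle_encode(values):
    """Collapse consecutive equal values into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# A mostly constant INT32-like series: 1500 raw values ...
series = [42] * 1000 + [43] * 500
encoded = rle_encode(series)
# ... collapse into just 2 [value, count] pairs.
```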

Tag & Attribute

These two concepts were introduced in 0.10.0, and it is easy to confuse them, since both are key-value properties. The difference is that a tag can be used to look up time series metadata in reverse. If a tag key is owner, you can use show timeseries where owner=Thanos to find the time series owned by Thanos. Tags are resident in memory, with an index from tag to time series.

An attribute is an ordinary property, for example description="this is my series". Attributes can only be shown alongside a given time series path, for people to view.

Therefore, distinguish according to actual needs: properties that need to be looked up in reverse should be tags; the rest should be attributes.

Read and write optimization

Reading and writing are closely related; how data is written and how parameters are configured both affect query performance.

Write interface

Taking 0.10 as an example and comparing within the same category first: the insertRecords interface is definitely faster than insertRecord. The difference, similar to executeBatch vs execute in JDBC, is the number of network round trips saved. Likewise, insertTablets is faster than insertTablet, and createMultiTimeseries is faster than createTimeseries.

Furthermore, we provide two insertRecords methods: one takes values as Object, the other as String. If the client can obtain the value types, it is recommended to use the Object version, which is about 25% faster than the String version.

Comparing across categories, if you ignore format-conversion time on the client side, insertTablet is much faster than insertRecords, perhaps more than 8 times, since it saves a lot of object-wrapping overhead; a batch size of about 1000 works well.
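The win from batched interfaces can be sketched with a toy cost model (not the real Session API; the two cost constants are assumptions purely for illustration): each request pays a fixed network round-trip cost, and batching amortizes it.

```python
import math

ROUND_TRIP_MS = 1.0    # assumed fixed network cost per request
PER_POINT_MS = 0.001   # assumed per-point processing cost

def insert_record_cost(n_points):
    """One request per point, as with insertRecord."""
    return n_points * (ROUND_TRIP_MS + PER_POINT_MS)

def insert_batch_cost(n_points, batch_size=1000):
    """One request per batch, as with insertRecords / insertTablet."""
    batches = math.ceil(n_points / batch_size)
    return batches * ROUND_TRIP_MS + n_points * PER_POINT_MS

# 10,000 points: 10,000 round trips vs 10 round trips.
single = insert_record_cost(10_000)   # dominated by network cost
batched = insert_batch_cost(10_000)   # network cost amortized away
```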

The insertTablet interface sorts the tablet by default. If you can guarantee that the timestamps within a Tablet are non-decreasing, you can pass the sorted parameter as true, which skips the client-side sort.
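Before passing sorted as true, it is worth verifying the guarantee on the producer side; a one-line check (an illustrative sketch, not part of the Session API) is enough:

```python
def is_non_decreasing(timestamps):
    """True if timestamps are already in non-decreasing order,
    i.e. safe to pass sorted=true to insertTablet."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))

print(is_non_decreasing([1, 2, 2, 5]))  # safe to skip sorting
print(is_non_decreasing([3, 1, 2]))     # must let the client sort
```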

When measuring time, also pay attention to client-side format conversion: measure the time to construct the interface parameters separately from the execution time.

Query interface

The query interface is relatively simple. By default, Session's hasNext and next return a RowRecord structure, which not everyone needs. You can use SessionDataSet's iterator to get an iterator, and then fetch the raw data through a JDBC-like interface, avoiding a lot of useless object creation.

Sequential write

For a time series database, time order is a very important property, and it is best not to break it. IoTDB supports out-of-order writes, but out-of-order data hurts query performance, mainly for aggregation queries. The reason is that out-of-order data invalidates the pre-computed statistics, degrading aggregation queries into reading the raw data.

Normally, a few overlapping writes are not a problem, but if out-of-order data is written to the same time range too many times (tens of thousands), memory may blow up at query time. For example, the memory buffer flushes data with timestamps 1-10 to disk, and then 9,999 more buffers covering timestamps 1-10 are written and flushed, so there are 10,000 data blocks with timestamps 1-10 on disk. A query then needs to read all 10,000 blocks and merge them, which uses a lot of memory.
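A small sketch (illustrative, not IoTDB's actual merge reader) of why overlapping chunks are expensive: every chunk whose time range overlaps the query must be held open and merged to produce globally ordered output.

```python
import heapq

def merge_chunks(chunks):
    """K-way merge of per-chunk (timestamp, value) lists; every
    overlapping chunk participates in producing ordered output."""
    return list(heapq.merge(*chunks))

# 10,000 chunks that all cover timestamps 1-10: none can be skipped,
# so 100,000 points must be merged in memory for this one time range.
chunks = [[(t, 0) for t in range(1, 11)] for _ in range(10_000)]
merged = merge_chunks(chunks)
```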

For this scenario, we sort data in the background to handle disorder (the merge introduced in 0.9; it has bugs in 0.9, which 0.10 fixes, but 0.10 turns merge off by default, and it will be re-enabled in 0.11). Still, if you can avoid disorder on the client, do so: write each device in ascending time order.

If Kafka sits in front, it is best to use the device id as the partitioning key, so that all data of one device goes to one partition; order within a partition is then preserved during consumption.
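The routing rule can be sketched as a stable hash of the device id (an illustrative sketch; real Kafka clients apply their own key partitioner when you set the message key to the device id):

```python
import zlib

def partition_for(device_id: str, num_partitions: int) -> int:
    """Stable device-id -> partition mapping: the same device always
    lands in the same partition, preserving its write order."""
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions

# All records of this device route to one partition.
p = partition_for("root.sg.d1", 8)
```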

Memory buffer

First, how to calculate how many points each series can buffer in memory: divide the memtable size by the number of series, then by the size of each point. For example, a long-type point is 16 bytes (8-byte timestamp, 8-byte value), and a float point is 12 bytes.
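The arithmetic above as a sketch (the 1 GB memtable and 10,000-series figures are made-up example inputs):

```python
def points_per_series(memtable_bytes, num_series, point_bytes):
    """How many points each series can buffer before a flush is triggered:
    memtable size / number of series / bytes per point."""
    return memtable_bytes // num_series // point_bytes

# Example: 1 GB memtable, 10,000 long-type series (8 B ts + 8 B value).
capacity = points_per_series(1024**3, 10_000, 16)  # ~6,700 points per series
```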

The memtable size can be found in the log: search for "reaches"; the line looks roughly like "memtable size xxx reaches the threshold". If enable_parameter_adapter in the configuration file has not been set to false, the memtable size is not fixed and is adjusted with the number of registered series.

Within limits, a larger memory buffer benefits both reading and writing; ideally each series can buffer up to 1 million points. But do not make it too large: data is sorted in memory at query time, and with too many points in memory, say tens of millions, that in-memory sort can take ten seconds or more per query.

To avoid this problem, 0.10.0 adds a parameter, avg_series_point_number_threshold, with a default of 10000: each series in the memory buffer caches at most this many points before a flush is triggered. This default is not well chosen; it can be raised to 500,000 or 1 million.

The larger memtable_size_threshold is, the faster writes are; a few hundred MB to one or two GB is typical. Do not set it too small, e.g. a few MB, which severely hurts write speed. When setting this parameter, be careful not to exceed the memory limit. Before adjusting it, make sure enable_parameter_adapter is set to false.

Multiple data directories

The bottleneck of the database is disk IO. A simple way to improve disk IO capacity is to configure multiple disks. IoTDB's data directories can be configured via the data_dirs parameter, with multiple directories separated by commas, one directory per disk. When writing, the least busy of these disks is chosen.
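For example, a multi-disk setup might look like this in the engine configuration (the directory paths here are made-up placeholders; only the comma-separated data_dirs form is from the text):

```properties
# iotdb-engine.properties -- one data directory per physical disk
data_dirs=/disk1/iotdb/data,/disk2/iotdb/data,/disk3/iotdb/data
```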

Client optimization

As mentioned above, locks are at the storage group level: N threads writing the same storage group contend for one lock. It is best for one storage group to serve no more than about 50 clients; too many write threads cause excessive lock contention.

The capacity of the SessionPool is generally fine at the number of server CPU cores; do not make it much larger.

Also monitor the client's memory and its data production/consumption rates to avoid an excessive backlog of submitted tasks. If the client's memory is full, you will see this phenomenon: the client sends a request, the server executes and returns quickly, but the client is slow to receive the result.

Operations that easily blow up memory

select * from root is best avoided when there are many series. This operation treats the entire database as one table and fetches a batch of data from every column at once, which easily blows up memory. We will add a check in 0.11 to reject it early.

In 0.10 and earlier, show timeseries copies all the series in the system in memory and transmits them to the client. If there are many series, it is best to specify a prefix path to filter, or use show child paths to drill down level by level.

If there are too many time series (on the order of 100 million), the metadata may blow up memory. Estimating 200 bytes per time series, about 10 million series take roughly 2 GB of metadata (i.e., the metadata tree).
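The estimate above as a sketch (using the 200-bytes-per-series figure from the text; real overhead varies with path length and structure):

```python
BYTES_PER_SERIES = 200  # rough per-series metadata cost from the article

def metadata_gb(num_series):
    """Approximate metadata-tree memory footprint in GB."""
    return num_series * BYTES_PER_SERIES / 1024**3

# ~10 million series -> a bit under 2 GB of metadata.
footprint = metadata_gb(10_000_000)
```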

Summary

At this stage the database still requires a lot of manual tuning, and the automatic tuning tools need improvement; our goal is to make it as simple as possible. Version 0.11 will improve memory and parameter configuration. That is a lot of content for today; a sequel will follow if more comes to mind!

Welcome to follow, share, and star us on GitHub!

https://github.com/apache/incubator-iotdb/tree/master


Origin blog.csdn.net/qiaojialin/article/details/107328557