Apache IoTDB Series Tutorial-8: File Synchronization Tool

The System Tools section of the user manual on the official website includes a Sync Tool. Many people ask how to use it and how much delay it introduces. Today I will introduce the tool's usage scenarios, basic principles, and some testing tips.

The main text is 2439 words, and the expected reading time is 7 minutes.

Usage scenarios

A company has multiple power plants, and an IoTDB instance is deployed in each plant to monitor the operating status of the plant's devices. The plants do not exchange data with each other. Now the company wants to build a cloud platform that aggregates the data of all plants for analysis. There is a premise here: the analysis on the cloud platform mainly targets long-term historical data, so the real-time requirements are low. In this case you can use the file synchronization tool.

Positioning of the file synchronization tool: it synchronizes the schema and data files (TsFile) of one IoTDB to another IoTDB. The synchronization has a certain delay, which depends on the load and the configuration.

Basic principles

As the name suggests, the synchronization granularity of this tool is a data file, not a single data point. As a result, the longest synchronization delay is the time it takes to generate a data file (from creating the file to sealing it). The sender must wait until a file is complete before it can synchronize it, because a half-written file cannot be parsed. Transferring the files works much like scp. Compared with synchronizing individual data points, the advantage is that the data does not have to be parsed and re-imported.

We call the two IoTDBs involved in file synchronization the sender and the receiver.

Writing process

To better understand how long it takes to generate a data file, it is necessary to briefly introduce the writing process.

IoTDB uses an LSM structure. Data is first written to an in-memory buffer called the memtable. When the memtable reaches a certain size, it is flushed to disk. Multiple memtables are flushed into one data file.

For example, suppose a storage group has 1 time series that is written once per second, and each data point is 16 bytes. The memtable size threshold is 160 bytes, and the TsFile size threshold is 200 bytes. The storage group initially has an empty TsFile available for writing.

(1) When the memtable fills up with 10 data points for the first time, an asynchronous flush task is submitted (it appends to the TsFile currently being written) and the TsFile size is checked. At this point the file is still 0 bytes, so it is left alone.

(2) The asynchronous flush task is executed. After the flush, the TsFile is 200 bytes.

(3) When the memtable fills up with 10 data points for the second time, an asynchronous flush task is submitted and the TsFile size is checked again. The file is now 200 bytes, so it is marked to be closed.

(4) The asynchronous flush task is executed. After the flush, the current TsFile is closed.

In this example, the file is closed after two memtables (20 data points) have been filled. Since one point is written per second, generating the file takes about 20 seconds.
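The four steps above can be replayed with a small simulation. This is only a sketch of the simplified example, not IoTDB's actual flush logic. The function name and all parameter names are hypothetical, and disk_bytes_per_flush (how much the TsFile grows per flushed memtable) is taken from the 200 bytes stated in step (2).

```python
def seconds_until_file_closes(point_bytes=16, memtable_threshold=160,
                              tsfile_threshold=200, write_interval_s=1,
                              disk_bytes_per_flush=200):
    """Replay the simplified write path and return the time in seconds
    until the first TsFile is closed. All values must be positive."""
    memtable_bytes = 0
    tsfile_bytes = 0
    elapsed_s = 0
    while True:
        # One data point arrives every write_interval_s seconds.
        elapsed_s += write_interval_s
        memtable_bytes += point_bytes
        if memtable_bytes < memtable_threshold:
            continue
        # The memtable is full. The TsFile size is checked when the flush
        # task is submitted, i.e. before this memtable reaches the disk.
        close_after_flush = tsfile_bytes >= tsfile_threshold
        # The flush appends the memtable to the TsFile (assumed growth).
        tsfile_bytes += disk_bytes_per_flush
        memtable_bytes = 0
        if close_after_flush:
            return elapsed_s


print(seconds_until_file_closes())  # 20
```

With the default values the file is closed after the second flush, which matches the 20 seconds in the example. Writing one point every 2 seconds doubles that time.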

Synchronization process

The sender periodically checks whether there is a newly created schema or a newly completed data file locally. If there is, it sends it to the receiver. The synchronization delay is therefore roughly max(synchronization check interval, file generation time).

The iotdb-sync-client.properties configuration file has a parameter sync_period_in_second, which controls how often the sender checks. With the 20-second file generation time from the example above, configuring 60 seconds gives a longest synchronization delay of 60 seconds, while configuring 10 seconds gives a delay of 20 seconds (because generating a file takes that long). The synchronization delay therefore depends on both the configuration and the write frequency.
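The rule of thumb above can be written down directly. This is a rough estimate under the article's simplified model, not a guarantee from the tool. The function name is hypothetical and file_generation_s is whatever file generation time you measured or calculated for your own workload.

```python
def estimated_sync_delay_s(sync_period_in_second, file_generation_s):
    """Rough longest delay: a data point waits for its file to be sealed
    and for the next periodic check, whichever takes longer."""
    return max(sync_period_in_second, file_generation_s)


# The two configurations from the text, with a 20-second file generation time.
print(estimated_sync_delay_s(60, 20))  # 60
print(estimated_sync_delay_s(10, 20))  # 20
```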

The sync_period_in_second parameter can be set to a relatively small value without causing major problems. The main constraint is the generation interval of TsFiles, which is controlled by memtable_size_threshold and tsfile_size_threshold. The larger these two parameters are, especially the memtable size, the faster historical data queries become, but the longer each file takes to generate. You therefore need to balance the lowest latency that synchronization can achieve against query performance.

A relatively simple way to check the file generation rate in your own system is to go to the data directory data/data/storage group{/partition} and look at the intervals between the last modification times of the .resource files.
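The check described above can be scripted. The following is a minimal sketch that only reads file modification times. The function name is hypothetical, and it assumes each sealed TsFile has a companion file whose name ends in .resource somewhere under the directory you pass in.

```python
from pathlib import Path


def resource_intervals_s(directory):
    """Return the gaps in seconds between the last modification times of
    consecutive .resource files found under directory (searched recursively)."""
    mtimes = sorted(p.stat().st_mtime for p in Path(directory).rglob("*.resource"))
    return [later - earlier for earlier, later in zip(mtimes, mtimes[1:])]
```

For example, calling resource_intervals_s on the data directory of one storage group returns a list of intervals. If most of them are around 20, a file is being sealed roughly every 20 seconds.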

Test synchronization

We test synchronization on a single machine. The sample scripts are for a Linux environment and IoTDB version 0.10.1. First download the binary package, unzip it, and make two copies of it, one named sender and one named receiver.

Start the receiver

cd receiver
## Configuration
## In conf/iotdb-engine.properties, set is_sync_enable=true
## Start the receiver IoTDB
nohup ./sbin/start-server.sh >/dev/null 2>&1 &
## Start the receiver CLI; by default it connects to local port 6667 as user root
./sbin/start-cli.sh

Start the sender and prepare the data

cd sender
## Configuration
## In conf/iotdb-engine.properties, set rpc_port=6668
## In conf/iotdb-sync-client.properties, set sync_period_in_second=10
## Start IoTDB
nohup ./sbin/start-server.sh >/dev/null 2>&1 &
## Start the sync client process
nohup ./tools/start-sync-client.sh >/dev/null 2>&1 &
## Start the sender CLI; note that the port has been changed to 6668
./sbin/start-cli.sh -h 127.0.0.1 -p 6668 -u root -pw root
## Enter the following in the CLI
insert into root.turbine1.d1(timestamp,s1,s3) values(2,1,3);
## flush is the key step: it forces the memtable to disk and seals the file
flush

Verify the data on the receiver

## Enter the following in the receiver CLI
select * from root

If you want to configure the memtable and TsFile sizes yourself, you need to set enable_parameter_adapter to false first. Otherwise the system will adjust the sizes of the memtable and TsFile automatically.

For details, please refer to the user manual:

http://iotdb.apache.org/zh/UserGuide/V0.10.x/System%20Tools/Sync%20Tool.html

Summary

File synchronization is suitable for data aggregation and backup scenarios that do not have strict real-time requirements. If you need second-level or minute-level synchronization, you need a different solution. If the data has not been synchronized during your experiment, either the synchronization check interval is set too long or the file has not been closed. There is no mechanism that closes files periodically, so you need to call flush manually. One more tip: the start-cli.sh script can also connect to a remote IoTDB by using the -h, -p, -u, and -pw parameters.


Origin blog.csdn.net/qiaojialin/article/details/107873047