Introduction to time series databases | A detailed explanation of the characteristics of time series databases and how they differ from traditional databases

Over the past few years, the growing popularity of the Internet of Things (IoT) and the demand for real-time data have driven significant growth in the adoption of time series databases (TSDB). According to the DB-Engines ranking, the popularity of TSDBs has grown faster than that of almost every other database category, second only to Graph DBMS.

As an important tool for storing, managing and analyzing time series data, time series databases are likely to see continued growth in demand. If you are not yet familiar with them, this article introduces what a time series database is and why time series data calls for a dedicated database.

What is time series data

To understand why time series databases have become popular in recent years, we first have to look at time series data. Why does it need a specially optimized database? Can't a general-purpose relational database handle it?

Put simply, time series data is a set of values (Value) that change over time. These values are accompanied by tags of the form Key=Value.

Time series data generally has the following three attributes (from Wikipedia):

  1. Time series: a unique identifier consisting of a name (often called a metric) and a set of Key=Value labels (Labels, also commonly called Tags).
  2. Key-value pair (Timestamp, Value): pairs composed of a timestamp and a value, naturally sorted by timestamp. These pairs are generally called samples.
  3. Value: the Value in point 2 is generally numerical, such as temperature, humidity, or CPU and memory usage, but it may also be any data structure (structured or unstructured).

Time series data case

For example, consider a weather website's 15-day weather forecast for Yuhang:

Looking at the two lines for maximum and minimum temperature, the three attributes are:

  1. The time series are: a. Daily maximum temperature + <region=Yuhang> b. Daily minimum temperature + <region=Yuhang>
  2. The samples of the maximum temperature series are the 15 (timestamp, value) pairs from 8/29 to 09/06, where each value is the daily maximum temperature. The minimum temperature series is analogous.
  3. The Value here is the temperature, a numerical value. For example, on 8/29 the highest temperature is 36 degrees Celsius and the lowest is 25 degrees Celsius.
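
The three attributes above can be sketched as a small in-memory data model. This is only an illustration, not how any particular database stores data: the names `SeriesKey`, `series_key`, `store` and `append`, the metric names, and the "08-29" timestamp string are all hypothetical. The two sample values are the 8/29 readings quoted above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SeriesKey:
    """Attribute 1: a metric name plus Key=Value labels identifies one series."""
    metric: str
    labels: Tuple[Tuple[str, str], ...]  # sorted pairs, so the key is hashable


def series_key(metric: str, **labels: str) -> SeriesKey:
    # Sorting makes the identity independent of the order labels are given in.
    return SeriesKey(metric, tuple(sorted(labels.items())))


# Attributes 2 and 3: each series maps to a list of (timestamp, value) samples.
store: Dict[SeriesKey, List[Tuple[str, float]]] = {}


def append(key: SeriesKey, timestamp: str, value: float) -> None:
    store.setdefault(key, []).append((timestamp, value))


# Hypothetical metric names for the two lines in the forecast.
high = series_key("daily_max_temperature", region="Yuhang")
low = series_key("daily_min_temperature", region="Yuhang")
append(high, "08-29", 36.0)
append(low, "08-29", 25.0)
```

The same metric with a different `region` label would be a separate series, which is why the number of series grows quickly as labels are added.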

In addition to weather forecast information, time series data also widely exists in the following fields:

Stock prices: let stock analysts and traders understand the trend and direction of a stock's price.

Health monitoring: used in the medical field to monitor the heart rate or other health indicators of patients, for example those taking certain medications.

Physical sensors for industry and the Internet of Things: temperature, humidity, speed, acceleration, direction, heart rate, blood oxygen and other sensors built into smartphones, smart cars, smart homes and similar devices, widely used in manufacturing, healthcare and other industries. These sensors continuously generate massive amounts of data at fixed or irregular intervals. The data is mainly used for routine and anomaly monitoring of equipment and the human body, and for intelligent applications built on mining this data (such as production line optimization in smart manufacturing, or autonomous driving).

Software sensors: such as intrusive monitoring probes in traditional DevOps and non-intrusive probes in cloud-native environments (for example, the currently popular non-intrusive solutions based on eBPF and Service Mesh data plane probes). The main purpose of the metrics and instrumentation data that software emits is routine and anomaly monitoring of applications, to keep business services running continuously and stably. With the ongoing development of AIOps, the scale and granularity of time series data usage face even higher requirements.

Characteristics of time series data

  • Data is generated frequently and steadily. The frequency is generally stable and does not vary with people's daily activity cycles. Because there are many types of sensors, combined with a large number of labels for industry and geographic location, both the data volume and the number of time series are extremely large. This scale is growing rapidly as smart devices (wearables, smart cars, smart manufacturing) spread and as applications of this data become more sophisticated.
  • Data changes in a near append-only fashion. Data is continuously appended and updates are rare (although data can still arrive late, especially on weak networks). Data is usually deleted by expiration time, in batches covering a whole time period.
  • In terms of application, routine and anomaly monitoring is the most common: visual dashboards and alerting systems are built on this data. Next comes prediction of future trends, that is, time series forecasting, especially in finance.

Why time series data is so important

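Although time series data is not a new data type, its popularity and usage have increased significantly over the past few years, as the DB-Engines analysis shows. Several factors cannot be ignored, including: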

  • The development of the Internet and the digitization of many industries. This directly produces massive amounts of time series data, such as website traffic, social media activity, and sensor readings.
  • The development of machine learning algorithms. Algorithms such as recurrent neural networks (RNN) and long short-term memory (LSTM) networks are well suited to time series analysis, making it easier to extract valuable information from this kind of data and giving time series data further opportunity to generate value.
  • The rise of predictive analytics. This makes time series data an important tool for predicting trends and future outcomes.
  • Needs in areas such as finance, healthcare and transportation. These fields increasingly require real-time decision-making, and time series analytics helps them respond to rapidly changing situations.

What is a time series database

According to Wikipedia's definition, a time series database (TSDB) is a database specifically optimized for processing time series data. It is a kind of domain-specific database, designed to serve data processing in a particular domain, just as a graph database handles the storage and retrieval of graphs, a document database handles the storage and retrieval of semi-structured documents, and a search engine specializes in the retrieval of unstructured text.

Characteristics of time series database

To address the characteristics and challenges involved in time series data described above, TSDB employs a number of techniques. Some of these typical characteristics include:

Log-Structured Merge-tree (LSM-tree)

An LSM-tree is a disk-based data structure optimized for write-heavy workloads. It buffers incoming writes in memory, flushes them to disk sequentially as sorted files, and later merges and compacts those files across a series of tiers. Turning random writes into sequential ones enables efficient data ingestion and typically gives better write performance than traditional B-trees.
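
To make the write path concrete, here is a heavily simplified in-memory sketch of the idea. It is an assumption-laden toy, not the storage engine of any real database: the class `TinyLSM` and its methods are hypothetical names, the "disk" runs are plain Python lists, and there is no write-ahead log, no tiering policy, and no deletion support.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM-tree: an in-memory buffer plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: Dict[int, float] = {}
        self.memtable_limit = memtable_limit
        # Each run stands in for one sorted on-disk file; newest run is last.
        self.runs: List[List[Tuple[int, float]]] = []

    def put(self, ts: int, value: float) -> None:
        # Writes only touch the in-memory buffer, so they are cheap.
        self.memtable[ts] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the whole buffer out at once as one sorted run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def compact(self) -> None:
        # Merge all runs into one; for duplicate keys the newest run wins.
        merged: Dict[int, float] = {}
        for run in self.runs:  # oldest to newest
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

    def get(self, ts: int) -> Optional[float]:
        # Check the newest data first: buffer, then runs from newest to oldest.
        if ts in self.memtable:
            return self.memtable[ts]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (ts,))
            if i < len(run) and run[i][0] == ts:
                return run[i][1]
        return None
```

Reads may have to consult several runs before compaction, which is the usual trade-off an LSM-tree accepts in exchange for fast writes.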

Time-based partitioning

Time series databases typically partition data based on time intervals, making queries faster and more efficient and making it easier to retain and manage the data. This approach helps separate recent, frequently accessed data from older, less frequently accessed data, optimizing storage and query performance.
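
A minimal sketch of this idea, assuming integer Unix-style timestamps in seconds and one partition per day. The class `PartitionedStore`, its methods, and the one-day partition width are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

PARTITION_SECONDS = 86_400  # one partition per day (an arbitrary choice)


class PartitionedStore:
    """Toy store that groups samples into fixed-width time partitions."""

    def __init__(self) -> None:
        self.partitions: DefaultDict[int, List[Tuple[int, float]]] = defaultdict(list)

    def append(self, ts: int, value: float) -> None:
        self.partitions[ts // PARTITION_SECONDS].append((ts, value))

    def query(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return samples with start <= ts < end, scanning only overlapping partitions."""
        first = start // PARTITION_SECONDS
        last = (end - 1) // PARTITION_SECONDS
        out: List[Tuple[int, float]] = []
        for pid in range(first, last + 1):
            for ts, value in self.partitions.get(pid, []):
                if start <= ts < end:
                    out.append((ts, value))
        return sorted(out)

    def drop_expired(self, now: int, ttl: int) -> int:
        """Delete every partition that lies entirely before now - ttl."""
        cutoff = (now - ttl) // PARTITION_SECONDS
        expired = [pid for pid in self.partitions if pid < cutoff]
        for pid in expired:
            del self.partitions[pid]
        return len(expired)
```

Expiring data this way removes whole partitions in one step instead of deleting individual rows, which matches the batch, expiration-based deletion pattern of time series data.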

Data compression

Time series databases use various compression techniques such as delta encoding, Gorilla compression, or dictionary encoding to reduce storage space requirements. These techniques leverage temporal and value-based patterns in time series data to enable efficient storage without losing data fidelity.
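
As a rough illustration of why these techniques work, the sketch below applies delta and delta-of-delta encoding to integer timestamps. It only shows the arithmetic idea: real formats such as Gorilla additionally pack the results into variable-length bit fields, which is not attempted here, and the function names are hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    # Applying delta encoding twice turns a steady sampling interval into zeros.
    return delta_encode(delta_encode(timestamps))


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    return delta_decode(delta_decode(encoded))
```

For timestamps sampled at a nearly constant interval, most encoded entries after the first two are zero or very small, so they need far fewer bits than the raw timestamps, and decoding recovers the original values exactly.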

Built-in time-based functions and aggregations

Time series databases provide native support for time-based functions such as moving averages, percentiles, and time-window aggregations. These built-in capabilities let users perform complex time series analysis more efficiently and with less computational overhead than in traditional databases.
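
To show what such functions compute, here is a plain-Python sketch of two of them: averaging samples into fixed time buckets and a trailing moving average. The function names `bucket_avg` and `moving_average` are hypothetical and do not correspond to the query syntax of any specific database, which would normally evaluate these inside the engine.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def bucket_avg(samples: List[Sample], width: int) -> Dict[int, float]:
    """Average the values that fall into each fixed-width time bucket."""
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        acc = sums[ts - ts % width]  # key is the bucket's start time
        acc[0] += value
        acc[1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing moving average; early points use as many values as are available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out
```

For example, `bucket_avg(samples, 60)` downsamples second-level readings to one average per minute, which is the kind of operation dashboards and alerting rules run constantly.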

Why choose a time series database

The introduction above already gives a preliminary answer to why a domain-specific database is needed for time series data.

Based on the characteristics, scale and applications of time series data, a time series database can make targeted optimizations: storage uses customized compression algorithms and a hybrid row-column format optimized for high-volume time series writes and queries; query operators add more time-window functions for time series; the query protocol is optimized for the time series model; and data deletion uses more flexible expiration policies.

These domain-specific optimizations can give time series databases great advantages over general-purpose databases in terms of domain capabilities, performance, cost, stability and other dimensions.

Summary

Time series databases are widely used in the Internet of Things, financial data analysis, monitoring and alerting systems, energy management, healthcare applications, and other time-sensitive industries. By using time series databases to analyze and forecast time series data, companies can extract valuable information from the data, make more informed decisions and gain a competitive advantage.

However, time series databases and relational databases are not mutually exclusive. Since business systems still rely heavily on relational databases, how to combine time series data with business data more conveniently and effectively to generate greater business value is one of the problems that time series databases need to solve.


About Greptime:

Greptime Technology is committed to providing real-time and efficient data storage and analysis services for fields that generate large amounts of time series data, such as smart cars, the Internet of Things, and observability, helping customers mine the deep value of their data. Its main products currently include:

  • GreptimeDB is a time series database written in Rust. It is distributed, open source, cloud native and highly compatible, and helps enterprises read, write, process and analyze time series data in real time while reducing long-term storage costs.

  • GreptimeCloud provides users with a fully managed DBaaS service that integrates closely with observability, the Internet of Things and other fields.

  • GreptimeAI is an observability solution tailored for LLM applications.

  • The vehicle-cloud integrated solution is a time series database solution built around the real business scenarios of car companies, addressing the business pain points that arise once a company's vehicle data grows exponentially.

GreptimeCloud and GreptimeAI are now officially in public testing. Follow the official account or official website for the latest developments. If you are interested in the enterprise version of GreptimeDB, you are welcome to contact the assistant (search greptime on WeChat to add the assistant).

Official website: https://greptime.cn/

GitHub: https://github.com/GreptimeTeam/greptimedb

Documentation: https://docs.greptime.cn/

Twitter: https://twitter.com/Greptime

Slack: https://www.greptime.com/slack

LinkedIn: https://www.linkedin.com/company/greptime


Origin my.oschina.net/u/6839317/blog/11046036