The connection between causal analysis and association analysis

Association analysis in causal analysis

Discovering causal relationships has become more and more important in the context of big data. In the field of data analysis, people have begun to use artificial intelligence to conduct causal analysis of data, but establishing a causal relationship is intricate and cannot be settled by machines alone.

In data analysis, we are often troubled by the question of causality and cannot work out the causal connections between things. Typically, the available data is first analyzed from a statistical perspective: it is processed with specific analysis methods, and a model is built through feature learning. Causal analysis, however, is essentially different from machine learning modeling and prediction. We can train a model on labeled data and use it to predict outcomes, but we cannot be sure the predictions are right, nor do we know how the model arrived at them. Sometimes more about the result is unknown than known.

 

Therefore, before deciding what kind of causal relationship to look for, you must first understand what causality is: the functional relationship between one event (the "cause") and a second event (the "effect"), where the second event is considered to be the result of the first. Generally speaking, an event results from a combination of many causes, all of which occurred at an earlier point in time, and the event can in turn become a cause of other events.

The useful association rules mined in association analysis can provide the initial candidate causes for causal analysis. Association analysis discovers correlations between itemsets in large data sets, as well as regularities among the values of two or more variables. These association rules are the "data source" for causal analysis: causal analysis builds on them to discover the causal relationships between things, combining the rules with the chronological order in which they occur.
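To make "association rule" concrete, here is a minimal sketch that scores single-item rules by the standard support, confidence, and lift measures. The transactions, thresholds, and function names are assumptions made up for illustration; they do not come from any particular tool.

```python
from itertools import permutations

# Hypothetical transactions: each set is one basket of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue
        confidence = s_ab / support({a}, transactions)
        if confidence < min_confidence:
            continue
        lift = confidence / support({b}, transactions)
        rules.append((a, b, s_ab, confidence, lift))
    return rules

rules = mine_pair_rules(transactions)
for a, b, s, c, l in rules:
    print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A rule such as butter -> bread with high confidence only says that the two items tend to appear together. It is a candidate for causal analysis, not a causal conclusion.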

Discovering causal relationships with graph association rules

With such feature-rich data, data analysis tools are expected to mine clear, accurate, and interpretable association rules and, at the same time, analyze them in depth according to their order in time. However, the association rules produced by existing association analysis tools are all expressed in terms of relational data. In most business scenarios these expressions cannot clearly describe the discovered rules, let alone make them interpretable, so going on to causal analysis is even harder.

[Figure: rule expression]

At the same time, in the big data era of growing data scale and increasingly complex data structure, traditional relational data has exposed many problems, such as modeling limitations and difficulty scaling horizontally. Graph-structured data, which has stronger expressive power, is therefore used in many fields to store, process, and analyze data. A graph abstracts the entities in the information and the relationships between them into vertices and edges between vertices. It is used to mine latent, hard-to-observe behaviors and connections among people, things, and other entities. A graph structure expresses the correlations between data better, and in industry much non-graph data is converted into graph data for analysis.
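As a rough illustration of turning relational rows into vertices and edges, the sketch below builds an in-memory adjacency list from a small table and then follows two hops to find related entities. The table, column names, `BOUGHT` label, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical relational table: one row per purchase.
rows = [
    {"customer": "alice", "product": "laptop"},
    {"customer": "alice", "product": "mouse"},
    {"customer": "bob", "product": "mouse"},
]

def rows_to_graph(rows, source_col, target_col, label):
    """Turn each row into a labelled edge; return (vertices, adjacency)."""
    vertices = set()
    adjacency = defaultdict(list)
    for row in rows:
        src = (source_col, row[source_col])
        dst = (target_col, row[target_col])
        vertices.update((src, dst))
        adjacency[src].append((label, dst))
        adjacency[dst].append((label, src))  # keep both directions for traversal
    return vertices, adjacency

def neighbours_at_two_hops(adjacency, start):
    """Vertices reachable from start in exactly two edges, excluding start."""
    two_hop = set()
    for _, mid in adjacency[start]:
        for _, end in adjacency[mid]:
            if end != start:
                two_hop.add(end)
    return two_hop

vertices, adjacency = rows_to_graph(rows, "customer", "product", "BOUGHT")
related = neighbours_at_two_hops(adjacency, ("customer", "alice"))
print(related)
```

In the relational form, the link between alice and bob is only visible through a self-join on the product column. In the graph form it is a two-edge path, which is the kind of connection the text describes as hard to observe.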

[Figure: graph]

Graph data can describe the relationships between individuals and is especially suitable for analysis and computation involving associations in big data. The edges, vertices, and attributes of a graph make it possible to mine the "cause" side of association analysis in depth, which provides the basis for attribution in causal analysis. Deep, high-precision, and interpretable association rules can help data analysts perform correct, effective, and interpretable causal analysis.

In this approach, relational data is stored, processed, and analyzed in a graph structure, and association rules are obtained through association analysis of the graph data. Presenting the correlations between data as graph association rules makes the rules more interpretable, and the chronological order of the data behind each rule ties the related things together more closely. This helps data analysts see which side of an association rule is the cause and which is the effect, and conduct in-depth causal analysis on the data.
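One simple way to bring chronological order into an association between two events is to count, per entity, which event occurred first. The sketch below does this on a hypothetical event log. The event names, timestamps, and function names are invented, and temporal precedence is only a necessary condition for causation, not proof of it.

```python
# Hypothetical event log: (entity_id, event, timestamp).
events = [
    ("u1", "coupon_sent", 1), ("u1", "purchase", 3),
    ("u2", "coupon_sent", 2), ("u2", "purchase", 5),
    ("u3", "purchase", 1), ("u3", "coupon_sent", 4),
]

def first_occurrences(events):
    """Map each entity to the earliest timestamp of each of its events."""
    first = {}
    for entity, event, ts in events:
        per_entity = first.setdefault(entity, {})
        if event not in per_entity or ts < per_entity[event]:
            per_entity[event] = ts
    return first

def order_counts(events, a, b):
    """Count entities where a first occurs before b, and where b precedes a."""
    a_first = b_first = 0
    for per_entity in first_occurrences(events).values():
        if a in per_entity and b in per_entity:
            if per_entity[a] < per_entity[b]:
                a_first += 1
            elif per_entity[b] < per_entity[a]:
                b_first += 1
    return a_first, b_first

counts = order_counts(events, "coupon_sent", "purchase")
print(counts)
```

If one ordering dominates, the earlier event is the more plausible candidate cause and is worth examining further. If the two orderings are balanced, the association alone gives no direction.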

Tools that go from correlation to cause and effect

Many people in the data field have been researching how to convert relational data into graph data, especially in databases, and graph databases have sprung up everywhere. Data analysis tools built on graph data, however, are rare in the industry. A graph database only stores and processes data in a graph structure; it cannot analyze the correlations in the data in a targeted way, so the potential of graph data has not been fully exploited. The industry urgently needs data analysis tools that can process graph data in depth. Such tools are not only the key to mining association rules from data but also an important basis for analyzing causal relationships.

After surveying the data analysis tools on the major online platforms, the one tool I found, apart from graph databases, that actually gets from association rules to causal relationships is Guanhe Causality. After using it over a long period, I found that it not only converts relational data into graph data online based on business needs, as a graph database would (the conversion happens online in real time, needs no extra storage, and does not change the original data), but also automatically mines association rules from large-scale graph data. With a graph database, relational data must first be converted into graph data, the graph data then goes through a cumbersome analysis process, and only then are the association rules displayed as a graph. By comparison, Guanhe Causality obtains the association results more conveniently and without changing how the original data is stored. Since most data sources today are still stored in relational form, Guanhe Causality is a better fit for current industry needs. Its time sliding window feature also accounts for the effect of different time periods on the rules. In my experience the association rules it mines are more accurate, deeper, and more comprehensive, so it better helps people analyze the causal relationships in the data and move from correlation analysis to causal analysis.
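The general idea of a time sliding window can be sketched as follows: recompute a rule's confidence inside each window and watch how it changes over time. This is a generic illustration on invented data with hypothetical names, not a description of how Guanhe Causality implements its feature.

```python
# Hypothetical timestamped baskets: (day, items).
baskets = [
    (1, {"a", "b"}), (2, {"a", "b"}), (3, {"a"}),
    (4, {"a"}), (5, {"a", "b"}), (6, {"a"}),
]

def windowed_confidence(baskets, antecedent, consequent, width, step):
    """Confidence of antecedent -> consequent in each window [start, start + width)."""
    if not baskets:
        return []
    first = min(day for day, _ in baskets)
    last = max(day for day, _ in baskets)
    results = []
    start = first
    while start + width - 1 <= last:
        in_window = [items for day, items in baskets if start <= day < start + width]
        with_a = [items for items in in_window if antecedent in items]
        with_both = [items for items in with_a if consequent in items]
        # None marks a window in which the antecedent never occurs.
        confidence = len(with_both) / len(with_a) if with_a else None
        results.append((start, confidence))
        start += step
    return results

trend = windowed_confidence(baskets, "a", "b", width=3, step=1)
for start, confidence in trend:
    print(start, confidence)
```

A rule that is strong in one period and weak in another would look mediocre when averaged over the whole data set, which is why the time period matters when judging a rule.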

[Figure: example of an association rule]

The above is based on the current domestic landscape. If there is demand, we will share information on data analysis tools from abroad in future posts.


Origin blog.csdn.net/DuJinn/article/details/126344227