R software
- R is free
- R is a comprehensive statistical research platform that provides a variety of data analysis techniques
- R has excellent plotting (graphics) functions
Data analysis
What is data?
Data are symbols that record and identify objective events. They are physical symbols, or combinations of physical symbols, that record the nature, state, and relationships of objective things.
Why do data analysis?
Use the results of data analysis to guide decision making
Data analysis process
Data collection → data storage → data analysis → data mining → data visualization → decision making
Data collection
The collected data is called the raw (original) data.
It is stored as files.
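Storing collected data as a file can be sketched as follows. This is only an illustration in Python: the records, the field names, and the file name raw_data.csv are hypothetical and do not come from these notes.

```python
import csv
import os
import tempfile

# Hypothetical raw records, exactly as they might have been collected.
raw_records = [
    {"timestamp": "2024-01-01T08:00", "sensor": "A", "reading": "21.5"},
    {"timestamp": "2024-01-01T09:00", "sensor": "A", "reading": "22.1"},
    {"timestamp": "2024-01-01T10:00", "sensor": "B", "reading": "19.8"},
]

# Write the raw data to a CSV file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "raw_data.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["timestamp", "sensor", "reading"])
    writer.writeheader()
    writer.writerows(raw_records)

# Read the file back to confirm that nothing was lost.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(len(loaded), "records stored in", path)
```

Keeping the raw file unchanged and doing all cleaning on a copy makes it possible to repeat the analysis later.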
Statistics
Use statistical methods to analyze and process the collected data with a clear purpose, and interpret the results of the analysis.
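A minimal sketch of this kind of analysis, using Python's standard statistics module. The sales figures are made-up sample data chosen only for illustration.

```python
import statistics

# Hypothetical raw data: units sold on each day of one week.
daily_sales = [12, 15, 11, 18, 20, 25, 22]

total = sum(daily_sales)                 # total units sold
mean = statistics.mean(daily_sales)      # average per day
median = statistics.median(daily_sales)  # middle value of the sorted data
stdev = statistics.stdev(daily_sales)    # sample standard deviation

print(f"total={total}, mean={mean:.2f}, median={median}, stdev={stdev:.2f}")
```

Interpreting the result is the second half of the job: here the mean and median are close together, which suggests that no single day dominates the week.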
Data mining
Data mining, also known as data exploration, generally refers to the process of using algorithms to search for information hidden in large amounts of data.
The difference between data mining and data statistics
- Data mining does not know in advance what it will dig out. It is used to explore the unknown, and the specific method is not fixed beforehand. Data statistics generally has a clear goal: you know which values to calculate, such as a sum or an average, and only need to apply the appropriate statistical method.
- Data mining is usually associated with computer science. Its goals are achieved through many methods, such as statistics, online analytical processing (OLAP), information retrieval, machine learning, artificial intelligence, expert systems, and pattern recognition.
- In data statistics, different statisticians using different methods arrive at the same result. In data mining, different people working on the same data may get different results.
- Data mining and data statistics are not independent of each other: statistical knowledge is also required in the process of data mining.
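The contrast can be sketched in a few lines of Python. The statistics part answers a question that is fixed in advance (the average). The mining part uses a small one-dimensional k-means clustering, chosen here only as one example of a mining algorithm, to look for groups that nobody specified beforehand. The data, the function name kmeans_1d, and the starting centers are all assumptions made for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group 1-D points around the nearest center, then move each center
    to the mean of its group; repeat a fixed number of times."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical measurements.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# Statistics: the target value is known in advance.
average = sum(points) / len(points)

# Mining: look for structure that was not specified in advance.
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])

print("average:", round(average, 3))
print("groups found:", clusters)
```

Choosing a different number of groups or different starting centers can change what is found, which is one reason two people mining the same data may reach different results.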
Data mining and three major shifts in thinking
1. Analyze all the data related to something, instead of relying on a small number of data samples
2. Be willing to accept the messiness of the data, and no longer pursue exactness
3. No longer seek elusive causality, and focus on the correlations between things instead
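The third point can be made concrete with the Pearson correlation coefficient, which measures how strongly two quantities move together without saying anything about cause. The sketch below computes it by hand in Python. The temperature and sales numbers are invented for illustration, and pearson is a helper name introduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical observations: daily temperature and ice cream sales.
temperature = [20, 22, 25, 27, 30, 33]
sales = [110, 125, 150, 160, 190, 210]

r = pearson(temperature, sales)
print(f"correlation: {r:.3f}")
```

A value close to 1 says only that the two series rise together. It does not show that one causes the other, which is exactly the question this way of thinking sets aside.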
Data visualization
Graphics are often clearer than numbers. For example, the latitude and longitude obtained from GPS positioning are easier to understand when displayed on a map.