Master the entire process of data analysis projects
Statistical analysis
Descriptive statistics is the general approach of summarizing how data are distributed by classifying and tabulating them, charting them, and computing summary measures (such as the mean and variance).
Inferential statistics is the approach of drawing a random sample, applying statistical methods to the sample data, and extending the conclusions obtained to the population.
The information contained in sample data has to be summarized, abstracted, and condensed into composite indicators that reflect the sample; these indicators are called statistics.
Descriptive statistics fall into two categories. One describes the central position of the data, such as the mean, median, and mode. The other describes the dispersion of the data, such as the variance, standard deviation, and range, which measure how far individual values lie from the center. The two types of indicators complement each other and together reflect the characteristics of the data.
Frequency Analysis
The number of cases falling within a class is called the frequency.
The ratio of a class's frequency to the total number of cases is called the relative frequency.
Frequency analysis describes the distribution of the data mainly through a frequency distribution table, bar charts, pie charts, and histograms, together with statistics of central tendency and dispersion.
⑴ Select Analyze → Descriptive Statistics → Frequencies
⑵ Check the (Display frequency tables) box
⑶ Click the (Statistics (S)) button
⑷ Click the (Format (F)) button
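Outside SPSS, the frequency and relative frequency defined above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the sample values are hypothetical, not part of the original notes.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, frequency, relative frequency)."""
    counts = Counter(values)
    total = len(values)
    # Sort by category so the row order is stable.
    return [(cat, n, n / total) for cat, n in sorted(counts.items())]


# Hypothetical sample: education level of 8 respondents.
sample = ["college", "high school", "college", "graduate",
          "high school", "college", "graduate", "college"]

for cat, freq, rel in frequency_table(sample):
    print(f"{cat:<12} {freq:>3} {rel:>7.1%}")
```

The relative frequencies in such a table always sum to 1, which is a quick sanity check on the tabulation.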
Describing central tendency
Central tendency refers to the tendency of a set of data to cluster around a central value. Statistics that describe the central position of a data distribution are called location statistics. For continuous (scale) variables and ordinal variables, the indicators of central tendency include the mean, median, mode, and 5% trimmed mean; for qualitative (nominal) data, the only indicator of central tendency is the mode.
Mean
The mean generally refers to the arithmetic mean of the data. It is the main measure of central tendency and the indicator used most often in practical problems.
The mean is easily affected by extreme values.
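A small sketch of how sensitive the mean is to a single extreme value, using Python's standard `statistics` module. The income figures are made up for illustration.

```python
from statistics import mean

incomes = [30, 32, 35, 38, 40]   # hypothetical values
with_outlier = incomes + [500]   # the same data plus one extreme value

print(mean(incomes))
print(mean(with_outlier))
```

Adding one value of 500 moves the mean from 35 to 112.5, a figure that none of the original five observations comes close to.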
5% trimmed mean
Sort the observations in ascending order, discard 5% of the values at each end of the sorted sequence, and calculate the mean of the values that remain.
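The trimming procedure can be sketched as follows. The function `trimmed_mean` is a hypothetical helper, and it makes a simplifying assumption: the number of values cut from each tail is rounded down to a whole observation, whereas statistical packages may treat fractional counts differently.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the sorted values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # whole observations cut per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data: 19 ordinary values and one extreme value.
values = list(range(1, 20)) + [1000]

print(sum(values) / len(values))  # ordinary mean, pulled up by 1000
print(trimmed_mean(values))       # mean after trimming one value per tail
```

With 20 observations, 5% per tail removes exactly one value from each end, so the extreme value no longer affects the result.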
Median
Arrange the observations in order from smallest to largest; the value at the middle position is called the median.
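A brief illustration with Python's `statistics.median`. When the number of observations is even there is no single middle value, and this function follows the common convention of averaging the two middle values.

```python
from statistics import median

odd_count = [7, 1, 5]       # sorted: 1, 5, 7
even_count = [7, 1, 5, 3]   # sorted: 1, 3, 5, 7

print(median(odd_count))    # 5
print(median(even_count))   # 4.0, the average of 3 and 5
```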
Mode
The mode is the value that occurs most frequently among the observations; it reflects the central tendency of the set of observations.
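Because the mode only counts occurrences, it also works for qualitative data. A minimal sketch with `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count; the sample values are hypothetical.

```python
from statistics import multimode

colors = ["red", "blue", "red", "green", "blue", "red"]
tied = [1, 1, 2, 2, 3]

print(multimode(colors))  # ['red']
print(multimode(tied))    # [1, 2], two values share the highest count
```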
Range
The range is the difference between the maximum and minimum observed values; it reflects how much the data fluctuate.
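The difference between the maximum and minimum can be computed directly, and it is shown here next to the variance and standard deviation for comparison. The helper name `data_range` and the data are hypothetical, and the population versions (`pvariance`, `pstdev`) are used only to keep the numbers simple.

```python
from statistics import pstdev, pvariance


def data_range(values):
    """Difference between the largest and smallest observation."""
    return max(values) - min(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

print(data_range(data))  # 7
print(pvariance(data))   # population variance
print(pstdev(data))      # population standard deviation
```

The range uses only the two most extreme observations, while the variance and standard deviation use every value.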
Standard error of the mean
If the ratio of the difference between two sample means to the standard error is greater than 2 or less than -2, it can be concluded that the two means differ significantly, and therefore that the two samples come from two different populations.
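A rough sketch of this rule of thumb. The notes do not say which standard error goes in the denominator; the code below assumes two independent samples and uses the standard error of the difference, sqrt(se1² + se2²). The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def mean_difference_ratio(a, b):
    """Difference of the two sample means divided by its standard error.

    Assumption: the samples are independent, so the standard error of the
    difference is sqrt(se_a**2 + se_b**2).
    """
    se_diff = sqrt(standard_error(a) ** 2 + standard_error(b) ** 2)
    return (mean(a) - mean(b)) / se_diff


group_a = [10, 12, 11, 13, 12, 11]  # hypothetical measurements
group_b = [15, 17, 16, 18, 17, 16]

ratio = mean_difference_ratio(group_a, group_b)
print(ratio, abs(ratio) > 2)
```

For these two groups the ratio lies far below -2, so the rule would treat the means as significantly different.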
Coefficient of variation
When comparing the dispersion of two sets of data, directly comparing the two standard deviations is not appropriate if the measurement scales differ greatly or the units of measurement are not the same. The influence of scale and units has to be removed first, and the coefficient of variation eliminates these effects.
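The coefficient of variation is the standard deviation divided by the mean. The sketch below uses a hypothetical helper and made-up heights to show that it is unaffected by a change of units, while the standard deviation is not.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return stdev(values) / m


heights_cm = [160, 165, 170, 175, 180]    # hypothetical heights
heights_m = [h / 100 for h in heights_cm]  # same heights in metres

print(stdev(heights_cm), stdev(heights_m))  # differ by a factor of 100
print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))  # same value up to rounding
```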
The minimum, lower quartile, median, upper quartile, and maximum are often called the five-number summary of the data. From these five numbers, the center, dispersion, and general shape of the distribution can be seen. The box plot is the graphical representation of the five-number summary.
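A minimal sketch of these five numbers using `statistics.quantiles` (Python 3.8+). Quartile conventions differ between software packages, so the `method="inclusive"` choice here is an assumption and other tools may report slightly different quartiles for the same data. The helper name and data are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

print(five_number_summary(data))
```

These are the values a box plot draws: the box spans the two quartiles, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.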
Shape of the distribution: skewness and kurtosis
The distribution has a long tail on the left, with its peak shifted to the right (negative skew).
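Skewness and kurtosis can be sketched from central moments. The version below uses the plain population-moment formulas, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2² - 3; statistical packages often apply small-sample corrections, so their reported values may differ somewhat. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness; negative means a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


left_tailed = [1, 7, 8, 8, 9, 9, 9, 10, 10, 10]  # hypothetical scores
symmetric = [1, 2, 3, 4, 5]

print(skewness(left_tailed))   # negative: long tail on the left
print(skewness(symmetric))     # zero for symmetric data
print(excess_kurtosis(symmetric))
```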