Skewness and kurtosis:
Skewness measures the asymmetry of a distribution. A right-skewed (positively skewed) distribution shows a long tail on the right side of its plot: most values are concentrated on the left, and a small number of values are spread out to the right.
Kurtosis reflects how sharply peaked the distribution is: the greater the kurtosis, the sharper the peak at the center of the plot. For a fixed variance, a sharper peak means that most values sit very close to the center and contribute little to the variance. To reach the same total variance as a normal distribution, some values must lie farther from the center. This is the so-called "thick tail" (heavy tail), which shows up as an increased number of outliers.
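To make these two ideas concrete, here is a minimal sketch in plain Python. It computes the third and fourth standardized moments (defined formally in the sections below) for a few made-up data sets. The helper name `standardized_moment` and all of the data values are illustrative assumptions, not part of any library.

```python
def standardized_moment(data, k):
    """k-th standardized moment: mean of ((x - mean) / sd) ** k (population sd)."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum(((x - mean) / sd) ** k for x in data) / n

# Skewness: a long right tail gives a positive value, its mirror image a negative one.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]   # made-up data with one large value
left_tailed = [-x for x in right_tailed]   # same data, mirrored

skew_right = standardized_moment(right_tailed, 3)
skew_left = standardized_moment(left_tailed, 3)
print(skew_right, skew_left)

# Kurtosis: both data sets below have variance 2, but the second one keeps
# most values at the center and pays for it with two far-away points.
flat = [-2, -1, 0, 1, 2]
peaked = [-3, 0, 0, 0, 0, 0, 0, 0, 3]

kurt_flat = standardized_moment(flat, 4)
kurt_peaked = standardized_moment(peaked, 4)
print(kurt_flat, kurt_peaked)
```

With equal variance, the peaked data set has a kurtosis of about 4.5 against about 1.7 for the evenly spread one, which is the "sharp center plus thick tail" effect described above.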
Definition of skewness:
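The skewness of a random variable X is its third standardized moment:

$$\operatorname{Skew}[X] = E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3} = \frac{E[(X-\mu)^3]}{\left(E[(X-\mu)^2]\right)^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}}$$

where $\mu$ is the mean, $\sigma$ is the standard deviation, and E is the expectation operator. $\mu_3$ is the third central moment, and $\kappa_t$ is the $t^{th}$ cumulant.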
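Skewness can also be expressed in terms of the third raw moment (the moment about the origin) $E[X^3]$, the mean $\mu$ and the variance $\sigma^2$:

$$\operatorname{Skew}[X] = \frac{E[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}$$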
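As a quick check of the raw-moment form, the sketch below applies it to an exponential distribution with rate 1. This distribution is chosen here only for illustration: its raw moments are $E[X^k] = k!$, and its skewness is known to be 2.

```python
from math import factorial

# Raw moments of an exponential distribution with rate 1: E[X^k] = k!
ex1, ex2, ex3 = factorial(1), factorial(2), factorial(3)

mu = ex1                  # mean
var = ex2 - mu ** 2       # sigma^2 = E[X^2] - mu^2
sigma = var ** 0.5        # standard deviation

# Skewness from raw moments: (E[X^3] - 3*mu*sigma^2 - mu^3) / sigma^3
skewness = (ex3 - 3 * mu * var - mu ** 3) / sigma ** 3
print(skewness)  # 2.0
```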
Calculating sample skewness:
For a sample of size n, a typical way to compute skewness is:

$$b_1 = \frac{m_3}{s^3} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^{3/2}}$$

where $\bar x$ is the sample mean (it differs from $\mu$ in that $\mu$ is the mean of the whole population, while $\bar x$ is the mean of the sample), s is the sample standard deviation, and $m_3$ is the third central moment of the sample.
Another definition is:

$$G_1 = \frac{k_3}{k_2^{3/2}} = \frac{n^2}{(n-1)(n-2)}\,\frac{m_3}{s^3}$$

where $k_3$ is the unique symmetric unbiased estimator of the third cumulant $\kappa_3$ (note that $k_3$ and $\kappa_3$ are different symbols), and $k_2=s^2$ is the symmetric unbiased estimator of the second cumulant.
Most software uses $G_1$ to calculate skew, such as Excel, Minitab, SAS and SPSS.
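The sample estimators can be worked through by hand. The following sketch, in plain Python, uses the same six values as the pandas example later in this article. The variable names `g1`, `b1` and `G1` are labels chosen here; `g1` denotes the plain moment ratio $m_3 / m_2^{3/2}$, which is an extra quantity introduced only to show how the estimators relate.

```python
x = [53, 61, 49, 66, 78, 47]
n = len(x)
mean = sum(x) / n

# Central moments of the sample (divide by n)
m2 = sum((v - mean) ** 2 for v in x) / n
m3 = sum((v - mean) ** 3 for v in x) / n

# Sample standard deviation (divide by n - 1)
s = (n / (n - 1) * m2) ** 0.5

g1 = m3 / m2 ** 1.5                               # moment ratio
b1 = m3 / s ** 3                                  # uses the n-1 standard deviation
G1 = (n * (n - 1)) ** 0.5 / (n - 2) * g1          # adjusted estimator
G1_from_b1 = n ** 2 / ((n - 1) * (n - 2)) * b1    # same value, written via b1

print(g1, b1, G1, G1_from_b1)
```

The two expressions for `G1` agree up to floating-point rounding, and the value matches the skewness that pandas reports for this sample.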
Definition of kurtosis:
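Kurtosis is defined as the fourth standardized moment:

$$\operatorname{Kurt}[X] = E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4} = \frac{E[(X-\mu)^4]}{\left(E[(X-\mu)^2]\right)^2}$$

This closely mirrors the definition of skewness above, which uses the third power instead of the fourth.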
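Calculating sample kurtosis:

$$g_2 = \frac{m_4}{m_2^2} - 3 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^2} - 3$$

The kurtosis of a sample can also be calculated as:

$$G_2 = \frac{k_4}{k_2^2} = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\frac{m_4}{m_2^2} - 3(n-1)\right]$$

where $k_4$ is the unique symmetric unbiased estimator of the fourth cumulant, $k_2$ is the unbiased estimator of the second cumulant (equal to the sample variance), $m_4$ is the fourth central moment of the sample, and $m_2$ is the second central moment of the sample. Both $g_2$ and $G_2$ measure excess kurtosis, that is, kurtosis minus 3, where 3 is the kurtosis of the normal distribution.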
Likewise, most programs use $G_2$ to calculate kurtosis.
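The same hand calculation works for kurtosis. This plain-Python sketch again uses the six values from the pandas example in this article. Here `g2` is the moment-based excess kurtosis $m_4 / m_2^2 - 3$, and `G2` is the adjusted estimator, written in the equivalent form $\frac{n-1}{(n-2)(n-3)}[(n+1) g_2 + 6]$.

```python
x = [53, 61, 49, 66, 78, 47]
n = len(x)
mean = sum(x) / n

# Central moments of the sample (divide by n)
m2 = sum((v - mean) ** 2 for v in x) / n
m4 = sum((v - mean) ** 4 for v in x) / n

g2 = m4 / m2 ** 2 - 3                                      # excess kurtosis from moments
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)    # adjusted estimator

print(g2, G2)
```

The value of `G2` matches the kurtosis that pandas reports for this sample. Note that the adjustment is large for such a small sample: `g2` is roughly -0.95 while `G2` is roughly -0.26.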
Calculating skewness and kurtosis in Python with pandas:

    import pandas as pd

    x = [53, 61, 49, 66, 78, 47]
    s = pd.Series(x)

    print(s.skew())  # sample skewness
    print(s.kurt())  # sample excess kurtosis
pandas uses $G_1$ above to calculate the skewness and $G_2$ to calculate the kurtosis. The results are:

    0.7826325504212567
    -0.2631655441038463
References:
How Skewness and Kurtosis Affect Your Distribution
Skewness (Wikipedia), which gives the formulas for calculating skewness