Calculation of skewness and kurtosis

Skewness and kurtosis:

  Skewness measures the asymmetry of a distribution. A right-skewed (also called positively skewed) distribution shows a long tail on the right side of its plot: most values are concentrated on the left, and a small share of the values lie far out on the right.

  Kurtosis reflects how sharply peaked the distribution is: the greater the kurtosis, the sharper the peak at the center of the plot. For a fixed variance, a sharper peak means that the large share of values in the middle contributes very little variance. To reach the same total variance as a normal distribution, some values must therefore lie farther from the center. This is the so-called "thick tail" (heavy tail), and it shows up as an increase in outliers.

 

Definition of skewness:

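The skewness of a random variable X is its third standardized moment:

$$\gamma_1 = E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3} = \frac{E\left[(X-\mu)^3\right]}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}}$$

where $\mu$ is the mean, $\sigma$ is the standard deviation, and E is the expectation operator. $\mu_3$ is the third central moment, and $\kappa_t$ is the $t^{th}$ cumulant.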
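As a minimal sketch of this definition, the snippet below computes the third standardized moment of a list of numbers, treating the list as the whole population (dividing by n). The function name `skewness` and the three data sets are hypothetical and chosen only for illustration.

```python
import math

def skewness(values):
    """Third standardized moment, treating `values` as the whole population."""
    n = len(values)
    mu = sum(values) / n
    # Second and third central moments (population form, divide by n).
    mu2 = sum((v - mu) ** 2 for v in values) / n
    mu3 = sum((v - mu) ** 3 for v in values) / n
    sigma = math.sqrt(mu2)
    return mu3 / sigma ** 3

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]  # hypothetical data with a long right tail
mirrored = [-v for v in right_tailed]     # same shape, tail on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(mirrored))      # same magnitude, negative
print(skewness(symmetric))     # 0.0
```

Mirroring the data flips the sign of the skewness, and a symmetric data set has skewness 0.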

 

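Skewness can also be expressed in terms of the third raw moment (the moment about the origin) $E[X^3]$:

$$\gamma_1 = \frac{E[X^3] - 3\mu E[X^2] + 3\mu^2 E[X] - \mu^3}{\sigma^3} = \frac{E[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}$$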
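The raw-moment form can be checked numerically. The sketch below, which uses the six values from this article's pandas example and a hypothetical helper `raw_moment`, compares $(E[X^3] - 3\mu\sigma^2 - \mu^3)/\sigma^3$ with the central-moment definition. Both are computed in population form (dividing by n).

```python
def raw_moment(values, k):
    """k-th raw moment E[X^k], treating `values` as the whole population."""
    return sum(v ** k for v in values) / len(values)

data = [53, 61, 49, 66, 78, 47]  # the values from this article's pandas example

mu = raw_moment(data, 1)
sigma2 = raw_moment(data, 2) - mu ** 2  # variance = E[X^2] - mu^2
sigma = sigma2 ** 0.5

# Skewness from raw moments: (E[X^3] - 3*mu*sigma^2 - mu^3) / sigma^3
from_raw = (raw_moment(data, 3) - 3 * mu * sigma2 - mu ** 3) / sigma ** 3

# Skewness from the central-moment definition: E[(X - mu)^3] / sigma^3
from_central = sum((v - mu) ** 3 for v in data) / len(data) / sigma ** 3

print(from_raw, from_central)  # the two agree up to floating-point rounding
```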

 

Calculating the sample skewness:

For a sample of size n, a typical estimator of the skewness is:

$$b_1 = \frac{m_3}{s^3} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^{3/2}}$$

where $\bar x$ is the sample mean (in contrast to $\mu$, which is the mean of the whole population), s is the sample standard deviation, and $m_3$ is the third central moment of the sample.
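A minimal sketch of this estimator, assuming the common convention that $m_3$ divides by n while s uses the n - 1 denominator. The function name `sample_skewness_b1` is hypothetical; `statistics.fmean` and `statistics.stdev` come from the Python standard library.

```python
import statistics

def sample_skewness_b1(values):
    """m_3 / s^3, with m_3 dividing by n and s the (n - 1) standard deviation."""
    n = len(values)
    x_bar = statistics.fmean(values)
    m3 = sum((v - x_bar) ** 3 for v in values) / n
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return m3 / s ** 3

data = [53, 61, 49, 66, 78, 47]  # the values from this article's pandas example
print(sample_skewness_b1(data))  # roughly 0.435
```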

Another definition is as follows:

$$G_1 = \frac{k_3}{k_2^{3/2}} = \frac{n^2}{(n-1)(n-2)}\,\frac{m_3}{s^3}$$

$k_3$ is the unique symmetric unbiased estimator of the third cumulant $\kappa_3$ (note that $k_3$ and $\kappa_3$ are different symbols). $k_2=s^2$ is the symmetric unbiased estimator of the second cumulant.

Most software, such as Excel, Minitab, SAS and SPSS, uses $G_1$ to calculate skewness.
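To make the role of the k-statistics concrete, the following sketch builds $G_1$ from $k_2$ and $k_3$ directly. It assumes the standard expressions $k_2 = \frac{n}{n-1} m_2$ and $k_3 = \frac{n^2}{(n-1)(n-2)} m_3$ and needs at least three data points. The function name is hypothetical.

```python
def g1_from_k_statistics(values):
    """Sample skewness G_1 = k_3 / k_2^(3/2); requires len(values) >= 3."""
    n = len(values)
    x_bar = sum(values) / n
    # Sample central moments (divide by n).
    m2 = sum((v - x_bar) ** 2 for v in values) / n
    m3 = sum((v - x_bar) ** 3 for v in values) / n
    k2 = n / (n - 1) * m2                   # unbiased variance, equal to s^2
    k3 = n ** 2 / ((n - 1) * (n - 2)) * m3  # unbiased estimator of the third cumulant
    return k3 / k2 ** 1.5

data = [53, 61, 49, 66, 78, 47]  # the values from this article's pandas example
G1 = g1_from_k_statistics(data)
print(G1)
```

For these six values the result should agree with the skewness that pandas reports for the same data in this article (0.7826...).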

 

Definition of kurtosis:

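$$\operatorname{Kurt}[X] = E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4} = \frac{E\left[(X-\mu)^4\right]}{\left(E\left[(X-\mu)^2\right]\right)^2}$$

  Kurtosis is defined as the fourth standardized moment. The definition closely mirrors that of skewness above; the only difference is that skewness uses the third power and kurtosis the fourth.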
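A small sketch of the fourth standardized moment, again treating the data as the whole population. The function name `kurtosis` and both data sets are hypothetical. The second data set has most of its mass at the center and two far-away values, which is the "sharp peak, thick tail" situation described earlier.

```python
def kurtosis(values):
    """Fourth standardized moment, treating `values` as the whole population."""
    n = len(values)
    mu = sum(values) / n
    mu2 = sum((v - mu) ** 2 for v in values) / n  # variance
    mu4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    return mu4 / mu2 ** 2

two_point = [-1, 1]                        # no peak and no tails
peaked = [-3, 0, 0, 0, 0, 0, 0, 0, 0, 3]   # sharp center, two distant values

print(kurtosis(two_point))  # 1.0
print(kurtosis(peaked))     # close to 5, above the normal distribution's 3
```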

 

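Calculating the sample kurtosis:

$$g_2 = \frac{m_4}{m_2^2} - 3 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^2} - 3$$

The kurtosis of a sample can also be calculated as:

$$G_2 = \frac{k_4}{k_2^2} = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\frac{m_4}{m_2^2} - 3(n-1)\right]$$

where $k_4$ is the unique symmetric unbiased estimator of the fourth cumulant, $k_2$ is the unbiased estimator of the second cumulant (equivalent to the sample variance), $m_4$ is the fourth central moment of the sample, and $m_2$ is the second central moment of the sample. Both formulas give the excess kurtosis, that is, the kurtosis minus 3, so a normal distribution has a value of 0.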

Likewise, most programs use $G_2$ to calculate kurtosis.
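The sketch below computes $G_2$ by hand, assuming the usual form $G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\,g_2 + 6\right]$ with $g_2 = m_4/m_2^2 - 3$. It needs at least four data points, and the function name is hypothetical.

```python
def excess_kurtosis_G2(values):
    """Sample excess kurtosis G_2; requires len(values) >= 4."""
    n = len(values)
    x_bar = sum(values) / n
    # Sample central moments (divide by n).
    m2 = sum((v - x_bar) ** 2 for v in values) / n
    m4 = sum((v - x_bar) ** 4 for v in values) / n
    g2 = m4 / m2 ** 2 - 3  # moment-based excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

data = [53, 61, 49, 66, 78, 47]  # the values from this article's pandas example
G2 = excess_kurtosis_G2(data)
print(G2)
```

For these six values the result should agree with the kurtosis that pandas reports for the same data in this article (-0.2631...). The negative sign is possible only because this is an excess kurtosis.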

 

Calculating skewness and kurtosis in Python with pandas

import pandas as pd

x = [53, 61, 49, 66, 78, 47]
s = pd.Series(x)
print(s.skew())  # sample skewness
print(s.kurt())  # sample excess kurtosis

pandas uses $G_1$ above to calculate the skewness and $G_2$ to calculate the kurtosis. The results are as follows:

0.7826325504212567
-0.2631655441038463

 

References:

    How Skewness and Kurtosis Affect Your Distribution

    Skewness (Wikipedia): gives the formulas for calculating skewness

    Kurtosis (Wikipedia): gives the formulas for calculating kurtosis
