Covariance (numpy.cov) and the Pearson correlation coefficient

Covariance and the correlation coefficient

Just don't mix these two concepts up. They are fairly simple, so I won't go into much detail.

Covariance: numpy.cov parameters from the official documentation

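Numerically, a positive covariance means the two variables tend to move in the same direction, and the larger the value, the stronger that tendency. Conversely, a negative covariance means they tend to move in opposite directions.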

numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)

Parameters:

m : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y : array_like, optional

An additional set of variables and observations. y has the same form as that of m.

rowvar : bool, optional

This parameter matters: it controls whether variables are read from the rows or from the columns.

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : bool, optional

Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

ddof : int, optional

If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.

New in version 1.5.

fweights : array_like, int, optional

1-D array of integer frequency weights; the number of times each observation vector should be repeated.

New in version 1.10.

aweights : array_like, optional

1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

New in version 1.10.
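The parameters above are easier to follow with a small experiment. The sketch below uses a made-up 2 x 4 array (the data and variable names are assumptions for illustration, not from the NumPy docs) and checks four things: rowvar only changes the orientation, ddof=0 divides by N instead of N - 1, fweights behaves like repeating observations, and aweights with ddof=0 behaves like a probability-weighted covariance.

```python
import numpy as np

# Hypothetical data: 2 variables (rows) x 4 observations (columns).
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 1.0, 4.0, 3.0]])

# rowvar=True (default): rows are variables.
# Passing the transpose with rowvar=False describes the same data.
cov_rows = np.cov(data)
cov_cols = np.cov(data.T, rowvar=False)

# ddof=0 normalizes by N; the default normalizes by N - 1.
n = data.shape[1]
cov_biased = np.cov(data, ddof=0)

# fweights: integer counts saying how often each observation is repeated.
fw = np.array([1, 2, 1, 3])
cov_fw = np.cov(data, fweights=fw)
cov_repeated = np.cov(np.repeat(data, fw, axis=1))

# aweights with ddof=0: weights act like probabilities of each observation.
aw = np.array([0.1, 0.2, 0.3, 0.4])
cov_aw = np.cov(data, aweights=aw, ddof=0)
mean_w = (data @ aw) / aw.sum()          # weighted mean of each variable
centered = data - mean_w[:, None]
cov_manual = (centered * aw) @ centered.T / aw.sum()

print(np.allclose(cov_rows, cov_cols))                 # orientation only
print(np.allclose(cov_biased * n / (n - 1), cov_rows)) # N vs N - 1
print(np.allclose(cov_fw, cov_repeated))               # repeat counts
print(np.allclose(cov_aw, cov_manual))                 # probability weights
```

All four checks should print True; if one does not, the most likely cause is that the array is oriented differently from what rowvar expects.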

import numpy as np

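# Two variables with three observations each
a = np.array([1, 2, 3])
b = np.array([4, 3, 4])

# Returns the 2x2 covariance matrix:
# variances on the diagonal, cov(a, b) off the diagonal
np.cov(a, b)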

>>> np.cov(a, b)
array([[ 1.        ,  0.        ],
       [ 0.        ,  0.33333333]])
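To see where the numbers in this matrix come from, here is a minimal sketch that recomputes them by hand with the unbiased sample formula (divide by N - 1). The helper name sample_cov is made up for this illustration and is not part of NumPy.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 3, 4])

def sample_cov(x, y):
    """Unbiased sample covariance: sum((x - mean_x) * (y - mean_y)) / (N - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Diagonal entries are the variances of a and b,
# off-diagonal entries are the covariance between a and b.
manual = np.array([[sample_cov(a, a), sample_cov(a, b)],
                   [sample_cov(b, a), sample_cov(b, b)]])

print(manual)
print(np.allclose(manual, np.cov(a, b)))
```

The off-diagonal entry is 0 here because the deviations of b (+1/3, -2/3, +1/3) cancel against the deviations of a (-1, 0, +1).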

Correlation coefficient: numpy.corrcoef parameters from the official documentation

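The correlation coefficient is a statistic that measures how closely variables are related. It can also be viewed as a covariance: a special covariance that has been standardized to remove the effect of the two variables' units. It cancels out how widely each variable varies and reflects only how similarly the two variables move per unit of change.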

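The formula for the correlation coefficient is:

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

In plain words: take the covariance of X and Y and divide it by the standard deviation of X times the standard deviation of Y.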

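So the correlation coefficient can be seen as a covariance too: a special, standardized covariance with the effect of the two variables' units removed.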
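The "covariance divided by the two standard deviations" description can be checked directly. The sketch below uses made-up sample data (an assumption for illustration) and compares the hand-computed ratio with np.corrcoef. Note that the standard deviations use ddof=1 so that they match the N - 1 normalization np.cov applies by default.

```python
import numpy as np

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Covariance of x and y (off-diagonal entry of the covariance matrix)
cov_xy = np.cov(x, y)[0, 1]

# Sample standard deviations, same N - 1 normalization as np.cov
std_x = np.std(x, ddof=1)
std_y = np.std(y, ddof=1)

r_manual = cov_xy / (std_x * std_y)
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)  # both are approximately 0.6
```

Using ddof=0 for both the covariance and the standard deviations would give the same ratio, since the normalization factor cancels; mixing the two conventions would not.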

numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>)

Parameters:

x : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of x represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y : array_like, optional

An additional set of variables and observations. y has the same shape as x.

rowvar : bool, optional

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : _NoValue, optional

Has no effect, do not use.

Deprecated since version 1.10.0.

ddof : _NoValue, optional

Has no effect, do not use.

Deprecated since version 1.10.0.

import numpy as np

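# b = 3 * a - 1, an exact increasing linear function of a
a = np.array([1, 2, 3])
b = np.array([2, 5, 8])

# Returns the 2x2 correlation matrix; every entry is 1 for a perfect linear relationship
np.corrcoef(a, b)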

>>> np.corrcoef(a, b)
array([[ 1.,  1.],
       [ 1.,  1.]])
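The claim that the correlation coefficient removes the effect of units can be tried out on the same arrays. In this small sketch, b is rescaled by a factor of 100 (an arbitrary choice, as if switching from meters to centimeters): the covariance grows by the same factor, while the correlation stays where it was.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 5, 8])
b_scaled = b * 100  # same quantity expressed in a smaller unit (arbitrary factor)

cov_before = np.cov(a, b)[0, 1]
cov_after = np.cov(a, b_scaled)[0, 1]

corr_before = np.corrcoef(a, b)[0, 1]
corr_after = np.corrcoef(a, b_scaled)[0, 1]

print(cov_before, cov_after)    # covariance scales with the unit change
print(corr_before, corr_after)  # correlation is unchanged
```

This is why the correlation coefficient, not the raw covariance, is the number to compare across pairs of variables measured on different scales.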


Reposted from blog.csdn.net/qq_20412595/article/details/82503276