Covariance (numpy.cov) and the Pearson Correlation Coefficient

Covariance and the correlation coefficient

The main thing is not to confuse these two concepts. They are fairly simple, so I will keep the explanation short.

Covariance: numpy.cov parameters from the official documentation

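Numerically, a positive covariance means the two variables tend to move in the same direction, and the larger the value, the stronger that tendency. Conversely, a negative covariance means they tend to move in opposite directions. Note that the magnitude also depends on the units of the variables.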
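As a rough illustration of the point above, the sketch below compares the covariance of a series that rises with x, one that falls as x rises, and a rescaled copy. The arrays are made-up data chosen for this example, not taken from the NumPy documentation.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_up = np.array([2.0, 4.0, 6.0, 8.0])    # rises as x rises
y_down = np.array([8.0, 6.0, 4.0, 2.0])  # falls as x rises

# np.cov returns a 2x2 matrix; the off-diagonal entry is cov(x, y).
cov_up = np.cov(x, y_up)[0, 1]
cov_down = np.cov(x, y_down)[0, 1]

# Changing the units of y (here multiplying by 100) scales the covariance too.
cov_scaled = np.cov(x, 100 * y_up)[0, 1]

print(cov_up > 0, cov_down < 0)
print(cov_scaled / cov_up)
```

The sign tells you the direction of the relationship, while the size of the number changes with the units, which is why covariances from different datasets are hard to compare directly.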

numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)

Parameters:
m : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y : array_like, optional

An additional set of variables and observations. y has the same form as that of m.

rowvar : bool, optional

This parameter is important: it controls whether the rows or the columns are treated as variables.

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : bool, optional

Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

ddof : int, optional

If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.

New in version 1.5.

fweights : array_like, int, optional

1-D array of integer frequency weights; the number of times each observation vector should be repeated.

New in version 1.10.

aweights : array_like, optional

1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

New in version 1.10.
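To make the parameters above more concrete, here is a minimal sketch that exercises rowvar, bias, ddof and fweights. The arrays `data`, `obs` and `counts` are hypothetical values made up for this example.

```python
import numpy as np

# Hypothetical data: 2 variables (rows), 4 observations (columns).
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 1.0, 4.0, 3.0]])

# rowvar=True (default): each row is a variable.
cov_rows = np.cov(data)
# rowvar=False: each column is a variable, so pass the transposed layout.
cov_cols = np.cov(data.T, rowvar=False)

# bias=False (default) normalizes by N - 1; bias=True normalizes by N.
cov_unbiased = np.cov(data)
cov_biased = np.cov(data, bias=True)
# ddof overrides bias; ddof=0 also normalizes by N.
cov_ddof0 = np.cov(data, ddof=0)

# fweights: integer counts saying how often each observation is repeated.
obs = np.array([[1.0, 2.0, 3.0],
                [2.0, 1.0, 4.0]])
counts = np.array([1, 2, 1])
cov_weighted = np.cov(obs, fweights=counts)
# Equivalent to physically repeating the second observation twice.
cov_repeated = np.cov(np.repeat(obs, counts, axis=1))

print(np.allclose(cov_rows, cov_cols))
print(np.allclose(cov_biased, cov_ddof0))
print(np.allclose(cov_weighted, cov_repeated))
```

With N = 4 observations, the biased result is the unbiased one multiplied by (N - 1) / N = 3/4.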


>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> b = np.array([4, 3, 4])
>>> np.cov(a, b)
array([[ 1.        ,  0.        ],
       [ 0.        ,  0.33333333]])

Correlation coefficient: numpy.corrcoef parameters from the official documentation

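The correlation coefficient is a statistic that reflects how closely two variables are linearly related. It removes the influence of each variable's scale of variation and only reflects how similarly the two variables change per unit change.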

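The formula for the correlation coefficient is:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

In words: divide the covariance of X and Y by the product of the standard deviation of X and the standard deviation of Y.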

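So the correlation coefficient can also be viewed as a covariance: a special, standardized covariance with the influence of the two variables' units removed.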
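The definition above (covariance divided by the product of the two standard deviations) can be checked by hand against numpy.corrcoef. This is only a sketch with made-up arrays. It assumes the sample standard deviation (ddof=1) so that it matches the default normalization of numpy.cov.

```python
import numpy as np

# Made-up data for illustration only.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Covariance of a and b: the off-diagonal entry of the 2x2 matrix.
cov_ab = np.cov(a, b)[0, 1]

# Sample standard deviations; ddof=1 matches np.cov's default (N - 1).
std_a = np.std(a, ddof=1)
std_b = np.std(b, ddof=1)

r_manual = cov_ab / (std_a * std_b)
r_numpy = np.corrcoef(a, b)[0, 1]

print(r_manual, r_numpy)
```

The two values agree up to floating-point rounding, and the result always lies between -1 and 1 regardless of the units of a and b.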

numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>)

Parameters:
x : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of x represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y : array_like, optional

An additional set of variables and observations. y has the same shape as x.

rowvar : bool, optional

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : _NoValue, optional

Has no effect, do not use.

Deprecated since version 1.10.0.

ddof : _NoValue, optional

Has no effect, do not use.

Deprecated since version 1.10.0.
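In practice, data often arrives as a table with one observation per row. The sketch below passes such a table to numpy.corrcoef with rowvar=False. The `table` array is hypothetical, built so that the second column is an exact increasing linear function of the first and the third is an exact decreasing one.

```python
import numpy as np

# Hypothetical table: 5 observations (rows) of 3 variables (columns).
# Column 1 = 2 * column 0, column 2 = 12 - 2 * column 0.
table = np.array([[1.0, 2.0, 10.0],
                  [2.0, 4.0, 8.0],
                  [3.0, 6.0, 6.0],
                  [4.0, 8.0, 4.0],
                  [5.0, 10.0, 2.0]])

# rowvar=False tells NumPy that each column is a variable.
corr = np.corrcoef(table, rowvar=False)

print(corr.shape)
print(np.round(corr, 6))
```

The result is a 3x3 symmetric matrix with ones on the diagonal. Entries close to 1 or -1 indicate an exact increasing or decreasing linear relationship between the corresponding columns.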


>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 5, 8])
>>> np.corrcoef(a, b)
array([[ 1.,  1.],
       [ 1.,  1.]])
