Correlation of a Series to each column of a DataFrame in Pandas, vectorized
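Is it possible to compute the correlation of a Series with each column of a DataFrame in a vectorized way? This works for rolling and EWM correlations, but not for plain (whole-sample) correlation.
For example: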
In [3]: series = pd.Series(pd.np.random.rand(12))
In [4]: frame = pd.DataFrame(pd.np.random.rand(12,4))
In [7]: pd.ewmcorr(series, frame, span=3)
Out[7]:
0 1 2 3
0 NaN NaN NaN NaN
1 -1.000000 -1.000000 1.000000 1.000000
2 0.644915 -0.980088 -0.802944 -0.922638
3 0.499564 -0.919574 -0.240631 -0.256109
4 -0.172139 -0.913296 0.482402 -0.282733
5 -0.394725 -0.693024 0.168029 0.177241
6 -0.219131 -0.475347 0.192552 0.149787
7 -0.461821 0.353778 0.538289 -0.005628
8 0.573406 0.681704 -0.491689 0.194916
9 0.655414 -0.079153 -0.464814 -0.331571
10 0.735604 -0.389858 -0.647369 0.220238
11 0.205766 -0.249702 -0.463639 -0.106032
In [8]: pd.rolling_corr(series, frame, window=3)
Out[8]:
0 1 2 3
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 0.496697 -0.957551 -0.673210 -0.849874
3 0.886848 -0.937174 -0.479519 -0.505008
4 -0.180454 -0.950213 0.331308 0.987414
5 -0.998852 -0.770988 0.582625 0.821079
6 -0.849263 -0.142453 -0.690959 0.805143
7 -0.617343 0.768797 0.299155 0.415997
8 0.930545 0.883782 -0.287360 -0.073551
9 0.917790 -0.171220 -0.993951 -0.207630
10 0.916901 -0.246603 -0.990313 0.862856
11 0.426314 -0.876191 -0.643768 -0.225983
In [10]: series.corr(frame)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-599dbd7f0707> in <module>()
----> 1 series.corr(frame)
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/series.py in corr(self, other, method, min_periods)
1280 correlation : float
1281 """
-> 1282 this, other = self.align(other, join='inner', copy=False)
1283 if len(this) == 0:
1284 return np.nan
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
3372 copy=copy, fill_value=fill_value,
3373 method=method, limit=limit,
-> 3374 fill_axis=fill_axis)
3375 elif isinstance(other, Series):
3376 return self._align_series(other, join=join, axis=axis, level=level,
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in _align_frame(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
3396
3397 if axis is None or axis == 1:
-> 3398 if not self.columns.equals(other.columns):
3399 join_columns, clidx, cridx = \
3400 self.columns.join(other.columns, how=join, level=level,
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in __getattr__(self, name)
2143 or name in self._metadata
2144 or name in self._accessors):
-> 2145 return object.__getattribute__(self, name)
2146 else:
2147 if name in self._info_axis:
AttributeError: 'Series' object has no attribute 'columns'
I can do it this way, but it is not vectorized and not very elegant:
In [11]: pd.Series({col:series.corr(frame[col]) for col in frame})
Out[11]:
0 0.286678
1 -0.438003
2 -0.011778
3 -0.387740
dtype: float64
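Note for current pandas: the session above uses pd.np, pd.ewmcorr and pd.rolling_corr, and all three have since been removed from pandas. Below is a minimal sketch of the method-based equivalents, assuming a recent pandas release. The NumPy random generator and seed are illustrative choices and do not come from the original. Passing the DataFrame as the other argument of .rolling(...).corr or .ewm(...).corr correlates the Series with each column. The numbers will not reproduce the outputs above, because the data is random.

```python
import numpy as np
import pandas as pd

# Random data shaped like the question's example (12 rows, 4 columns).
rng = np.random.default_rng(0)
series = pd.Series(rng.random(12))
frame = pd.DataFrame(rng.random((12, 4)))

# Method-based replacements for the removed pd.ewmcorr / pd.rolling_corr.
# A DataFrame passed as `other` is correlated column by column with the Series.
ewm_corr = series.ewm(span=3).corr(frame)
rolling_corr = series.rolling(window=3).corr(frame)

print(rolling_corr)
```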
You can use corrwith:
>>> frame.corrwith(series)
0 0.399534
1 0.321166
2 -0.101875
3 0.604326
dtype: float64
A related method corrwith is implemented on DataFrame to compute the correlation between like-labeled Series contained in different DataFrame objects.
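To show what corrwith computes, here is a sketch of the same Pearson correlation written out with whole-column (vectorized) arithmetic, plus a check that it agrees with corrwith and with the dict-comprehension loop from the question. corr_with_series is a hypothetical helper written for this illustration and is not part of pandas. It assumes the data has no missing values: corrwith drops NaN pairs column by column, and this helper does not.

```python
import numpy as np
import pandas as pd

def corr_with_series(frame: pd.DataFrame, series: pd.Series) -> pd.Series:
    """Hypothetical helper: Pearson correlation of `series` with every column.

    Assumes no missing values (unlike corrwith, it does not drop NaN pairs).
    """
    # Align on the row index (inner join) so rows are paired by label.
    f, s = frame.align(series, join="inner", axis=0)
    # Center every column and the series on their means.
    fc = f - f.mean()
    sc = s - s.mean()
    # Covariance numerators for all columns at once: multiply each column by sc.
    num = fc.mul(sc, axis=0).sum()
    # Product of the standard-deviation terms; the 1/(n-1) factors cancel out.
    den = np.sqrt((fc ** 2).sum() * (sc ** 2).sum())
    return num / den

rng = np.random.default_rng(42)
series = pd.Series(rng.random(12))
frame = pd.DataFrame(rng.random((12, 4)))

vectorized = corr_with_series(frame, series)
builtin = frame.corrwith(series)
looped = pd.Series({col: series.corr(frame[col]) for col in frame})
print(pd.DataFrame({"vectorized": vectorized, "corrwith": builtin, "loop": looped}))
```

corrwith is the idiomatic choice. The manual version only makes the underlying arithmetic explicit.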