Correlation of a Series to each column of a DataFrame in Pandas, vectorized

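Is there a way to compute the correlation of a Series with each column of a DataFrame in a vectorized way? This works for rolling and EWM correlation, but not for ordinary correlation.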

For example:

In [3]: series = pd.Series(pd.np.random.rand(12))

In [4]: frame = pd.DataFrame(pd.np.random.rand(12,4))

In [7]: pd.ewmcorr(series, frame, span=3)
Out[7]: 
           0         1         2         3
0        NaN       NaN       NaN       NaN
1  -1.000000 -1.000000  1.000000  1.000000
2   0.644915 -0.980088 -0.802944 -0.922638
3   0.499564 -0.919574 -0.240631 -0.256109
4  -0.172139 -0.913296  0.482402 -0.282733
5  -0.394725 -0.693024  0.168029  0.177241
6  -0.219131 -0.475347  0.192552  0.149787
7  -0.461821  0.353778  0.538289 -0.005628
8   0.573406  0.681704 -0.491689  0.194916
9   0.655414 -0.079153 -0.464814 -0.331571
10  0.735604 -0.389858 -0.647369  0.220238
11  0.205766 -0.249702 -0.463639 -0.106032

In [8]: pd.rolling_corr(series, frame, window=3)
Out[8]: 
           0         1         2         3
0        NaN       NaN       NaN       NaN
1        NaN       NaN       NaN       NaN
2   0.496697 -0.957551 -0.673210 -0.849874
3   0.886848 -0.937174 -0.479519 -0.505008
4  -0.180454 -0.950213  0.331308  0.987414
5  -0.998852 -0.770988  0.582625  0.821079
6  -0.849263 -0.142453 -0.690959  0.805143
7  -0.617343  0.768797  0.299155  0.415997
8   0.930545  0.883782 -0.287360 -0.073551
9   0.917790 -0.171220 -0.993951 -0.207630
10  0.916901 -0.246603 -0.990313  0.862856
11  0.426314 -0.876191 -0.643768 -0.225983
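The top-level functions pd.ewmcorr and pd.rolling_corr, and the pd.np alias, come from old pandas releases and have since been deprecated and removed. A minimal sketch of the same two computations with the method-based window API, assuming pandas 1.x or later (the random data is illustrative and will not reproduce the numbers above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
series = pd.Series(rng.random(12))
frame = pd.DataFrame(rng.random((12, 4)))

# Rolling Pearson correlation of `series` with each column over a 3-row window;
# stands in for the removed pd.rolling_corr(series, frame, window=3).
rolling = frame.rolling(window=3).corr(series)

# Exponentially weighted correlation; stands in for pd.ewmcorr(series, frame, span=3).
ewm = frame.ewm(span=3).corr(series)

print(rolling)
print(ewm)
```

Calling the method on the DataFrame and passing the Series as the argument returns one column of results per column of frame, aligned on the index.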

In [10]: series.corr(frame)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-599dbd7f0707> in <module>()
----> 1 series.corr(frame)

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/series.py in corr(self, other, method, min_periods)
   1280         correlation : float
   1281         """
-> 1282         this, other = self.align(other, join='inner', copy=False)
   1283         if len(this) == 0:
   1284             return np.nan

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   3372                                      copy=copy, fill_value=fill_value,
   3373                                      method=method, limit=limit,
-> 3374                                      fill_axis=fill_axis)
   3375         elif isinstance(other, Series):
   3376             return self._align_series(other, join=join, axis=axis, level=level,

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in _align_frame(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   3396 
   3397         if axis is None or axis == 1:
-> 3398             if not self.columns.equals(other.columns):
   3399                 join_columns, clidx, cridx = \
   3400                     self.columns.join(other.columns, how=join, level=level,

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py in __getattr__(self, name)
   2143                 or name in self._metadata
   2144                 or name in self._accessors):
-> 2145             return object.__getattribute__(self, name)
   2146         else:
   2147             if name in self._info_axis:

AttributeError: 'Series' object has no attribute 'columns'

I can do it with a loop, but it is neither vectorized nor very elegant:

In [11]: pd.Series({col:series.corr(frame[col]) for col in frame})
Out[11]: 
0    0.286678
1   -0.438003
2   -0.011778
3   -0.387740
dtype: float64

You can use DataFrame.corrwith:

>>> frame.corrwith(series)
0    0.399534
1    0.321166
2   -0.101875
3    0.604326
dtype: float64
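corrwith is concise, but with a Series argument it may still apply Series.corr column by column under the hood. Pearson correlation of one vector with every column reduces to a single matrix product after centering, so a fully vectorized version can be sketched in NumPy. This assumes both objects share the same, already-aligned index, contain no NaNs, and that Pearson correlation is wanted (corrwith additionally aligns on labels and skips missing values); corr_with_columns is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def corr_with_columns(frame: pd.DataFrame, series: pd.Series) -> pd.Series:
    """Pearson correlation of `series` with every column of `frame`.

    Hypothetical helper: assumes identical, already-aligned indexes
    and no missing values.
    """
    x = frame.to_numpy(dtype=float)   # shape (n, k)
    y = series.to_numpy(dtype=float)  # shape (n,)
    xc = x - x.mean(axis=0)           # center each column
    yc = y - y.mean()                 # center the series
    # Cross-products for all k columns at once, divided by the product of norms.
    num = yc @ xc                     # shape (k,)
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return pd.Series(num / den, index=frame.columns)


rng = np.random.default_rng(1)
series = pd.Series(rng.random(12))
frame = pd.DataFrame(rng.random((12, 4)))

print(corr_with_columns(frame, series))
print(frame.corrwith(series))  # should agree up to floating-point error
```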

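corrwith is implemented on DataFrame to compute the correlation between like-labeled Series contained in different DataFrame objects. When it is given a single Series instead, it correlates that Series with every column of the DataFrame, aligning on the index first.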
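With a DataFrame argument, corrwith pairs columns by label after aligning both frames. A short sketch with made-up column names a, b, c and d, assuming the default drop=False, under which labels that appear in only one frame show up in the result as NaN:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
left = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "c"])
right = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "d"])

# "a" is correlated with "a" and "b" with "b"; "c" and "d" have no partner,
# so with the default drop=False they appear in the result as NaN.
paired = left.corrwith(right)
print(paired)
```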