How to make a dataframe behave like pandas_datareader

Question

Consider the following code:

from pandas_datareader import data as web
import pandas as pd

stocks = 'f', 'fb'

df = web.DataReader(stocks,'yahoo')

The resulting df looks like this:

Attributes  Adj Close              Close  ...        Open      Volume            
Symbols             f          fb      f  ...          fb           f          fb
Date                                      ...                                    
2017-06-05   9.280543  153.630005  11.25  ...  153.639999  42558600.0  12520400.0
2017-06-06   9.173302  152.809998  11.12  ...  153.410004  44543700.0  13457100.0
2017-06-07   9.132055  153.119995  11.07  ...  153.270004  37344200.0  12066700.0
2017-06-08   9.156803  154.710007  11.10  ...  154.080002  40757400.0  17799400.0
2017-06-09   9.181552  149.600006  11.13  ...  154.770004  30285900.0  35577700.0
              ...         ...    ...  ...         ...         ...         ...
2022-05-27  13.630000  195.130005  13.63  ...  191.360001  54195700.0  22562700.0
2022-05-31  13.680000  193.639999  13.68  ...  194.889999  79689900.0  26131100.0
2022-06-01  13.550000  188.639999  13.55  ...  196.509995  50726200.0  36623500.0
2022-06-02  13.890000  198.860001  13.89  ...  188.449997  42979700.0  31951600.0
2022-06-03  13.500000  190.779999  13.50  ...  195.979996  43574400.0  19447300.0

[1260 rows x 12 columns]

If you want to see the closing price of 'f':

df['Close'].f
Out[17]: 
Date
2017-06-05    11.25
2017-06-06    11.12
2017-06-07    11.07
2017-06-08    11.10
2017-06-09    11.13
 
2022-05-27    13.63
2022-05-31    13.68
2022-06-01    13.55
2022-06-02    13.89
2022-06-03    13.50
Name: f, Length: 1260, dtype: float64

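What is this method called? For example, if you had a few dataframes of random numbers with different names but the same columns, how would you combine them so that they behave like this?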

Answer 1

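What you are seeing is a dataframe whose columns have multiple levels (a MultiIndex). Each level can have a name, and here both appear to be named ("Attributes" and "Symbols"), but unnamed levels exist too.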

To take a closer look, I would use print(df.columns).

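Since there are two column levels, the following also works: df[('Close', 'f')], i.e. using a tuple as the "full column name". If you look closely at df.columns, you will see these tuples there as well.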
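To make the tuple idea concrete, here is a minimal sketch. The frame `demo` and its values are made up for illustration (they are not the downloaded data); only the level names mirror the output in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded data: two attributes x two symbols.
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2022-05-31", periods=3, name="Date")
demo = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=dates, columns=columns)

print(demo.columns.names)     # the names of the two column levels
print(demo.columns.tolist())  # the tuples that act as full column names

# Both spellings select the same data.
by_tuple = demo[("Close", "f")]
by_levels = demo["Close"]["f"]
```

Selecting with a single tuple does one lookup, while `demo["Close"]["f"]` first builds the intermediate `Close` sub-frame and then picks `f` from it.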

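We can use pd.concat to combine two dataframes and add a new column level at the same time. By default the new level becomes the topmost one, which we have to work around.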


# Given dataframes a, b with the same columns.
# Concatenate in the column direction. The keys label the entries of the
# new column level, and the name of the keys Index names the level itself.

(pd.concat([a, b], axis='columns', keys=pd.Index(["f", "fb"], name="Symbols"))
 # swap hierarchy order of column levels so that Symbols is innermost
 .swaplevel(-2, -1, axis=1)
 # restore sorting to that of a's columns - assuming a, b have the same cols
 .reindex(columns=a.columns, level=0)
)
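As a runnable sketch of the approach above, assume `a` and `b` are two frames of random numbers that share the same dates and the same columns. The names `a`, `b`, `attrs`, and `combined`, as well as the dates, are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2022-05-31", periods=4, name="Date")
attrs = pd.Index(["Open", "Close"], name="Attributes")

# a and b are hypothetical frames of random numbers sharing the same columns.
a = pd.DataFrame(rng.random((4, 2)), index=dates, columns=attrs)
b = pd.DataFrame(rng.random((4, 2)), index=dates, columns=attrs)

combined = (
    pd.concat([a, b], axis="columns", keys=pd.Index(["f", "fb"], name="Symbols"))
    # Symbols starts out as the top level; move it below Attributes.
    .swaplevel(-2, -1, axis=1)
    # Group the columns by attribute, in the order of a's columns.
    .reindex(columns=a.columns, level=0)
)

print(combined.columns.names)
print(combined["Close"].f)  # same access pattern as in the question
```

After this, `combined["Close"].f` should hold the same values as `a["Close"]`, which matches how the frame in the question behaves.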

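You can also look at df.stack("Symbols"), which moves the Symbols level down into the index (if you like, you can then reset that index level to keep it as a column). Levels can be moved back and forth like this with stack/unstack, so going through unstack is another way to reach the same result.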
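The round trip can be sketched as follows, again on a small made-up frame (`wide` and its values are illustrative only). Depending on the pandas version, `stack` may emit a FutureWarning about its new implementation; the result for this data should be the same either way.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2022-05-31", periods=3, name="Date")
wide = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=dates, columns=columns)

# Move the Symbols column level down into the row index.
long_df = wide.stack("Symbols")

# Optionally turn that index level into an ordinary column.
long_with_column = long_df.reset_index("Symbols")

# unstack moves the level back up; it lands as the innermost column level.
round_trip = long_df.unstack("Symbols")

print(long_df.head())
print(round_trip.columns.tolist())
```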

If Symbol is a column, you can do this: df.set_index("Symbol", append=True).unstack("Symbol") to turn it into another column level.
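For instance, starting from a long-format table with one row per date and symbol, this could look like the sketch below. The frame `long_df` and its prices are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, Symbol) pair.
long_df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-06-02", "2022-06-02", "2022-06-03", "2022-06-03"]
        ),
        "Symbol": ["f", "fb", "f", "fb"],
        "Close": [13.89, 198.86, 13.50, 190.78],
    }
).set_index("Date")

# Append Symbol to the index, then pivot it up into a second column level.
wide = long_df.set_index("Symbol", append=True).unstack("Symbol")

print(wide.columns.tolist())
print(wide["Close"].f)
```

Note that the outer column level is unnamed here; if you want it to read "Attributes" as in the question, `wide.columns.names = ["Attributes", "Symbols"]` would set both names.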


pandas