Create or modify a DataFrame using another DataFrame

I currently have a pandas DataFrame that looks like this:

   DATESTAMP   price                name   pct_chg
0  2006-01-02  62.987301            a      0.000000
1  2006-01-03  61.990700            a     -0.015822
2  2006-01-04  62.987301            a      0.016077
3  2006-01-05  62.987301            a      0.000000
4  2006-01-06  61.990700            a     -0.015822
6  2006-01-04  100.1                b      0.000000
7  2006-01-05  100.5                b     -0.015822
8  2006-01-06  100.7                b      0.016077
9  2006-01-07  100.8                b      0.016090

The problem is that the different items (identified by the values in the name column) have different start dates and different lifespans.

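I would like to summarize the pct_chg column in a new DataFrame, with DATESTAMP as the index and the name values as the columns. I also want the new DataFrame's index to start at the oldest existing date record (2006-01-02 in this case) and end at the newest (2006-01-07 in this case).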

The result should look like this:

            a          b
2006-01-02  0.000000   NaN
2006-01-03  -0.015822  NaN
2006-01-04  0.016077   0.000000
2006-01-05  0.000000   -0.015822
2006-01-06  -0.015822  0.016077
2006-01-07  NaN        0.016090

You can use set_index with unstack:

print (df.set_index(['DATESTAMP','name'])['pct_chg'].unstack())
name               a         b
DATESTAMP                     
2006-01-02  0.000000       NaN
2006-01-03 -0.015822       NaN
2006-01-04  0.016077  0.000000
2006-01-05  0.000000 -0.015822
2006-01-06 -0.015822  0.016077
2006-01-07       NaN  0.016090
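
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the variable name `wide` is just an illustration, and the original row labels are not preserved) and then applies the same set_index / unstack chain:

```python
import pandas as pd

# Sample data retyped from the question; row labels are left as the default 0..8.
df = pd.DataFrame({
    "DATESTAMP": ["2006-01-02", "2006-01-03", "2006-01-04", "2006-01-05",
                  "2006-01-06", "2006-01-04", "2006-01-05", "2006-01-06",
                  "2006-01-07"],
    "price": [62.987301, 61.990700, 62.987301, 62.987301, 61.990700,
              100.1, 100.5, 100.7, 100.8],
    "name": ["a"] * 5 + ["b"] * 4,
    "pct_chg": [0.0, -0.015822, 0.016077, 0.0, -0.015822,
                0.0, -0.015822, 0.016077, 0.016090],
})

# Move DATESTAMP and name into a two-level index, keep only pct_chg,
# then turn the inner level (name) into columns.
wide = df.set_index(["DATESTAMP", "name"])["pct_chg"].unstack()
print(wide)
```

Dates on which an item has no record come out as NaN, which is what gives each column its own start and end inside the shared index.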

Another solution with pivot:

print (df.pivot(index='DATESTAMP', columns='name', values='pct_chg'))
name               a         b
DATESTAMP                     
2006-01-02  0.000000       NaN
2006-01-03 -0.015822       NaN
2006-01-04  0.016077  0.000000
2006-01-05  0.000000 -0.015822
2006-01-06 -0.015822  0.016077
2006-01-07       NaN  0.016090
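
If the index should cover every calendar day between the oldest and newest record, even days on which no item has a row, one possible extension is to convert DATESTAMP to real datetimes and reindex against a daily range. This is only a sketch under that assumption: the sample data is retyped from the question, and the names `wide` and `full_range` are made up for illustration.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "DATESTAMP": ["2006-01-02", "2006-01-03", "2006-01-04", "2006-01-05",
                  "2006-01-06", "2006-01-04", "2006-01-05", "2006-01-06",
                  "2006-01-07"],
    "name": ["a"] * 5 + ["b"] * 4,
    "pct_chg": [0.0, -0.015822, 0.016077, 0.0, -0.015822,
                0.0, -0.015822, 0.016077, 0.016090],
})

# Parse the date strings so the index becomes a DatetimeIndex.
df["DATESTAMP"] = pd.to_datetime(df["DATESTAMP"])

# One row per date, one column per name, pct_chg as the cell values.
wide = df.pivot(index="DATESTAMP", columns="name", values="pct_chg")

# Daily range from the oldest to the newest date present in the data.
full_range = pd.date_range(wide.index.min(), wide.index.max(), freq="D")

# Rows for days missing from the data would be added here, filled with NaN.
wide = wide.reindex(full_range)
print(wide)
```

With the sample data every day from 2006-01-02 to 2006-01-07 is already present, so the reindex step changes no values here. Note also that pivot raises a ValueError if the same (DATESTAMP, name) pair appears more than once.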