Reshaping a table in pandas

Question

I want to reshape a table in pandas. I have a table of the form:

date | country |state | population | num_cars
1    | c1      | s1   | 1          | 1
2    | c1      | s1   | 1          | 1
1    | c1      | s2   | 1          | 1
.
2    | c2      | s2   | 1          | 2
2    | c2      | s2   | 1          | 2

I want to turn it into this shape:

date | 1_population | c1_s1_population | c1_s2_population ... | c2_s1_population | c1_num_cars | c2_11_num_cars ...

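To explain: the initial data has population and car counts by country and state over a date range. Now I want to convert it into multiple time-series columns, one for each level (country, country-state).

How should I do this?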

Answer 1

As a source data sample, I used a DataFrame with 2 hypothetical countries, with 3 states each:

    date country state  population  num_cars
0   1990     Xxx   Aaa         100        15
1   2010     Xxx   Aaa         120        18
2   1990     Xxx   Bbb          80         9
3   2010     Xxx   Bbb          88        11
4   1990     Xxx   Ccc          75         6
5   2010     Xxx   Ccc          82         8
6   1990     Yyy   Ggg          40         5
7   2010     Yyy   Ggg          50         6
8   1990     Yyy   Hhh          30         3
9   2010     Yyy   Hhh          38         4
10  1990     Yyy   Jjj          29         3
11  2010     Yyy   Jjj          35         4
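For anyone who wants to follow along, the sample above can be rebuilt as a DataFrame roughly like this. It is a minimal sketch: the values are copied from the listing, and the variable name `df` matches the one used in the steps below.

```python
import pandas as pd

# Sample data copied from the listing above: 2 countries, 3 states each, 2 dates.
df = pd.DataFrame({
    "date": [1990, 2010] * 6,
    "country": ["Xxx"] * 6 + ["Yyy"] * 6,
    "state": ["Aaa", "Aaa", "Bbb", "Bbb", "Ccc", "Ccc",
              "Ggg", "Ggg", "Hhh", "Hhh", "Jjj", "Jjj"],
    "population": [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    "num_cars": [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})

print(df)
```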

To solve your problem, start by defining a reformatting function:

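def reformat(grp, col):
    # grp: the rows of one group; col: name of the column to extract
    pop = grp[col]
    # All rows in the group share the same date, so take it from the first row
    pop.name = grp.date.iloc[0]
    return pop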

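It takes the column with the given name (col) from a group of rows (grp), sets the name of that column to the date taken from the first row (the grouping key), and returns it.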
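As a quick check of what the function returns, it can be called by hand on a single group. The two-row group below is made up for illustration (its values come from the sample data); inside `apply`, pandas would pass one such group per date.

```python
import pandas as pd

def reformat(grp, col):
    pop = grp[col]
    pop.name = grp.date.iloc[0]
    return pop

# Hand-built stand-in for one group: every row has the same date (1990),
# and country/state are already in the index, as in the grouping step.
grp = pd.DataFrame({
    "date": [1990, 1990],
    "country": ["Xxx", "Xxx"],
    "state": ["Aaa", "Bbb"],
    "population": [100, 80],
    "num_cars": [15, 9],
}).set_index(["country", "state"])

row = reformat(grp, "population")
print(row.name)  # the date shared by the group
print(row)       # population values indexed by (country, state)
```

The returned Series is indexed by (country, state) and named after the date, which is why stacking one such Series per date gives a frame with dates as rows.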

As the initial step, move country and state into the index and group df by date:

gr = df.set_index(['country', 'state']).groupby('date')
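To see what the grouping produces before any function is applied, one can iterate over the groups. This is only an inspection sketch; the `sizes` dictionary is a helper introduced here, and `df` is rebuilt from the sample data so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [1990, 2010] * 6,
    "country": ["Xxx"] * 6 + ["Yyy"] * 6,
    "state": ["Aaa", "Aaa", "Bbb", "Bbb", "Ccc", "Ccc",
              "Ggg", "Ggg", "Hhh", "Hhh", "Jjj", "Jjj"],
    "population": [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    "num_cars": [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})

gr = df.set_index(["country", "state"]).groupby("date")

# One group per date; each group holds one row per (country, state) pair.
sizes = {}
for date, grp in gr:
    sizes[date] = len(grp)
    print(date, len(grp), list(grp.index.names))
```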

Then compute 2 DataFrames as the result of reformatting (apply the function above to each group), one for each of the two columns of interest:

df1 = gr.apply(reformat, col='population')
df2 = gr.apply(reformat, col='num_cars')

Having these two partial results, merge them on the index:

pd.merge(df1, df2, left_index=True, right_index=True,
    suffixes=('_pop', '_cars'))

The result is:

country Xxx_pop         Yyy_pop         Xxx_cars         Yyy_cars        
state       Aaa Bbb Ccc     Ggg Hhh Jjj      Aaa Bbb Ccc      Ggg Hhh Jjj
date                                                                     
1990        100  80  75      40  30  29       15   9   6        5   3   3
2010        120  88  82      50  38  35       18  11   8        6   4   4

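As you can see, the top level of the MultiIndex on the columns combines the country with the measure ("country / population" and "country / number of cars"), while the other level contains the state names.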

To trace how this solution works, execute each step separately and check its result.
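If flat column names like `c1_s1_population` from the question are preferred over a MultiIndex, a different route is `DataFrame.pivot` followed by joining the column levels. This is a sketch of an alternative, not the method used in this answer. It assumes that the country-level columns should be sums over the states, which the question does not state explicitly. It also does not rely on the grouping column being visible inside `apply`, which may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [1990, 2010] * 6,
    "country": ["Xxx"] * 6 + ["Yyy"] * 6,
    "state": ["Aaa", "Aaa", "Bbb", "Bbb", "Ccc", "Ccc",
              "Ggg", "Ggg", "Hhh", "Hhh", "Jjj", "Jjj"],
    "population": [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    "num_cars": [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})
measures = ["population", "num_cars"]

# State level: one column per (measure, country, state), then flatten the names.
by_state = df.pivot(index="date", columns=["country", "state"], values=measures)
by_state.columns = [
    f"{country}_{state}_{measure}" for measure, country, state in by_state.columns
]

# Country level: ASSUMPTION - totals are the sum over the states of a country.
by_country = df.groupby(["date", "country"])[measures].sum().unstack("country")
by_country.columns = [
    f"{country}_{measure}" for measure, country in by_country.columns
]

# Both frames are indexed by date, so they can be placed side by side.
wide = pd.concat([by_country, by_state], axis=1)
print(wide)
```

The result has one row per date and one flat column per country and per country-state pair for each measure.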
