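Reshape GroupBy in Pandas and pad with NaN if missing
Given a DataFrame whose groups (as defined by a 'groupby' on some variable) contain different numbers of elements, I need to reshape it into a matrix with a predefined number of columns. For example: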
summary_x participant_id_x response_date cuts
0 3.0 11 2016-05-05 a
1 3.0 11 2016-05-06 a
2 4.0 11 2016-05-07 a
3 4.0 11 2016-05-08 a
4 3.0 11 2016-05-09 a
5 3.0 11 2016-05-10 a
6 3.0 11 2016-05-11 a
7 3.0 11 2016-05-12 a
8 3.0 11 2016-05-13 a
9 3.0 11 2016-05-14 a
13 4.0 11 2016-05-22 b
14 4.0 11 2016-05-23 b
15 3.0 11 2016-05-24 b
16 3.0 11 2016-05-25 b
17 3.0 11 2016-05-26 b
18 3.0 11 2016-05-27 b
19 3.0 11 2016-05-28 b
20 3.0 11 2016-06-02 c
21 3.0 11 2016-06-03 c
22 3.0 11 2016-06-04 c
23 3.0 11 2016-06-05 c
24 3.0 11 2016-06-06 c
25 3.0 11 2016-06-07 c
26 3.0 11 2016-06-08 c
27 3.0 11 2016-06-09 c
28 3.0 11 2016-06-10 c
29 5.0 11 2016-06-11 c
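Every group (by 'cuts') has 10 elements, except group 'b', which has only 7. I would like to reshape 'summary_x' into a (3, 10) matrix in which the missing values are padded with NaNs. A plain reshape such as the following fails, because the 27 values do not divide into rows of 10, but it shows the shape I am after:
pd.DataFrame(df.summary_x.values.reshape((-1,10)))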
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 nan nan nan 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
Is there a way to do this?
You can use cumcount with ascending=False and then reverse the column order with [:, ::-1]:
# Count positions from the end of each group so that short groups are padded on the left
g = df.groupby('cuts').cumcount(ascending=False)
df = (df.assign(g=g)
        .pivot(index='cuts', columns='g', values='summary_x')
        .iloc[:, ::-1]
        .reset_index(drop=True))
df.columns = np.arange(len(df.columns))
print(df)
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 NaN NaN NaN 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
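To see why the reversed counter pads on the left, it may help to run the same steps on a smaller frame. The following is a minimal sketch, assuming a made-up three-group DataFrame (not the question's data) and a helper column named g that exists only for illustration:

```python
import pandas as pd

# Hypothetical miniature of the question's data: group 'b' is one element short.
df = pd.DataFrame({
    "cuts": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "summary_x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Position of each row counted from the END of its group.
g = df.groupby("cuts").cumcount(ascending=False)
print(g.tolist())  # [2, 1, 0, 1, 0, 2, 1, 0]

wide = (
    df.assign(g=g)
      .pivot(index="cuts", columns="g", values="summary_x")
      .iloc[:, ::-1]            # columns 2, 1, 0: each group's first element moves back to the left
      .reset_index(drop=True)
)
wide.columns = range(wide.shape[1])
print(wide)
```

Group 'b' never receives the counter value 2, so after the pivot that column is empty for 'b', and reversing the columns puts the gap at the start of the row.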
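Another solution, which collects each group into a list:
# Reverse the rows so that each group's list is stored back to front
L = df[::-1].groupby('cuts')['summary_x'].apply(list).values.tolist()
# DataFrame pads short lists on the right; flipping the columns moves the NaNs to the left
df = pd.DataFrame(L).iloc[:, ::-1]
df.columns = np.arange(len(df.columns))
print(df)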
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 NaN NaN NaN 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
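Both approaches above take the width from the longest group. If the number of columns really has to be fixed in advance, as the question asks, one option is to fill a preallocated NaN array. This is only a sketch under that assumption: pad_groups and its parameters are hypothetical names, not part of pandas, and the sample frame is made up.

```python
import numpy as np
import pandas as pd

def pad_groups(df, key, value, width, side="left"):
    """Hypothetical helper: one row per group, padded with NaN to `width` columns."""
    groups = [vals.to_numpy(dtype=float) for _, vals in df.groupby(key)[value]]
    out = np.full((len(groups), width), np.nan)
    for i, vals in enumerate(groups):
        if len(vals) > width:
            raise ValueError(f"group {i} has {len(vals)} values, more than width={width}")
        if side == "left":
            out[i, width - len(vals):] = vals   # NaNs stay at the start of the row
        else:
            out[i, :len(vals)] = vals           # NaNs stay at the end of the row
    return pd.DataFrame(out)

# Made-up sample data with a short group 'b'.
df = pd.DataFrame({
    "cuts": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "summary_x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
result_left = pad_groups(df, "cuts", "summary_x", width=4)
result_right = pad_groups(df, "cuts", "summary_x", width=4, side="right")
print(result_left)
print(result_right)
```

Raising on an oversized group is a design choice for the sketch; truncating instead would silently drop data.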
But if the NaNs can go at the end of each row instead:
g = df.groupby('cuts').cumcount()
df = (df.assign(g=g)
        .pivot(index='cuts', columns='g', values='summary_x')
        .rename_axis(None, axis=1)
        .reset_index(drop=True))
print(df)
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 4.0 4.0 3.0 3.0 3.0 3.0 3.0 NaN NaN NaN
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
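With right padding, a fixed column count can also be enforced by reindexing the pivoted columns. A small sketch, again on made-up data and with an illustrative helper column g; the width of 5 is an arbitrary assumption:

```python
import pandas as pd

# Made-up sample data: the longest group has 3 elements, but 5 columns are wanted.
df = pd.DataFrame({
    "cuts": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "summary_x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

g = df.groupby("cuts").cumcount()   # position counted from the start of each group
wide = (
    df.assign(g=g)
      .pivot(index="cuts", columns="g", values="summary_x")
      .reindex(columns=range(5))    # force exactly 5 columns; missing ones are all NaN
      .reset_index(drop=True)
)
print(wide)
```

Note that reindex would also drop columns beyond the requested width, so a group longer than the width is truncated rather than reported.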