Convert pandas df from long to wide based on one column as variable and two columns for values
I have a dataframe:
{'Author': {0: 1, 1: 1, 2: 2, 3: 2},
'Article': {0: 11, 1: 11, 2: 22, 3: 22},
'Year': {0: 2017, 1: 2018, 2: 2017, 3: 2018},
'First': {0: 1, 1: 0, 2: 0, 3: 0},
'Second': {0: 0, 1: 1, 2: 1, 3: 1}}
I want to reshape it from long to wide on Year, creating value columns based on First and Second.
Expected output:
Author Article Year First Second First_2017 First_2018 Second_2017 Second_2018
1 11 2017 1 0 1 0 0 1
1 12 2018 0 1 1 0 0 1
2 22 2017 0 0 0 0 0 1
2 23 2018 0 1 0 0 0 1
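If you need to test whether at least one 1 exists in the ['First','Second'] columns for each Author and Year, use DataFrame.pivot_table with any, flatten the MultiIndex columns, and join the result back to the original: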
df1 = df.pivot_table(index='Author',
                     columns='Year',
                     values=['First','Second'],
                     aggfunc='any')
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df = df.join(df1.astype(int), on='Author')
print(df)
Author Article Year First Second First_2017 First_2018 Second_2017 \
0 1 11 2017 1 0 1 0 0
1 1 11 2018 0 1 1 0 0
2 2 22 2017 0 1 0 0 1
3 2 22 2018 0 1 0 0 1
Second_2018
0 1
1 1
2 1
3 1
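For reference, the pivot_table approach above can be run end to end as follows. This is a minimal sketch that rebuilds the question's dataframe from the dict; the names `wide` and `result` are illustrative and not from the original answer.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    'Author': [1, 1, 2, 2],
    'Article': [11, 11, 22, 22],
    'Year': [2017, 2018, 2017, 2018],
    'First': [1, 0, 0, 0],
    'Second': [0, 1, 1, 1],
})

# One row per Author, one column per (value column, Year) pair.
# 'any' reports whether at least one non-zero value exists in each group.
wide = df.pivot_table(index='Author',
                      columns='Year',
                      values=['First', 'Second'],
                      aggfunc='any')

# Flatten the MultiIndex columns, e.g. ('First', 2017) -> 'First_2017'.
wide.columns = [f'{a}_{b}' for a, b in wide.columns]

# Broadcast the per-Author flags back onto every original row.
result = df.join(wide.astype(int), on='Author')
print(result)
```

Keeping the joined frame in a separate variable (here `result`) leaves `df` untouched, which matters if another reshape is run on the original data afterwards.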
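# Alternative without aggregation: DataFrame.pivot on the original df (before the join above).
# A list-valued index needs pandas >= 1.1, and each (Author, Article, Year) combination must be unique.
df2 = df.pivot(index=['Author', 'Article'], columns='Year')
df2.columns = df2.columns.map(lambda x: '_'.join(map(str, x)))
df.merge(df2, left_on=['Author', 'Article'], right_index=True)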
Output:
Author Article Year First Second First_2017 First_2018 Second_2017 Second_2018
0 1 11 2017 1 0 1 0 0 1
1 1 11 2018 0 1 1 0 0 1
2 2 22 2017 0 1 0 0 1 1
3 2 22 2018 0 1 0 0 1 1
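The pivot-and-merge variant can be sketched the same way. The example below assumes a fresh copy of the question's dataframe and pandas 1.1 or later; `wide`, `result`, `dupes`, and `pivot_failed` are illustrative names. It also shows the main practical difference from pivot_table: DataFrame.pivot does not aggregate, so it raises a ValueError when an (index, columns) combination appears more than once.

```python
import pandas as pd

df = pd.DataFrame({
    'Author': [1, 1, 2, 2],
    'Article': [11, 11, 22, 22],
    'Year': [2017, 2018, 2017, 2018],
    'First': [1, 0, 0, 0],
    'Second': [0, 1, 1, 1],
})

# Pivot the remaining columns (First, Second) by Year; no aggregation happens.
wide = df.pivot(index=['Author', 'Article'], columns='Year')

# Flatten ('First', 2017) -> 'First_2017'.
wide.columns = wide.columns.map(lambda x: '_'.join(map(str, x)))

# Attach the wide columns to every original row via the two key columns.
result = df.merge(wide, left_on=['Author', 'Article'], right_index=True)
print(result)

# With a repeated (Author, Article, Year) row, pivot cannot reshape.
dupes = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dupes.pivot(index=['Author', 'Article'], columns='Year')
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)
```

If duplicates are possible in the real data, pivot_table with an explicit aggfunc is the safer choice; pivot is the better fit when uniqueness is guaranteed and the raw values should be carried over unchanged.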