How do I generate the following in pandas?
I have a DataFrame like this:
+----+----+--------+-------+--------+
| p | a | col1 | col2 | col3 |
+----+----+--------+-------+--------+
| p1 | a1 | MANGO1 | APPLE | GUAVA |
| p2 | a2 | MANGO2 | APPLE | GRAPES |
| p3 | a2 | MANGO1 | APPLE | ORANGE |
| p1 | a1 | MANGO2 | APPLE | KIWI |
| p2 | a2 | MANGO1 | APPLE | ORANGE |
+----+----+--------+-------+--------+
I want to transform it into:
+----+----+--------+--------+-------+-------+--------+--------+------+
| p | a | MANGO1 | MANGO2 | APPLE | GUAVA | GRAPES | ORANGE | KIWI |
+----+----+--------+--------+-------+-------+--------+--------+------+
| p1 | a1 | YES | YES | YES | YES | NO | NO | YES |
| p2 | a2 | YES | YES | YES | NO | YES | YES | NO |
| p3 | a2 | YES | NO | YES | NO | NO | YES | NO |
+----+----+--------+--------+-------+-------+--------+--------+------+
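The idea is to group by columns p and a, turn the values of the other columns into column headers, and use YES/NO as the cell values.
You can first use melt to flatten your DataFrame, then pivot_table to reshape it: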
out = (df.melt(['p', 'a']).assign(variable='YES')
         .pivot_table('variable', ['p', 'a'], 'value', fill_value='NO', aggfunc='first')
         .rename_axis(columns=None).reset_index())
Output:
>>> out
p a APPLE GRAPES GUAVA KIWI MANGO1 MANGO2 ORANGE
0 p1 a1 YES NO YES YES YES YES NO
1 p2 a2 YES YES NO NO YES YES YES
2 p3 a2 YES NO NO NO YES NO YES
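To make the chain easier to follow, here is the same approach split into named steps. This is only a sketch: the intermediate names (long_df, flagged, wide) are mine and not part of the original answer, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'p': ['p1', 'p2', 'p3', 'p1', 'p2'],
    'a': ['a1', 'a2', 'a2', 'a1', 'a2'],
    'col1': ['MANGO1', 'MANGO2', 'MANGO1', 'MANGO2', 'MANGO1'],
    'col2': ['APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE'],
    'col3': ['GUAVA', 'GRAPES', 'ORANGE', 'KIWI', 'ORANGE'],
})

# Step 1: melt keeps p and a as identifier columns and stacks col1..col3
# into two columns: 'variable' (the source column name) and 'value' (the fruit).
long_df = df.melt(['p', 'a'])

# Step 2: the source column name is not needed, so overwrite 'variable'
# with the marker that should appear wherever a fruit is present.
flagged = long_df.assign(variable='YES')

# Step 3: one row per (p, a), one column per distinct fruit.
# aggfunc='first' collapses repeated fruits within a group (e.g. APPLE);
# fill_value='NO' fills the combinations that never occur.
wide = flagged.pivot_table(values='variable', index=['p', 'a'],
                           columns='value', fill_value='NO', aggfunc='first')

# Step 4: drop the 'value' label on the column axis and turn p, a
# back into ordinary columns.
out = wide.rename_axis(columns=None).reset_index()
print(out)
```

The key trick is step 2: after melting, every row represents "this fruit occurs for this (p, a) pair", so any constant marker works as the value to pivot on.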
Setup (minimal reproducible example):
import pandas as pd

data = {'p': ['p1', 'p2', 'p3', 'p1', 'p2'],
        'a': ['a1', 'a2', 'a2', 'a1', 'a2'],
        'col1': ['MANGO1', 'MANGO2', 'MANGO1', 'MANGO2', 'MANGO1'],
        'col2': ['APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE'],
        'col3': ['GUAVA', 'GRAPES', 'ORANGE', 'KIWI', 'ORANGE']}
df = pd.DataFrame(data)
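If you want to sanity-check the result, one possible alternative is to count occurrences with pd.crosstab and map the counts to YES/NO. Note that this is a different technique from the melt + pivot_table approach above, offered only as a cross-check; the names counts and flags are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'p': ['p1', 'p2', 'p3', 'p1', 'p2'],
    'a': ['a1', 'a2', 'a2', 'a1', 'a2'],
    'col1': ['MANGO1', 'MANGO2', 'MANGO1', 'MANGO2', 'MANGO1'],
    'col2': ['APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE'],
    'col3': ['GUAVA', 'GRAPES', 'ORANGE', 'KIWI', 'ORANGE'],
})

# Long format: one row per (p, a, fruit) occurrence.
long_df = df.melt(['p', 'a'])

# Count how often each fruit occurs for each (p, a) pair.
counts = pd.crosstab([long_df['p'], long_df['a']], long_df['value'])

# Any count above zero means the fruit is present for that pair.
flags = pd.DataFrame(np.where(counts > 0, 'YES', 'NO'),
                     index=counts.index, columns=counts.columns)

out_alt = flags.rename_axis(columns=None).reset_index()
print(out_alt)
```

Going through counts first can be handy if you later need the number of occurrences rather than just a presence flag.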