Spread specific columns in a dataframe without any aggregation?
Here is my toy df:
{'id': {0: 1089577, 1: 1089577, 2: 1089577, 3: 1089577, 4: 1089577},
'title': {0: 'Hungarian Goulash Stew',
1: 'Hungarian Goulash Stew',
2: 'Hungarian Goulash Stew',
3: 'Hungarian Goulash Stew',
4: 'Hungarian Goulash Stew'},
'readyInMinutes': {0: 120, 1: 120, 2: 120, 3: 120, 4: 120},
'nutrients.amount': {0: 323.18, 1: 15.14, 2: 4.43, 3: 38.95, 4: 34.64},
'nutrients.name': {0: 'Calories',
1: 'Fat',
2: 'Saturated Fat',
3: 'Carbohydrates',
4: 'Net Carbohydrates'},
'nutrients.percentOfDailyNeeds': {0: 16.16,
1: 23.3,
2: 27.69,
3: 12.98,
4: 12.6},
'nutrients.title': {0: 'Calories',
1: 'Fat',
2: 'Saturated Fat',
3: 'Carbohydrates',
4: 'Net Carbohydrates'},
'nutrients.unit': {0: 'kcal', 1: 'g', 2: 'g', 3: 'g', 4: 'g'}}
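I want to spread nutrients.title into columns, so that I get Fat, Saturated Fat ... columns with their corresponding values, without any aggregation.
Which function can do this without any aggregation? It is just a "reshape".
I expect it to be:
How can I "spread" it like this?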
Try pivot_table:
# Rename columns: keep only the part after the last dot, prefixed with "."
df.columns = df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x)
# Create pivot table
df = df.pivot_table(
    index=['id', 'title', 'readyInMinutes'],
    columns=['.title'],
    values=['.amount',
            '.percentOfDailyNeeds',
            '.unit'],
    aggfunc='first'
).reset_index() \
    .swaplevel(0, 1, axis=1)
# Re-order columns so that the nutrients.title labels are grouped
df = df.reindex(sorted(df.columns), axis=1)
# Reduce levels by join
df.columns = df.columns.map(''.join)
print(df.to_string(index=False))
Output:
id readyInMinutes title Calories.amount Calories.percentOfDailyNeeds Calories.unit Carbohydrates.amount Carbohydrates.percentOfDailyNeeds Carbohydrates.unit Fat.amount Fat.percentOfDailyNeeds Fat.unit Net Carbohydrates.amount Net Carbohydrates.percentOfDailyNeeds Net Carbohydrates.unit Saturated Fat.amount Saturated Fat.percentOfDailyNeeds Saturated Fat.unit
1089577 120 Hungarian Goulash Stew 323.18 16.16 kcal 38.95 12.98 g 15.14 23.3 g 34.64 12.6 g 4.43 27.69 g
Steps with abridged output
- Rename the columns:
print(df.columns.values)
# ['id' 'title' 'readyInMinutes' 'nutrients.amount' 'nutrients.name'
# 'nutrients.percentOfDailyNeeds' 'nutrients.title' 'nutrients.unit']
print(df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x).values)
# ['id' 'title' 'readyInMinutes' '.amount' '.name' '.percentOfDailyNeeds'
# '.title' '.unit']
- Pivot several value columns on a single header column to create a multi-level column index:
print(df.pivot_table(
    index=['id', 'title', 'readyInMinutes'],
    columns=['.title'],
    values=['.amount',
            '.percentOfDailyNeeds',
            '.unit'],
    aggfunc='first'
).to_string())
.amount
.title Calories Carbohydrates Fat Net Carbohydrates Saturated Fat
id title readyInMinutes
1089577 Hungarian Goulash Stew 120 323.18 38.95 15.14 34.64 4.43
- Reset the index and swap the levels so that the labels (Calories, Carbohydrates, etc.) are on top:
.reset_index().swaplevel(0, 1, axis=1)
.title Calories Carbohydrates Fat Net Carbohydrates Saturated Fat
id title readyInMinutes .amount .amount .amount .amount .amount
0 1089577 Hungarian Goulash Stew 120 323.18 38.95 15.14 34.64 4.43
- Sort the columns so that the labels are grouped together:
df = df.reindex(sorted(df.columns), axis=1)
.title Calories Carbohydrates
id readyInMinutes title .amount .percentOfDailyNeeds .unit .amount .percentOfDailyNeeds .unit
0 1089577 120 Hungarian Goulash Stew 323.18 16.16 kcal 38.95 12.98 g
- Reduce the levels with join (creating Calories.amount, Calories.unit, etc.):
df.columns = df.columns.map(''.join)
id readyInMinutes title Calories.amount Calories.percentOfDailyNeeds Calories.unit
0 1089577 120 Hungarian Goulash Stew 323.18 16.16 kcal
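The steps above can be collected into one runnable sketch. The helper name spread_nutrients is made up for illustration, and the toy data is rebuilt here from column lists rather than the index-keyed dict in the question (same values). It works on a copy so the original df keeps its column names.

```python
import pandas as pd

# Toy data from the question, written as column lists (same values).
df = pd.DataFrame({
    "id": [1089577] * 5,
    "title": ["Hungarian Goulash Stew"] * 5,
    "readyInMinutes": [120] * 5,
    "nutrients.amount": [323.18, 15.14, 4.43, 38.95, 34.64],
    "nutrients.name": ["Calories", "Fat", "Saturated Fat",
                       "Carbohydrates", "Net Carbohydrates"],
    "nutrients.percentOfDailyNeeds": [16.16, 23.3, 27.69, 12.98, 12.6],
    "nutrients.title": ["Calories", "Fat", "Saturated Fat",
                        "Carbohydrates", "Net Carbohydrates"],
    "nutrients.unit": ["kcal", "g", "g", "g", "g"],
})


def spread_nutrients(frame):
    """Hypothetical helper: one row per recipe, one column per (label, field)."""
    out = frame.copy()
    # 'nutrients.amount' -> '.amount'; undotted names such as 'title' are kept.
    out.columns = out.columns.map(
        lambda x: f".{x.split('.')[-1]}" if "." in x else x
    )
    wide = out.pivot_table(
        index=["id", "title", "readyInMinutes"],
        columns=[".title"],
        values=[".amount", ".percentOfDailyNeeds", ".unit"],
        aggfunc="first",  # takes the first value if a key ever repeats
    ).reset_index().swaplevel(0, 1, axis=1)
    # Sort the (label, field) tuples so each nutrient's fields sit together.
    wide = wide.reindex(sorted(wide.columns), axis=1)
    # ('Calories', '.amount') -> 'Calories.amount'; ('', 'id') -> 'id'
    wide.columns = wide.columns.map("".join)
    return wide


wide = spread_nutrients(df)
print(wide.to_string(index=False))
```

Note that aggfunc='first' is still technically an aggregation: if a recipe had two rows with the same nutrients.title, the second one would be dropped silently.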
You can use df.pivot() as follows:
(df.pivot(index=['id', 'title', 'readyInMinutes'],
          columns='nutrients.title',
          values='nutrients.amount')
   .rename_axis(None, axis=1)
).reset_index()
Result:
id title readyInMinutes Calories Carbohydrates Fat Net Carbohydrates Saturated Fat
0 1089577 Hungarian Goulash Stew 120 323.18 38.95 15.14 34.64 4.43
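As a rough check that df.pivot() really does no aggregation, here is a minimal sketch on the toy data (only the columns the pivot needs). It assumes a pandas version that accepts a list for index in pivot (1.1 or later, as far as I know). The extra duplicated "Fat" row is made up purely to show the failure mode.

```python
import pandas as pd

# Subset of the toy data from the question (only the columns pivot needs).
df = pd.DataFrame({
    "id": [1089577] * 5,
    "title": ["Hungarian Goulash Stew"] * 5,
    "readyInMinutes": [120] * 5,
    "nutrients.amount": [323.18, 15.14, 4.43, 38.95, 34.64],
    "nutrients.title": ["Calories", "Fat", "Saturated Fat",
                        "Carbohydrates", "Net Carbohydrates"],
})

keys = ["id", "title", "readyInMinutes"]

# Pure reshape: each (recipe, nutrient) pair occurs once, so nothing is combined.
wide = (
    df.pivot(index=keys, columns="nutrients.title", values="nutrients.amount")
      .rename_axis(None, axis=1)  # drop the 'nutrients.title' columns name
      .reset_index()
)
print(wide)

# Made-up duplicate: repeat the "Fat" row for the same recipe.
dup = pd.concat([df, df.iloc[[1]]], ignore_index=True)
try:
    dup.pivot(index=keys, columns="nutrients.title", values="nutrients.amount")
    raised = False
except ValueError as err:
    # pivot refuses to pick or combine values when a key repeats
    raised = True
    print("pivot failed:", err)
```

So pivot() either reshapes the data exactly or raises, whereas pivot_table() needs an aggfunc to decide what to do with repeated keys.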