How to groupby and calculate a new field with Python pandas?
I would like to group by a specific column of a DataFrame, 'Fruit', and calculate the percentage of each fruit that is 'Good'.
See my initial DataFrame below:
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple','Apple','Banana'], 'Condition': ['Good','Bad','Good']})
DataFrame:
Fruit Condition
0 Apple Good
1 Apple Bad
2 Banana Good
Below is my desired output DataFrame:
Fruit Percentage
0 Apple 50%
1 Banana 100%
Note: Because there is 1 'Good' Apple and 1 'Bad' Apple, the percentage of Good Apples is 50%.
See my attempt below, which overwrites all the columns:
groupedDF = df.groupby('Fruit')
groupedDF.apply(lambda x: x[(x['Condition'] == 'Good')].count()/x.count())
See the resulting table below, which seems to calculate the percentage in the existing columns instead of in a new column:
Fruit Condition
Fruit
Apple 0.5 0.5
Banana 1.0 1.0
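The attempt returns one ratio per column because `DataFrame.count()` counts the non-null values in every column of each group. A minimal sketch of the same idea restricted to the Condition column is shown here; the name `good_share` is illustrative and not part of the original post:

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

# Select only the Condition column before applying, so each group
# yields a single number instead of one ratio per column.
good_share = df.groupby('Fruit')['Condition'].apply(
    lambda s: (s == 'Good').sum() / len(s)  # Good rows / all rows in the group
)
print(good_share)
```

This gives one value per fruit, indexed by Fruit.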
We can compare Condition to 'Good' with eq, take advantage of the fact that True is 1 and False is 0 when processed as numbers, and take the groupby mean over Fruit:
new_df = (
df['Condition'].eq('Good').groupby(df['Fruit']).mean().reset_index()
)
new_df:
Fruit Condition
0 Apple 0.5
1 Banana 1.0
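To see why the mean of a boolean Series is the fraction of True values, here is a small sketch that computes the ratio by hand and compares it with the mean. The names `is_good`, `manual` and `via_mean` are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

is_good = df['Condition'].eq('Good')  # boolean Series: True, False, True
grouped = is_good.groupby(df['Fruit'])

# sum() counts the True values and count() counts the rows in each group,
# so their ratio is the share of Good rows per fruit.
manual = grouped.sum() / grouped.count()

# mean() treats True as 1 and False as 0, giving the same ratio directly.
via_mean = grouped.mean()

print(manual)
print(via_mean)
```

Both Series should hold the same per-fruit ratios, which is why the single call to mean is enough.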
We can further map to a format string and rename to get the output into the desired form shown above:
new_df = (
df['Condition'].eq('Good')
.groupby(df['Fruit']).mean()
.map('{:.0%}'.format) # Change to Percent Format
.rename('Percentage') # Rename Column to Percentage
.reset_index() # Restore RangeIndex and make Fruit a Column
)
new_df:
Fruit Percentage
0 Apple 50%
1 Banana 100%
*Naturally, further manipulations can be done as well.
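As one possible further step, the chain above could be wrapped in a small function so that the group column, value column and target value are parameters. This is only a sketch: `percent_matching` and its parameter names are hypothetical and do not come from the original answer.

```python
import pandas as pd

def percent_matching(df, group_col, value_col, target):
    """Hypothetical helper: per-group share of rows where value_col == target,
    formatted as percent strings in a 'Percentage' column."""
    return (
        df[value_col].eq(target)          # True where the row matches target
        .groupby(df[group_col]).mean()    # share of True per group
        .map('{:.0%}'.format)             # 0.5 -> '50%'
        .rename('Percentage')
        .reset_index()                    # make the group column a regular column
    )

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

result = percent_matching(df, 'Fruit', 'Condition', 'Good')
print(result)
```

Note that the Percentage column holds strings after formatting, so any sorting or arithmetic on the ratios is easier to do before the map step.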