Python - Adding grouped mode as additional column in original dataset
So I have data similar to this:
import pandas as pd
df = pd.DataFrame({'Order ID':[555,556,557,558,559,560,561,562,563,564,565,566],
'State':["MA","MA","MA","MA","MA","MA","CT","CT","CT","CT","CT","CT"],
'County':["Essex","Essex","Essex","Worcester","Worcester","Worcester","Bristol","Bristol","Bristol","Hartford","Hartford","Hartford"],
'AP':[50,50,75,100,100,125,150,150,175,200,200,225]})
But I need to add a column that shows the mode of AP, grouped by State and County. I can get the mode this way:
(df.groupby(['State', 'County']).AP.agg(Mode = (lambda x: x.value_counts().index[0])).reset_index().round(0))
I'm just not sure how to add that data back to the original data so that it looks like this:
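Order ID | State | County | AP | Mode |
---|---|---|---|---|
555 | MA | Essex | 50 | 50 |
556 | MA | Essex | 50 | 50 |
557 | MA | Essex | 75 | 50 |
558 | MA | Worcester | 100 | 100 |
559 | MA | Worcester | 100 | 100 |
560 | MA | Worcester | 125 | 100 |
561 | CT | Bristol | 150 | 150 |
562 | CT | Bristol | 150 | 150 |
563 | CT | Bristol | 175 | 150 |
564 | CT | Hartford | 200 | 200 |
565 | CT | Hartford | 200 | 200 |
566 | CT | Hartford | 225 | 200 |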
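Use GroupBy.transform for the new column. It returns one value per original row, aligned to the original index, so the result can be assigned directly: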
df['Mode'] = (df.groupby(['State', 'County']).AP
.transform(lambda x: x.value_counts().index[0]))
Or, using Series.mode:
df['Mode'] = df.groupby(['State', 'County']).AP.transform(lambda x: x.mode().iat[0])
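For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the `transform` approach. It also shows, as an alternative that is not part of the answer above, merging the aggregated per-group result back onto the original rows; the names `keys`, `group_modes`, `Mode_merged` and `merged` are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Order ID': [555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566],
    'State': ["MA", "MA", "MA", "MA", "MA", "MA", "CT", "CT", "CT", "CT", "CT", "CT"],
    'County': ["Essex", "Essex", "Essex", "Worcester", "Worcester", "Worcester",
               "Bristol", "Bristol", "Bristol", "Hartford", "Hartford", "Hartford"],
    'AP': [50, 50, 75, 100, 100, 125, 150, 150, 175, 200, 200, 225],
})

keys = ['State', 'County']  # hypothetical helper name for the grouping columns

# transform broadcasts each group's scalar result back to every row of that group
df['Mode'] = df.groupby(keys)['AP'].transform(lambda x: x.mode().iat[0])

# Alternative sketch: aggregate to one row per group, then left-merge onto the rows
group_modes = (df.groupby(keys)['AP']
                 .agg(lambda x: x.mode().iat[0])
                 .rename('Mode_merged')   # illustrative column name
                 .reset_index())
merged = df.merge(group_modes, on=keys, how='left')

print(df)
```

Both routes should give the same column here. One thing to watch: if a group has several equally frequent values, `x.mode()` returns them sorted, so `.iat[0]` picks the smallest, whereas `x.value_counts().index[0]` may pick a different one of the tied values.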