Condition on two or more subsequent pandas rows (not just grouped calculations)
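I have a df containing each student's name, his/her score, the class title, and the exam date.
I need to add a column, as shown in the picture, that indicates whether the student's grade improved (3-4 conditional labels such as "score increased", "score decreased", "equal", or "initial grade").
I have already sorted the df accordingly, and now need to compare some conditions between a row and the next one; if all of them are true, a label should be returned.
Is there an efficient way to do this (my actual table will have 1M rows, so it should not consume much memory)?
Thank you in advance!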
import pandas as pd

df=pd.DataFrame({"score":[10,20,15,10,20,30],
                 "student":['John', 'Alex', "John", "John", "Alex", "John"],
                 "class":['english', 'math', "english",'math','math', 'english'],
                 "date":['01/01/2022','02/01/2022', '05/01/2022', '17/02/2022', '02/01/2022', '03/01/2022']})
df=df.sort_values(['student','class', 'date'])
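One caveat about the sort above: the dates are strings in day/month/year form, so sort_values orders them lexicographically, which matches chronological order only while all dates in a group fall in the same month. A minimal sketch, assuming every date really is in `%d/%m/%Y` format, that parses the column before sorting:

```python
import pandas as pd

df = pd.DataFrame({
    "score": [10, 20, 15, 10, 20, 30],
    "student": ["John", "Alex", "John", "John", "Alex", "John"],
    "class": ["english", "math", "english", "math", "math", "english"],
    "date": ["01/01/2022", "02/01/2022", "05/01/2022",
             "17/02/2022", "02/01/2022", "03/01/2022"],
})

# Assumption: all dates are day/month/year. Parsing them makes the
# sort chronological instead of alphabetical.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

df = df.sort_values(["student", "class", "date"])
```

For this sample the row order is the same as with the string sort; the difference only shows up once a group spans several months.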
Use groupby and diff() to get the change in score, then use numpy.select to assign the labels:
import numpy as np
changes = df.groupby(["student","class"], sort=False)["score"].diff()
df["progress"] = np.select([changes.eq(0),changes.gt(0),changes.lt(0)],
["equal score","score increased","score decreased"],
"initial")
>>> df
   score student    class        date         progress
1     20    Alex     math  02/01/2022          initial
4     20    Alex     math  02/01/2022      equal score
0     10    John  english  01/01/2022          initial
5     30    John  english  03/01/2022  score increased
2     15    John  english  05/01/2022  score decreased
3     10    John     math  17/02/2022          initial
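Since the question mentions 1M rows and memory, one option is to store the labels as a pandas categorical: the column then holds small integer codes plus a single copy of each of the four label strings, rather than one Python string per row. A minimal sketch under that assumption, reusing the groupby/diff/np.select steps from above (the actual saving was not measured here and will depend on the data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [10, 20, 15, 10, 20, 30],
    "student": ["John", "Alex", "John", "John", "Alex", "John"],
    "class": ["english", "math", "english", "math", "math", "english"],
    "date": ["01/01/2022", "02/01/2022", "05/01/2022",
             "17/02/2022", "02/01/2022", "03/01/2022"],
})
df = df.sort_values(["student", "class", "date"])

# Score change relative to the previous row of the same student and class
changes = df.groupby(["student", "class"], sort=False)["score"].diff()

labels = np.select(
    [changes.eq(0), changes.gt(0), changes.lt(0)],
    ["equal score", "score increased", "score decreased"],
    "initial",  # first row of each group, where diff() is NaN
)

# Store the four possible labels as a categorical column
df["progress"] = pd.Categorical(
    labels,
    categories=["initial", "equal score", "score increased", "score decreased"],
)
```

`df.memory_usage(deep=True)` can be used to compare this against the plain string column on the real table.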
You can use groupby.diff to compute the difference, then numpy.sign to get the sign, and map it to the texts you want. Use fillna for the default value ("initial"):
df['progress'] = (np.sign(df.groupby(['student', 'class'])
['score'].diff())
.map({0: 'equal', 1: 'increases', -1: 'decreases'})
.fillna('initial')
)
Output:
   score student    class        date   progress
1     20    Alex     math  02/01/2022    initial
4     20    Alex     math  02/01/2022      equal
0     10    John  english  01/01/2022    initial
5     30    John  english  03/01/2022  increases
2     15    John  english  05/01/2022  decreases
3     10    John     math  17/02/2022    initial
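Because the frame is already sorted by student, class, and date, the same labels can also be produced by comparing each row with the one before it directly, which is closer to the "compare several conditions between consecutive rows" wording of the question. This is an alternative sketch, not the method of either answer above; the names `prev_student`, `prev_class`, `same_group`, and `delta` are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [10, 20, 15, 10, 20, 30],
    "student": ["John", "Alex", "John", "John", "Alex", "John"],
    "class": ["english", "math", "english", "math", "math", "english"],
    "date": ["01/01/2022", "02/01/2022", "05/01/2022",
             "17/02/2022", "02/01/2022", "03/01/2022"],
})
df = df.sort_values(["student", "class", "date"])

# Values of the previous row in the sorted order (NaN for the very first row)
prev_student = df["student"].shift()
prev_class = df["class"].shift()

# True only when the previous row belongs to the same student and class
same_group = df["student"].eq(prev_student) & df["class"].eq(prev_class)

# Score change relative to the previous row
delta = df["score"] - df["score"].shift()

# Conditions are checked in order, so a group's first row is always "initial"
df["progress"] = np.select(
    [~same_group, delta.gt(0), delta.lt(0)],
    ["initial", "increases", "decreases"],
    "equal",
)
```

Additional row-to-row conditions (for example, a maximum gap between exam dates) can be added as further boolean masks combined with `&`.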
Here is the step-by-step approach I used:
# Number each exam within its (student, class) group in date order
df['RN'] = df.sort_values(['date'], ascending=[True]).groupby(['student', 'class']).cumcount() + 1
# Sort by class as well, so shift() lines up rows of the same student and class
df = df.sort_values(['student', 'class', 'RN'])
# Previous row's score; the first row has no predecessor, so fill it with 0
df['score_shift'] = df['score'].shift().fillna(0).astype(int)
# RN == 1 is checked first, so a group's first row is never compared with another group
condlist = [df['RN'] == 1, df['score_shift'] == df['score'], df['score_shift'] > df['score'], df['score_shift'] < df['score']]
choicelist = ['initial', 'equal', 'decrease', 'increase']
# A string default avoids mixing str choices with np.select's integer default of 0
df['Progress'] = np.select(condlist, choicelist, default='initial')
df