Get corresponding value for particular index/key in Pandas after converting wide to long
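I am trying to implement the following logic to get student scores:
Find the students who scored more than 60.
Then get each of those students' scores by the Subject/Student key.
Input data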
import pandas as pd
data = [['Maths', 100,80,20], ['Science', 80,20,10]]
df = pd.DataFrame(data, columns = ['Subject', 'Student A','Student B','Student C'])
df.set_index("Subject",inplace=True)
         Student A  Student B  Student C
Subject
Maths          100         80         20
Science         80         20         10
Get the students who scored more than 60
df=df[df.gt(60)]
rank_df = df.rank(axis=0,method='average',pct=False,ascending=False)
marks_list = []
for i in range(0,len(rank_df)):
    label_series = rank_df.iloc[i,:]
    labels_notna = label_series.sort_values(ascending=True)[label_series.notna()].index
    marks_list.append(",".join(labels_notna))
df['Student gt 60'] = marks_list
new_df = df['Student gt 60'].str.split(',', expand = True)
new_df.reset_index(inplace=True)
new_df.columns=["Subject","Top 1","Top 2"]
new_df = pd.melt(new_df, id_vars=['Subject'], value_name='Student')
data= new_df[["Subject","Student"]]
data.loc[~data["Student"].isna()]
   Subject    Student
0    Maths  Student A
1  Science  Student A
2    Maths  Student B
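I would like to get the corresponding score for each Subject/Student key in the same data frame, but I can't work out how.
Required output:
   Subject    Student  Score
0    Maths  Student A    100
1    Maths  Student B     80
2  Science  Student A     80
Can anyone point me in the right direction?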
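I suggest first stacking the original data frame (before it is masked with df.gt(60)) to get a Series with a MultiIndex (Subject on the first level, Student on the second), and then indexing that Series to select all students with a sufficient score: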
df_stacked = df.stack()
df_stacked[df_stacked.gt(60)]
# Out:
# Subject
# Maths    Student A    100
#          Student B     80
# Science  Student A     80
# dtype: int64
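To get from the filtered Series to the flat three-column layout in the question, something along these lines should work. This is only a sketch: the names `stacked` and `result` are illustrative, and naming the column axis "Student" is an assumption made here so that the second index level gets a label.

```python
import pandas as pd

data = [["Maths", 100, 80, 20], ["Science", 80, 20, 10]]
df = pd.DataFrame(data, columns=["Subject", "Student A", "Student B", "Student C"])
df = df.set_index("Subject")

# Assumption: label the column axis so the stacked level is called "Student".
df.columns.name = "Student"

# Series with a (Subject, Student) MultiIndex, one row per score.
stacked = df.stack()

result = (
    stacked[stacked.gt(60)]   # keep only scores above 60
    .rename("Score")          # name the values column
    .reset_index()            # move both index levels into columns
)
print(result)
```

Stacking the unmasked frame keeps the scores as integers; masking first with df[df.gt(60)] would introduce NaN and turn them into floats.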
First, orient the data the way you ultimately want it:
vertical = df.unstack()
This gives you:
           Subject
Student A  Maths      100
           Science     80
Student B  Maths       80
           Science     20
Student C  Maths       20
           Science     10
Then simply:
vertical[vertical > 60]
gives you the final result:
           Subject
Student A  Maths      100
           Science     80
Student B  Maths       80
You can call reset_index() on it to make it look more like your sample output.
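As a rough sketch of that last step: the level and column names below ("Student", "Subject", "Score") are taken from the required output in the question, while `passed` and `result` are just illustrative variable names. The final sort and column reordering are optional and only there to match the sample output row for row.

```python
import pandas as pd

data = [["Maths", 100, 80, 20], ["Science", 80, 20, 10]]
df = pd.DataFrame(data, columns=["Subject", "Student A", "Student B", "Student C"])
df = df.set_index("Subject")

# Series indexed by (column label, Subject); the first level has no name yet.
vertical = df.unstack()
passed = vertical[vertical > 60]

result = (
    passed.rename_axis(["Student", "Subject"])   # name both index levels
    .reset_index(name="Score")                   # flatten into three columns
    [["Subject", "Student", "Score"]]            # column order from the question
    .sort_values(["Subject", "Student"])
    .reset_index(drop=True)
)
print(result)
```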