Select a single value from a column after grouping by other columns in Python

I tried grouping by the columns first_register and second_register, but it doesn't seem to work.

Suppose I have a dataframe like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

What I tried, which doesn't work at all:

# Group by both register columns and collect the unique class values of each group
group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)

How can I select/access the class label of each group in the dataframe?

The desired output could be an ordered list like this, giving the class of each group from the first group to the last:

label_class = [1, 2, 0, 1]

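Use dropna=False (available since pandas 1.1). By default, groupby drops every row whose group keys contain NaN, and every row here has NaN in either first_register or second_register, so all rows were dropped: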

group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()


first_register  second_register
70/20           NaN                [1]
71/20           NaN                [2]
NaN             72/20              [0]
                73/20              [1]
Name: class, dtype: object
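To confirm why the original attempt came back empty, the sketch below counts the groups with and without dropna=False using the GroupBy ngroups attribute. The variable names (keys, has_nan_key, n_default, n_keep) are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
keys = ["first_register", "second_register"]

# True if every row has a NaN in at least one of the key columns
has_nan_key = df[keys].isna().any(axis=1).all()

# Default dropna=True: rows with a NaN key are excluded, so no groups remain
n_default = df.groupby(keys).ngroups

# dropna=False keeps NaN as a valid group label
n_keep = df.groupby(keys, dropna=False).ngroups

print(has_nan_key, n_default, n_keep)  # True 0 4
```

Zero groups under the default is exactly why unique() returned nothing in the question.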

If you know each group has exactly one unique class, or you just want the first or last value of each group:

label_class = group_by_df["class"].first()

Or:

label_class = group_by_df["class"].last()
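first() and last() silently pick one value even when a group contains several different classes. A minimal sketch, assuming you would rather fail loudly in that case, checks nunique() per group before taking first(); the error message and variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
group_by_df = df.groupby(["first_register", "second_register"], dropna=False)

# Number of distinct class values in each group
counts = group_by_df["class"].nunique()

# Only collapse to a single value when every group is unambiguous
if (counts == 1).all():
    label_class = group_by_df["class"].first().tolist()
else:
    raise ValueError("some groups contain more than one class:\n" + str(counts[counts > 1]))

print(label_class)  # [1, 2, 0, 1]
```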

Use GroupBy.first:

out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
print(out)

first_register  second_register
70/20           NaN                1
71/20           NaN                2
NaN             72/20              0
                73/20              1
Name: class, dtype: int64


label_class = out.tolist()
print(label_class)
[1, 2, 0, 1]
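A different technique, not taken from the answers above: if every row has exactly one non-null register (true for this sample data, but an assumption in general), the two columns can be merged into a single key with fillna, so no NaN keys remain and dropna=False is not needed. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
keys = ["first_register", "second_register"]

# Assumption check: exactly one non-null register per row
assert df[keys].notna().sum(axis=1).eq(1).all()

# Take first_register where present, otherwise second_register
register = df["first_register"].fillna(df["second_register"])

# Group by the merged key (aligned on df's index) and take one class per group
label_class = df.groupby(register)["class"].first().tolist()
print(label_class)  # [1, 2, 0, 1]
```

Note that this merges a value appearing in first_register with the same value appearing in second_register into one group, which the two-column grouping would keep apart.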