select a single value from a column after groupby another columns in python
I tried grouping by the columns first_register and second_register, but it doesn't seem to work.
Suppose I have a DataFrame like this:
import numpy as np
import pandas as pd
# np.nan (lowercase) works in all NumPy versions; the np.NAN alias was removed in NumPy 2.0
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
What I tried, which doesn't work at all:
group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)
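A quick way to see why this attempt returns nothing is to count the groups. This is a minimal check, assuming pandas 1.1 or later (where the dropna parameter of groupby was added); the names keys, n_default and n_keep are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
keys = ["first_register", "second_register"]

# Default dropna=True: every row has NaN in one of the two key columns,
# so every row is excluded and no groups remain.
n_default = df.groupby(keys).ngroups
# dropna=False treats NaN as an ordinary key value.
n_keep = df.groupby(keys, dropna=False).ngroups

print(n_default, n_keep)  # 0 4
```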
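How can I select/access the class label of each group in the DataFrame?
The desired output could be an ordered list like this, giving the class of each group from the first group to the last: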
label_class = [1, 2, 0, 1]
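Use dropna=False. By default, groupby drops every row whose group key contains NaN; since each row here has NaN in either first_register or second_register, all rows are dropped and the result is empty:
group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()
print(label_class)
first_register  second_register
70/20           NaN                [1]
71/20           NaN                [2]
NaN             72/20              [0]
                73/20              [1]
Name: class, dtype: object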
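Each value in that Series is a NumPy array of the distinct classes in the group. If you want to keep using unique() but end up with a flat list, one sketch is to check that every array has exactly one element and then unpack it (the name flat is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

label_class = df.groupby(["first_register", "second_register"], dropna=False)["class"].unique()

# Guard the "one class per group" assumption before unpacking.
assert label_class.map(len).eq(1).all(), "some group has more than one class"
flat = [int(arr[0]) for arr in label_class]
print(flat)  # [1, 2, 0, 1]
```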
If you know each group contains exactly one unique class, or you just want the first or last value in each group:
label_class = group_by_df["class"].first()
Or:
label_class = group_by_df["class"].last()
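first() and last() only give the same answer when a group really holds a single class. To illustrate, this sketch appends one hypothetical extra row (class 2 in group 70/20, NaN), which is not part of the original data; nunique() then shows which groups are ambiguous:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
# Hypothetical extra row: group (70/20, NaN) now contains classes 1 and 2.
extra = pd.DataFrame({'class': [2], 'first_register': ["70/20"], 'second_register': [np.nan]})
df2 = pd.concat([df, extra], ignore_index=True)

g = df2.groupby(["first_register", "second_register"], dropna=False)["class"]
first_vals = g.first().tolist()
last_vals = g.last().tolist()
counts = g.nunique().tolist()
print(first_vals)  # [1, 2, 0, 1]
print(last_vals)   # [2, 2, 0, 1]
print(counts)      # [2, 1, 1, 1]
```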
out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
print(out)
first_register  second_register
70/20           NaN                1
71/20           NaN                2
NaN             72/20              0
                73/20              1
Name: class, dtype: int64
label_class = out.tolist()
print(label_class)
[1, 2, 0, 1]
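The list above follows the sorted order of the group keys, because groupby sorts keys by default (NaN keys go last). If you would rather have groups in order of first appearance, one option is sort=False; another, which swaps out groupby entirely, is drop_duplicates on the two key columns (it treats NaN values as equal). A sketch assuming pandas 1.1 or later; with this data both orders coincide:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
keys = ["first_register", "second_register"]

# Groups in order of first appearance instead of sorted key order.
by_appearance = df.groupby(keys, dropna=False, sort=False)["class"].first().tolist()

# Alternative: keep the first row of each key combination, then read its class.
dedup = df.drop_duplicates(subset=keys)["class"].tolist()

print(by_appearance)  # [1, 2, 0, 1]
print(dedup)          # [1, 2, 0, 1]
```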