Pandas: get unique values in one column based on another column (Python)

Question

I have a DataFrame that looks like this:

Variable    Groups
1           [0-10]
1           [0-10]
2           [0-10]
2           [0-10]
3           [0-10]
3           [10-20]
4           [10-20]
4           [10-20]
5           [10-20]
5           [10-20]

I want only the unique values of the Variable column, but without dropping a value that repeats across different Groups. For example:

Variable    Groups
1           [0-10]
2           [0-10]
3           [0-10]
3           [10-20]
4           [10-20]
5           [10-20]

Note that 3 still appears twice, because it occurs once in each group. I tried

df_unique = df['Groups'].groupby(df['Variable']).unique().apply(pd.Series)

but this just returns a mess. I'm not sure how to do this; any help is appreciated.

Answer 1

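You need to write an expression that combines the two columns and apply unique to that combination. In other words, deduplicate on the (Variable, Groups) pair rather than on Variable alone.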
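One way to read this suggestion is sketched below with DataFrame.drop_duplicates, which treats the listed columns as a single combined key. This is only an illustration, not necessarily what the answerer had in mind; the sample frame is rebuilt from the question and assumes Groups holds plain strings such as "[0-10]".

```python
import pandas as pd

# Sample data mirroring the question (assumption: Groups holds plain strings).
df = pd.DataFrame({
    "Variable": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Groups": ["[0-10]"] * 5 + ["[10-20]"] * 5,
})

# Treat each (Variable, Groups) pair as one key and keep its first occurrence.
df_unique = (
    df.drop_duplicates(subset=["Variable", "Groups"])
      .reset_index(drop=True)  # renumber rows 0..n-1
)

print(df_unique)
```

Because both columns are part of the key, Variable 3 survives once for each of its two groups.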

Answer 2

You can use SeriesGroupBy.unique() together with .explode() and .reset_index(), as follows:

df.groupby('Variable')['Groups'].unique().explode().reset_index()

Another option is to use GroupBy.first(), as follows:

df.groupby(['Variable', 'Groups'], as_index=False).first()

Result:

   Variable   Groups
0         1   [0-10]
1         2   [0-10]
2         3   [0-10]
3         3  [10-20]
4         4  [10-20]
5         5  [10-20]
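To see how the two approaches differ once the frame has more columns, here is a small sketch. The Score column is hypothetical and added purely for illustration; Groups is assumed to hold plain strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Variable": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Groups": ["[0-10]"] * 5 + ["[10-20]"] * 5,
    "Score": list(range(10)),  # hypothetical extra column, not in the question
})

# Approach 1: collect the unique Groups per Variable, then expand them to rows.
# Only Variable and Groups appear in the result.
pairs = df.groupby("Variable")["Groups"].unique().explode().reset_index()

# Approach 2: group on both columns; first() keeps the first non-null value
# of every remaining column for each (Variable, Groups) pair.
firsts = df.groupby(["Variable", "Groups"], as_index=False).first()

print(pairs)
print(firsts)
```

The first approach returns only the key columns, while the second also carries along the first value of each other column, which may or may not be what you want.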

Answer 3

Here is another option:

df.groupby(['Variable', df['Groups'].explode()]).head(1)
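A runnable sketch of the head(1) idea follows, assuming Groups holds plain strings (in which case the explode() step changes nothing and is left out here). Unlike the aggregation-based answers, head(1) returns the original rows with their original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Variable": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Groups": ["[0-10]"] * 5 + ["[10-20]"] * 5,
})

# Keep the first row of every (Variable, Groups) group.
# The rows come back in their original order with their original labels.
first_rows = df.groupby(["Variable", "Groups"]).head(1)

# Renumber the rows if a clean 0..n-1 index is preferred.
first_rows_renumbered = first_rows.reset_index(drop=True)

print(first_rows)
```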
