Remove duplicate values in a tuple array in Python

Question

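I have a dataset of parties purchasing products. Each time a customer buys a product, a new row is generated with the same party number.

I have grouped the products by party number, and now I have a column containing tuples of products: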

Party Nbr	Product
1	(a, a, a, a, b, c)
2	(a, d, a, a)
3	(a, a, b, b, b)

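I can't figure out how to remove all duplicates from each row of the Product column.

The groupby code, followed by my attempt and the error it raises: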

pf = prod.groupby(['Party Nbr'])['Product name'].apply(tuple).reset_index().rename(columns= {'Product name': 'Product'})

pf['Product'] = tuple(set(pf['Product']))


ValueError: Length of values (4663) does not match length of index (32539)

Can anyone help me?

Answer 1

To remove duplicates from a tuple, you can use the set type, which drops duplicates automatically. A single call is enough:

In [1]: a=(1,2,2,1,1,1,1,3)

In [2]: tuple(set(a))
Out[2]: (1, 2, 3)
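
The ValueError in the question can likely be explained the same way. The following is a minimal sketch on a small made-up frame (the data is illustrative, not the asker's): calling set on the whole column deduplicates the rows' tuples against one another, so the result has one entry per distinct tuple rather than one per row, which is presumably why 4663 values did not fit an index of length 32539. Mapping the call over each row keeps one result per row.

```python
import pandas as pd

# Illustrative data (an assumption); rows 0 and 2 hold the same tuple
pf = pd.DataFrame({
    "Party Nbr": [1, 2, 3, 4],
    "Product": [("a", "a", "b"), ("a", "d", "a"), ("a", "a", "b"), ("c",)],
})

# set() over the whole column treats each row's tuple as one element,
# so identical rows collapse and the length no longer matches the index
whole_column = tuple(set(pf["Product"]))
print(len(whole_column), len(pf))

# Deduplicating inside each row keeps exactly one tuple per row
pf["Product"] = pf["Product"].map(lambda t: tuple(set(t)))
print(pf)
```

The order of the elements inside each resulting tuple is not guaranteed, because sets are unordered.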

Answer 2

Assuming you are using pandas, I recreated your table as a DataFrame and show how to do the conversion.

In [10]: import pandas as pd

In [11]: df = pd.DataFrame({
    ...:     "party": [1, 2, 3],
    ...:     "product": [
    ...:         ("a", "a", "a", "a", "b", "c"),
    ...:         ("a", "d", "a", "a"),
    ...:         ("a", "a", "b", "b", "b")]})

In [12]: df
Out[12]: 
   party             product
0      1  (a, a, a, a, b, c)
1      2        (a, d, a, a)
2      3     (a, a, b, b, b)

In [13]: df["product"] = df["product"].apply(set).apply(tuple)

In [14]: df
Out[14]: 
   party    product
0      1  (c, b, a)
1      2     (a, d)
2      3     (b, a)

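Note: as mentioned in the comments, the order of the products is not preserved. To preserve the order, you can use a custom function instead of chaining set and tuple.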
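
One possible shape for such a custom function is sketched below. It assumes Python 3.7 or later, where dict preserves insertion order, and the helper name unique_in_order is made up for illustration; it is not from the original answer.

```python
import pandas as pd

def unique_in_order(items):
    """Return a tuple of the distinct items, keeping first-seen order."""
    # dict keys are unique and remember insertion order (Python 3.7+)
    return tuple(dict.fromkeys(items))

df = pd.DataFrame({
    "party": [1, 2, 3],
    "product": [
        ("a", "a", "a", "a", "b", "c"),
        ("a", "d", "a", "a"),
        ("a", "a", "b", "b", "b"),
    ],
})

# Apply the helper to each row's tuple
df["product"] = df["product"].apply(unique_in_order)
print(df)
#    party    product
# 0      1  (a, b, c)
# 1      2     (a, d)
# 2      3     (a, b)
```

The same helper could presumably replace tuple in the question's groupby call, as in .apply(unique_in_order), so the duplicates never reach the grouped column.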

Tags: python, tuples, group-by, dataframe