Replacing value after groupby

I have a dataframe of grocery store records:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
               columns=['customer', 'product'])

| customer | product |
| -------- | -------- |
| Tom | apple1 |
| Tom | banana35 |
| Jeff | pear0 |

I want to get all the products a customer has ever bought, so I used

product_by_customer = df.groupby('customer')['product'].unique()
product_by_customer
customer
Jeff [pear0]
Tom [apple1, banana35]

I want to remove the digits after the product names. I tried

product_by_customer.str.replace('[0-9]', '')

but it replaced everything with NaN.

My desired output is

| customer | product |
| -------- | -------- |
| Jeff | pear |
| Tom | apple, banana |

Any help is appreciated!

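After the groupby, each value in the product column is a NumPy ndarray rather than a string, which is why .str.replace does not replace anything and returns NaN instead. Try the code below.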

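import re

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
               columns=['customer', 'product'])
# Collect each customer's distinct products (one array per row)
df1 = df.groupby(["customer"])["product"].unique().reset_index()
# Strip the digits from every string inside each array
df1["product"] = df1["product"].apply(lambda x: [re.sub(r"\d", "", v) for v in x])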


df1
Out[52]: 
  customer          product
0     Jeff           [pear]
1      Tom  [apple, banana]

What this does: the lambda function visits every value inside each array and removes the digits from it.
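To make the difference concrete, here is a minimal sketch that runs both approaches side by side on the same grouped Series. The variable names (`grouped`, `via_str`, `via_apply`) are only illustrative, and `dtype=object` is my own addition so that `unique()` yields plain NumPy arrays regardless of the pandas version.

```python
import re

import pandas as pd

# dtype=object is an assumption made here so each group becomes a NumPy array.
df = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
    dtype=object,
)
grouped = df.groupby("customer")["product"].unique()

# Each element of `grouped` is an array, not a string.
print(type(grouped["Tom"]))

# The .str accessor cannot treat an array as a string, so every row becomes NaN.
via_str = grouped.str.replace(r"\d", "", regex=True)
print(via_str.isna().all())

# Visiting each array and cleaning its strings one by one works.
via_apply = grouped.apply(lambda arr: [re.sub(r"\d", "", s) for s in arr])
print(via_apply.to_dict())
```

The `.str` methods work element by element and expect each element to be a string, so they only help before the values are collected into arrays.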

df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
               columns=['customer', 'product'])
df1 = df.copy()
df1["product"] = df1["product"].str.replace('[0-9]', '', regex=True)
product_by_customer = df1.groupby('customer')['product'].unique()
product_by_customer

Output:

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object

How about making a copy of df and applying the change before the groupby?
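A note on the `regex` argument, since it decides whether this works at all: as far as I know, `Series.str.replace` defaults to `regex=False` from pandas 2.0 on, so a pattern like `'[0-9]'` is then treated as literal text unless `regex=True` is passed. The sketch below uses a throwaway Series (`products` is a name I made up) to show both behaviours.

```python
import pandas as pd

products = pd.Series(["apple1", "banana35", "pear0"])

# regex=True: '[0-9]' is a character class that matches any single digit.
as_regex = products.str.replace("[0-9]", "", regex=True)

# regex=False: '[0-9]' is searched for as literal text, which never occurs
# in these values, so nothing is changed.
as_literal = products.str.replace("[0-9]", "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

Passing `regex=True` explicitly keeps the behaviour the same on older and newer pandas versions.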

You can replace first and then aggregate:

# Parentheses are needed so the chained call can continue on the next line
product_by_customer = (df["product"].str.replace('[0-9]', '', regex=True)
                       .groupby(df['customer']).unique())

print(product_by_customer)

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object

Or remove the digits during the aggregation:

import re

f = lambda x: [re.sub("[0-9]", "", v) for v in x.unique()]
product_by_customer = df.groupby('customer')['product'].agg(f)

print(product_by_customer)

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object

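A similar idea uses the dict.fromkeys trick to drop any duplicates that appear once the digits are removed, while keeping the original order: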

f = lambda x: list(dict.fromkeys(x.str.replace('[0-9]', '', regex=True)))
product_by_customer = df.groupby('customer')['product'].agg(f)

print (product_by_customer)

customer
Jeff             [pear]
Tom     [apple, banana]
Name: product, dtype: object
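If the goal is the comma-separated text from the question ("apple, banana") rather than a list, the same idea can be extended with a join. This is only a sketch: `clean_and_join` is a hypothetical helper name, and the extra `apple2` row is made-up data, added to show that duplicates collapse once the digits are gone.

```python
import pandas as pd

# Hypothetical data: Tom bought two apple variants that differ only by digits.
df = pd.DataFrame(
    {
        "customer": ["Tom", "Tom", "Tom", "Jeff"],
        "product": ["apple1", "banana35", "apple2", "pear0"],
    }
)


def clean_and_join(products: pd.Series) -> str:
    """Strip digits, drop duplicates in first-seen order, join with ', '."""
    stripped = products.str.replace("[0-9]", "", regex=True)
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return ", ".join(dict.fromkeys(stripped))


result = df.groupby("customer")["product"].agg(clean_and_join)
print(result)
```

Removing the digits before deduplicating matters here: calling `unique()` first would keep both `apple1` and `apple2`, and stripping afterwards would leave `apple` twice.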