Replacing values after groupby
I have a dataframe of grocery store records:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
                  columns=['customer', 'product'])
| customer | product |
| ------ | ------ |
| Tom | apple1 |
| Tom | banana35 |
| Jeff | pear0 |
I want to get all the products a customer has ever bought, so I used
product_by_customer = df.groupby('customer')['product'].unique()
product_by_customer
customer
Jeff               [pear0]
Tom     [apple1, banana35]
Name: product, dtype: object
I want to remove the digits after the product names. I tried
product_by_customer.str.replace('[0-9]', '')
but it replaced everything with NaN.
My desired output is
| customer | product |
| -------- | -------- |
| Jeff | pear |
| Tom | apple, banana |
Any help is appreciated!
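After the groupby, each value in the product column is an ndarray rather than a string, so `.str.replace` has nothing to operate on and no replacement takes place. Try the code below.
import re
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
                  columns=['customer', 'product'])
df1 = df.groupby(["customer"])["product"].unique().reset_index()
# Strip the digits from every string inside each per-customer array
df1["product"] = df1["product"].apply(lambda x: [re.sub(r"\d", "", v) for v in x])
df1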
Out[52]:
customer product
0 Jeff [pear]
1 Tom [apple, banana]
What we are doing is using a lambda function to visit every value in each array and remove its digits.
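To make the cause of the NaN values concrete, here is a minimal sketch that reproduces the question's attempt. It assumes the same three-row frame (built from a plain list rather than `np.array`, which is only a convenience) and passes `regex=True` explicitly; the variable names `first_element`, `attempt`, and `all_nan` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
)
product_by_customer = df.groupby("customer")["product"].unique()

# Each element of the grouped Series is an array of products, not a string.
first_element = product_by_customer.iloc[0]

# The .str accessor returns NaN for every element that is not a string,
# which is why the whole result comes back as NaN.
attempt = product_by_customer.str.replace("[0-9]", "", regex=True)
all_nan = bool(attempt.isna().all())
print(type(first_element).__name__, all_nan)
```

Because the string methods skip non-string elements instead of raising an error, the failure is silent, so checking the element type as above is a quick way to diagnose it.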
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
                  columns=['customer', 'product'])
df1 = df.copy()
# regex=True is required for the pattern to be treated as a regular expression
df1["product"] = df1["product"].str.replace('[0-9]', '', regex=True)
product_by_customer = df1.groupby('customer')['product'].unique()
product_by_customer
Output:
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
How about making a copy of df and doing the replacement before the groupby?
You can replace first and then aggregate:
product_by_customer = (df["product"].str.replace('[0-9]', '', regex=True)
                       .groupby(df['customer']).unique())
print(product_by_customer)
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
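The desired output in the question shows a comma-separated string per customer rather than an array. A minimal sketch of one way to get that, following the same replace-then-aggregate order, might look like this. Two things here are assumptions and not part of the answers above: the pattern `r"\d+$"`, which strips only trailing digits, and the `", ".join` aggregation.

```python
import pandas as pd

df = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
)

# Assumption: only digits at the end of the name should be removed.
cleaned = df["product"].str.replace(r"\d+$", "", regex=True)

# Join the unique cleaned names of each customer into one string.
joined = cleaned.groupby(df["customer"]).agg(lambda s: ", ".join(s.unique()))
print(joined)
```

If digits can also appear in the middle of a product name and should be removed too, the `'[0-9]'` pattern from the answers above is the one to use instead.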
Or remove the digits during aggregation:
import re
f = lambda x: [re.sub("[0-9]", "", v) for v in x.unique()]
product_by_customer = df.groupby('customer')['product'].agg(f)
print(product_by_customer)
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
A similar idea is to use the dict.fromkeys trick to remove possible duplicates:
f = lambda x: list(dict.fromkeys(x.str.replace('[0-9]', '', regex=True)))
product_by_customer = df.groupby('customer')['product'].agg(f)
print (product_by_customer)
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
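The difference between the last two variants only shows up when two products differ solely by their digits. The sketch below uses hypothetical data (an extra `apple2` row that is not in the question) to compare them; the names `strip_after_unique` and `strip_then_dedupe` are illustrative.

```python
import re
import pandas as pd

# Hypothetical data: Tom bought two apple variants.
df = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "apple2"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
)
grouped = df.groupby("customer")["product"]

# Deduplicate first, then strip digits: "apple" appears twice for Tom.
strip_after_unique = grouped.agg(
    lambda x: [re.sub("[0-9]", "", v) for v in x.unique()]
)

# Strip digits first, then deduplicate while keeping first-seen order.
strip_then_dedupe = grouped.agg(
    lambda x: list(dict.fromkeys(x.str.replace("[0-9]", "", regex=True)))
)
print(strip_after_unique["Tom"], strip_then_dedupe["Tom"])
```

`dict.fromkeys` keeps insertion order, so the products stay in the order they were first bought.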