Is there a way to number repeated items from a group on a dataframe in pandas?
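I have a dataframe with regions, clients, and some deliveries.
The purchaseType column marks the first and last purchases as 'FIRST' and 'LAST', but every delivery in between is marked 'DELIVERY'.
Is there a way to transform the deliveries so that they get labels such as 'DELIVERY1', 'DELIVERY2', and so on?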
import pandas as pd
data = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['NY', 'A','DELIVERY', 12], ['NY', 'A','DELIVERY', 25], ['NY', 'A','LAST', 20]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Region', 'Client', 'purchaseType', 'price'])
# print dataframe.
df
Expected output:
data2 = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY1', 20], ['NY', 'A','DELIVERY2', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY1', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY1', 10], ['NY', 'A','DELIVERY2', 12], ['NY', 'A','DELIVERY3', 25], ['NY', 'A','LAST', 20]
]
df2 = pd.DataFrame(data2, columns = ['Region', 'Client', 'purchaseType', 'price'])
print(df2)
Thanks in advance!
We can try GroupBy.cumcount and Series.str.cat:
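# Each 'FIRST' row starts a new block of purchases
blocks = df['purchaseType'].eq('FIRST').cumsum()
# Append each row's position within its block (FIRST is 0, deliveries are 1, 2, ...)
fill_values = df['purchaseType'].str.cat(
    df.groupby(blocks).cumcount().astype(str),
    sep=''
)
# Overwrite only the DELIVERY rows; the assignment aligns on the index
df.loc[df['purchaseType'].eq('DELIVERY'), 'purchaseType'] = fill_values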
print(df)
#    Region Client purchaseType  price
# 0      NY      A        FIRST     10
# 1      NY      A    DELIVERY1     20
# 2      NY      A    DELIVERY2     30
# 3      NY      A         LAST     25
# 4      NY      B        FIRST     15
# 5      NY      B    DELIVERY1     10
# 6      NY      B         LAST     20
# 7      FL      A        FIRST     15
# 8      FL      A    DELIVERY1     10
# 9      NY      A    DELIVERY2     12
# 10     NY      A    DELIVERY3     25
# 11     NY      A         LAST     20
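To see why this works, it may help to print the two intermediate Series. The following is only a minimal sketch on a smaller, made-up frame (the data and the name `position` are illustrative and not from the question):

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's table)
df = pd.DataFrame({
    'purchaseType': ['FIRST', 'DELIVERY', 'DELIVERY', 'LAST',
                     'FIRST', 'DELIVERY', 'LAST'],
})

# The boolean mask is True on every 'FIRST' row, so its cumulative sum
# goes up by one each time a new block of purchases begins.
blocks = df['purchaseType'].eq('FIRST').cumsum()

# cumcount numbers the rows inside each block starting at 0, so the
# 'FIRST' row gets 0 and the rows after it get 1, 2, ...
position = df.groupby(blocks).cumcount()

print(blocks.tolist())    # [1, 1, 1, 1, 2, 2, 2]
print(position.tolist())  # [0, 1, 2, 3, 0, 1, 2]
```

Because the suffix is simply the row's position inside its block, the numbering is only correct when each block starts with 'FIRST' and the deliveries follow it directly.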
You can use np.where to decide where to add the numeric suffix:
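import numpy as np
# Group rows into blocks that start at 'FIRST', then suffix only the DELIVERY rows
df['purchaseType'] = df.groupby((df['purchaseType']=='FIRST').cumsum())['purchaseType'].transform(
    lambda x: np.where(x=='DELIVERY', x+np.arange(len(x)).astype(str), x)
)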
print(df)
Output:
   Region Client purchaseType  price
0      NY      A        FIRST     10
1      NY      A    DELIVERY1     20
2      NY      A    DELIVERY2     30
3      NY      A         LAST     25
4      NY      B        FIRST     15
5      NY      B    DELIVERY1     10
6      NY      B         LAST     20
7      FL      A        FIRST     15
8      FL      A    DELIVERY1     10
9      NY      A    DELIVERY2     12
10     NY      A    DELIVERY3     25
11     NY      A         LAST     20
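Both answers number a delivery by its position inside the block, which relies on the deliveries coming straight after 'FIRST'. If other row types could appear in between, one possible variant is to keep a running count of the DELIVERY rows themselves. This is a sketch under that assumption; the 'RETURN' label and the names `is_delivery` and `n` are hypothetical and do not come from the question:

```python
import pandas as pd

# Hypothetical data: a 'RETURN' row sits between two deliveries
df = pd.DataFrame({
    'purchaseType': ['FIRST', 'DELIVERY', 'RETURN', 'DELIVERY', 'LAST',
                     'FIRST', 'DELIVERY', 'LAST'],
})

is_delivery = df['purchaseType'].eq('DELIVERY')
blocks = df['purchaseType'].eq('FIRST').cumsum()

# Running count of DELIVERY rows within each block; rows of any other
# type add 0, so they do not shift the numbering.
n = is_delivery.astype(int).groupby(blocks).cumsum()

# Relabel only the DELIVERY rows; the assignment aligns on the index
df.loc[is_delivery, 'purchaseType'] = 'DELIVERY' + n[is_delivery].astype(str)

print(df['purchaseType'].tolist())
# ['FIRST', 'DELIVERY1', 'RETURN', 'DELIVERY2', 'LAST', 'FIRST', 'DELIVERY1', 'LAST']
```

On the question's own data this should give the same labels as the two approaches above, since there every row between 'FIRST' and 'LAST' is a delivery.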