Python pandas: how to add a unique identifier for similar group data
This is my dataframe:
product_title                               variation_list
Chauvet DJ GigBar Move Effect Light System  ['Black', 'White']
Rane Twelve MKII DJ Controller              ['New', 'Blemished']
My expected dataframe would look like this:
group_id  product_title                               variation_list  unique_id
FAT-1301  Chauvet DJ GigBar Move Effect Light System  Black           FAT-01
FAT-1301  Chauvet DJ GigBar Move Effect Light System  White           FAT-02
FAT-1302  Rane Twelve MKII DJ Controller              New             FAT-03
FAT-1302  Rane Twelve MKII DJ Controller              Blemished       FAT-04
Basically, I want to add two extra columns: group_id, which assigns a shared ID to all rows that belong to the same group, and unique_id, which assigns a unique value to every row.
Using explode:
import pandas as pd
# Sample data from the question
d = {'product_title': ['Chauvet DJ GigBar Move Effect Light System', 'Rane Twelve MKII DJ Controller'],
     'variation_list': [['Black', 'White'], ['New', 'Blemished']]}
df = pd.DataFrame(d)
# Number the original rows 1..n; this becomes the per-product group ID
df.insert(0, "group_id", df.index + 1)
# One row per variation; reset_index() moves the old index into an 'index' column
df = df.explode('variation_list').reset_index()
# Number the exploded rows 1..m and append them as the last column
df.insert(4, "unique_id", df.index + 1)
df.drop(columns=['index'], inplace=True)
# Prefix both IDs with "FAT-"
df['group_id'] = 'FAT-' + df['group_id'].astype(str)
df['unique_id'] = 'FAT-' + df['unique_id'].astype(str)
print(df)
Output:
  group_id                              product_title variation_list unique_id
0    FAT-1 Chauvet DJ GigBar Move Effect Light System          Black     FAT-1
1    FAT-1 Chauvet DJ GigBar Move Effect Light System          White     FAT-2
2    FAT-2             Rane Twelve MKII DJ Controller            New     FAT-3
3    FAT-2             Rane Twelve MKII DJ Controller      Blemished     FAT-4
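The snippet above produces FAT-1, FAT-2, ... rather than the FAT-1301 / FAT-01 format shown in the expected output. Below is a minimal sketch that targets that format, assuming (from the example only, since the question does not say so) that group numbers start at 1301 and that row numbers are zero-padded to two digits. `GROUP_START` and `ROW_WIDTH` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Sample data from the question: one row per product, variations stored as lists.
df = pd.DataFrame({
    'product_title': ['Chauvet DJ GigBar Move Effect Light System',
                      'Rane Twelve MKII DJ Controller'],
    'variation_list': [['Black', 'White'], ['New', 'Blemished']],
})

# Assumptions inferred from the expected output, not stated in the question:
# group numbers start at 1301 and row numbers are zero-padded to two digits.
GROUP_START = 1301  # hypothetical
ROW_WIDTH = 2       # hypothetical

# Assign the group ID before exploding: explode copies the other columns
# onto every list element, so each variation inherits its product's group_id.
df['group_id'] = [f'FAT-{GROUP_START + i}' for i in range(len(df))]
out = df.explode('variation_list', ignore_index=True)

# ignore_index=True gives the exploded rows a fresh 0..n-1 index;
# count them from 1 and zero-pad to ROW_WIDTH digits.
out['unique_id'] = [f'FAT-{i + 1:0{ROW_WIDTH}d}' for i in range(len(out))]
out = out[['group_id', 'product_title', 'variation_list', 'unique_id']]
print(out)
```

Building the IDs with f-strings keeps the padding rule in one place; if the real numbering scheme differs, only the two constants need to change.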
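Alternatively:
# df is the dataframe from the question (one row per product, list-valued variation_list)
# reset_index() keeps the original row number in an 'index' column; explode repeats it per variation
df2 = df.reset_index().explode('variation_list')
df2['group_id'] = 'FAT' + df2['index'].add(1).astype(str)
# Positional 1..n counter, so the duplicated index labels left by explode don't matter
df2['unique_id'] = 'FAT' + (df2.reset_index(drop=True).index + 1).astype(str)
df2
  index                              product_title ... group_id unique_id
0     0 Chauvet DJ GigBar Move Effect Light System ...     FAT1      FAT1
0     0 Chauvet DJ GigBar Move Effect Light System ...     FAT1      FAT2
1     1             Rane Twelve MKII DJ Controller ...     FAT2      FAT3
1     1             Rane Twelve MKII DJ Controller ...     FAT2      FAT4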
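Both answers derive group_id from the row position before exploding. A different technique, not used in either answer, is to number groups by the product_title value itself with `groupby(...).ngroup()`. The sketch below assumes that "same group" means "same product title", which the question implies but does not state outright.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'product_title': ['Chauvet DJ GigBar Move Effect Light System',
                      'Rane Twelve MKII DJ Controller'],
    'variation_list': [['Black', 'White'], ['New', 'Blemished']],
})

# One row per variation, with a fresh 0..n-1 index
out = df.explode('variation_list', ignore_index=True)

# ngroup() numbers each distinct product_title starting at 0; sort=False
# numbers titles in order of first appearance rather than alphabetically.
group_no = out.groupby('product_title', sort=False).ngroup() + 1
out['group_id'] = 'FAT-' + group_no.astype(str)

# Running 1..n counter over the exploded rows
out['unique_id'] = 'FAT-' + pd.Series(range(1, len(out) + 1), index=out.index).astype(str)
print(out)
```

On the question's data this gives the same IDs as the snippets above. The difference shows up only if the same title appears on more than one input row: ngroup() gives those rows one shared group_id, while the index-based approaches give each input row its own.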