How to map the string values of a DataFrame to numbers to plot clusters?
I want to plot the clusters of my dataset, and the only way I know is to map the strings to integer values, like this:
data_mapped=data.copy()
data_mapped['Language']=data_mapped['Language'].map({'English':0,'French':1,'German':2})
data_mapped
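But in this example there are only 3 unique language values, so they can be mapped by hand this way.
Now I don't know how to convert columns with many unique string values to integers and plot the clusters.
I want to cluster by certain columns, e.g. (color, fabric, dress_type), or cluster the whole dataset.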
My dataset is currently built like this:
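import glob
import pandas as pd

file_list = glob.glob('json_file/[!Merg_all]*json')
merg_all_list = []
for file in file_list:
    print(file)
    raw_data = pd.read_json(str(file))
    raw_data.head()
    # each entry of the 'product' column is one product record (a dict)
    for i in raw_data['product']:
        merg_all_list.append(i)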
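A side note on the loop above, assuming the intent of `[!Merg_all]` is to skip a merged file whose name starts with `Merg_all`: in glob syntax the brackets form a character class, so the pattern actually rejects every file whose first character is any of `M`, `e`, `r`, `g`, `_`, `a`, `l`. A small sketch using `fnmatch.fnmatchcase` (the same wildcard rules `glob` uses, but case-sensitive on every platform) on hypothetical file names:

```python
import fnmatch
import glob
import os

# Hypothetical file names; only Merg_all.json is meant to be skipped
names = ["Merg_all.json", "zalora.json", "lawncollection.json", "alkaram.json"]

# [!Merg_all] rejects names whose FIRST character is one of M, e, r, g, _, a, l
glob_style = [n for n in names if fnmatch.fnmatchcase(n, "[!Merg_all]*json")]

# Excluding the merged file by its name prefix keeps every other file
by_name = [n for n in names
           if fnmatch.fnmatchcase(n, "*json") and not n.startswith("Merg_all")]

print(glob_style)
print(by_name)

# Same idea applied to real files (returns [] if json_file/ does not exist)
file_list = [f for f in glob.glob("json_file/*json")
             if not os.path.basename(f).startswith("Merg_all")]
```

With these names the character-class version keeps only `zalora.json`, while the prefix check keeps all three product files.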
JSON file (sample record):
[{"product": {"brand_name": "So Kamal", "designer": "So Kamal", "title": "So Kamal Women Summer Collection Mustard Lawn 1PC -Unstitched Shirt DPL19 49 LA00964-Std-MST", "description": "description specifications of so kamal women summer collection mustard lawn 1pc unstitched shirt dpl19 49 la00964 std mst brand so kamal sku 105972128_pk 1253666066 features 1pc unstitched main material lawn season summer material family lawn what's in the box 1x 1pc unstitched suit", "dress_type": "shirt", "where_to_wear": "", "color": "mustard", "stitched": false, "season": "summer", "price": 1120, "currency": "Rs", "product_id": "So Kamal Women Summer Collection Mustard Lawn 1PC -Unstitched Shirt DPL19 49 LA00964-Std-MST", "collection_url": "https://lawncollection.pk/brands/", "source": "https://lawncollection.pk/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst.html", "fabric": "lawn", "gender": "women", "frontpic": "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image1.jpeg", "backpic": "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image2.jpeg", "otherpics": ["https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image1.jpeg", "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image2.jpeg"], "sku": "SKU: 105972128_PK-1253666066", "details": "https://lawncollection.pk/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst.html https: lawncollection.pk so kamal women summer collection mustard lawn 1pc unstitched shirt dpl19 49 la00964 std mst.html so kamal so kamal women summer collection mustard lawn 1pc -unstitched shirt dpl19 49 la00964-std-mst description specifications of so kamal women summer collection mustard lawn 1pc unstitched shirt dpl19 49 la00964 std mst brand so kamal sku 105972128_pk 1253666066 features 1pc unstitched main material lawn season summer material family lawn what's in the box 1x 1pc unstitched suit", "Category1_list": "unstitched", "size": {"xs": false, "s": false, "m": false, "xl": false, "xxl": false}}}]
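Because every record is wrapped in a `product` key and the `size` field is itself a dictionary, one option is to unwrap the records and let `pd.json_normalize` flatten the nested `size` dict into separate boolean columns. A minimal sketch, using a trimmed-down copy of the record above (most fields omitted for brevity):

```python
import pandas as pd

# Trimmed version of the sample record (only a few of the real fields kept)
records = [
    {"product": {"brand_name": "So Kamal", "dress_type": "shirt", "color": "mustard",
                 "fabric": "lawn", "price": 1120,
                 "size": {"xs": False, "s": False, "m": False, "xl": False, "xxl": False}}}
]

# Unwrap the "product" level, then flatten nested dicts: size -> size.xs, size.s, ...
products = [r["product"] for r in records]
df = pd.json_normalize(products)
print(df.columns.tolist())
```

Since `merg_all_list` in the loop above already holds the unwrapped product dicts, `pd.json_normalize(merg_all_list)` should be the analogous step for the full data.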
DataFrame:
brand_name designer title description dress_type where_to_wear color stitched season price ... source fabric gender frontpic backpic otherpics details Category1_list size sku
0 Polo Ralph Lauren Polo Ralph Lauren Long Sleeve Knit Magic Fleece Sweatshirt - Casual graphic print sweatshirt- Crew neckli... sweatshirt black True 8544 ... https://www.zalora.com.ph/polo-ralph-lauren-lo... cotton man static.ph.zalora.net/p/polo-ralph-lauren-3175-... static.ph.zalora.net/p/polo-ralph-lauren-3175-... [static.ph.zalora.net/p/polo-ralph-lauren-3175... https://www.zalora.com.ph/polo-ralph-lauren-lo... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
1 Polo Ralph Lauren Polo Ralph Lauren Basic Mesh Polo Shirt - Colour block polo shirt with brand print- Un... shirt red True 9265 ... https://www.zalora.com.ph/polo-ralph-lauren-ba... cotton man static.ph.zalora.net/p/polo-ralph-lauren-7554-... static.ph.zalora.net/p/polo-ralph-lauren-7555-... [static.ph.zalora.net/p/polo-ralph-lauren-7554... https://www.zalora.com.ph/polo-ralph-lauren-ba... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
2 MANGO Man MANGO Man Faux Shearling Denim Jacket - Denim jacket with wash detail- Collar neckli... jacket blue True 4995 ... https://www.zalora.com.ph/mango-man-faux-shear... denim man static.ph.zalora.net/p/mango-man-9782-7201341-... static.ph.zalora.net/p/mango-man-9783-7201341-... [static.ph.zalora.net/p/mango-man-9782-7201341... https://www.zalora.com.ph/mango-man-faux-shear... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
3 Polo Ralph Lauren Polo Ralph Lauren Knit Magic Fleece Hoodie - Embroidered front hoodie- Unlined- Hooded ne... True 10598 ... https://www.zalora.com.ph/polo-ralph-lauren-kn... cotton man static.ph.zalora.net/p/polo-ralph-lauren-2320-... static.ph.zalora.net/p/polo-ralph-lauren-2320-... [static.ph.zalora.net/p/polo-ralph-lauren-2320... https://www.zalora.com.ph/polo-ralph-lauren-kn... {'xs': False, 's': True, 'm': True, 'xl': True... NaN
4 MANGO Man MANGO Man Turtleneck Flecked Sweater - Solid hue speckle-knit sweatshirt- High neck... sweatshirt brown True 2995 ... https://www.zalora.com.ph/mango-man-turtleneck... cotton man static.ph.zalora.net/p/mango-man-1900-5990341-... static.ph.zalora.net/p/mango-man-1900-5990341-... [static.ph.zalora.net/p/mango-man-1900-5990341... https://www.zalora.com.ph/mango-man-turtleneck... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
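Choose a visualization technique that suits your data.
For categorical data, a bar chart is more appropriate than a scatter plot, because the x-axis does not need to be numeric.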
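A sketch of that idea for two categorical columns: `pd.crosstab` counts how often each (fabric, color) pair occurs, and a stacked bar chart shows those counts with the categories themselves on the x-axis. The rows below are hypothetical stand-ins for the real `fabric` and `color` columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the real product DataFrame
df = pd.DataFrame({
    "fabric": ["cotton", "cotton", "denim", "cotton", "lawn"],
    "color": ["black", "red", "blue", "black", "mustard"],
})

# Count of products for every (fabric, color) combination
table = pd.crosstab(df["fabric"], df["color"])

# One bar per fabric, stacked by color; no numeric encoding of the strings is needed
ax = table.plot(kind="bar", stacked=True)
ax.set_xlabel("fabric")
ax.set_ylabel("number of products")
plt.tight_layout()
plt.savefig("fabric_by_color.png")
```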
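Also choose an appropriate algorithm. K-means only makes sense for continuous variables, so encoding categories as integers for k-means is simply wrong. In your case, k-means would assume that the average of English and German is exactly French (with English=0, French=1 and German=2, the mean of 0 and 2 is 1).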
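A small numeric check of that point, plus one common alternative encoding, shown as a sketch rather than a recommendation: one-hot encoding with `pd.get_dummies` turns each category into its own 0/1 column, so every pair of distinct categories ends up at the same Euclidean distance and no category sits "between" two others. Whether k-means on one-hot columns is appropriate for a given dataset remains a judgment call.

```python
import numpy as np
import pandas as pd

codes = {"English": 0, "French": 1, "German": 2}

# A k-means centroid is a mean: averaging an English row and a German row under integer coding
centroid = np.mean([codes["English"], codes["German"]])
print(centroid)  # lands exactly on the code used for French

# One-hot encoding: one 0/1 column per category
onehot = pd.get_dummies(pd.Series(["English", "French", "German"]), dtype=int)

# Pairwise Euclidean distances between the three one-hot rows (all off-diagonal values equal)
X = onehot.to_numpy(dtype=float)
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(dist)
```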
I got this answer from my professor, Qasim, and I think it will help others:
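# pd.factorize returns a tuple: (integer codes, array of the unique values)
brand1 = pd.factorize(clothes_fac['brand_name'])
clothes_fac['brand_name'] = brand1[0]  # replace each brand string with its integer code
clothes_fac.head(5)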
This converts each unique value in the column to an integer code.
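Extending the same idea to the columns mentioned in the question, here is a sketch that factorizes `color`, `fabric` and `dress_type` in one go and then clusters with a mismatch-based (Hamming) distance, so the arbitrary integer codes are only compared for equality and never averaged. The sample rows and the name `clothes` are hypothetical, and SciPy's average-linkage hierarchical clustering is a substitute technique chosen for illustration; it is not part of the answer above.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical rows standing in for the real product DataFrame
clothes = pd.DataFrame({
    "color": ["black", "red", "blue", "black", "brown", "black"],
    "fabric": ["cotton", "cotton", "denim", "cotton", "cotton", "cotton"],
    "dress_type": ["sweatshirt", "shirt", "jacket", "sweatshirt", "sweatshirt", "shirt"],
})

cols = ["color", "fabric", "dress_type"]
# Factorize each column: every distinct string gets an integer code in order of appearance
codes = pd.DataFrame({c: pd.factorize(clothes[c])[0] for c in cols})

# Hamming distance = fraction of columns whose codes differ; code magnitudes are irrelevant
dist = pdist(codes.to_numpy(), metric="hamming")

# Average-linkage hierarchical clustering, cut into at most 2 flat clusters
Z = linkage(dist, method="average")
clothes["cluster"] = fcluster(Z, t=2, criterion="maxclust")
print(clothes)
```

On the real data, `clothes` would be the DataFrame built from `merg_all_list`, and the number of clusters `t` is a parameter to choose, not something this sketch determines.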