Find unique values for each category
I have two columns:
category names
vegetables [broccoli, ginger]
fruit [apple, grapes, dragonfruit]
vegetables [pine]
vegetables [bottleguord, pumpkin]
fruit [mango, guava]
I need to find the unique values that each category contains.
This is how you can create the df:
import numpy as np
import pandas as pd
df = pd.DataFrame({'category':['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
'Names':['[broccoli, ginger]','[apple, grapes, dragonfruit]','[pine]','[bottleguord, pumpkin]', '[mango, guava]']})
This is what I tried:
g = df.groupby('category')['Names'].apply(lambda x: list(np.unique(x)))
Expected output:
index = ['Vegetables' ,'Fruits']
new_df = pd.DataFrame(index=index)
This is how I adapted the code:
print(df.assign(Names=df['Names'].str[1:-1].str.split(', ')).explode('Names').groupby('category')['Names'].apply(lambda x: len(set(x))))
category len_unique_val
Vegetables 5
Fruits 5
You can explode the column, groupby the category, and apply a conversion to a Python set.
Assuming lists as input:
(df.explode('Names')
.groupby('category')
['Names']
.apply(set)
)
Output:
category
fruit {dragonfruit, guava, grapes, apple, mango}
vegetables {ginger, pine, broccoli, bottleguord, pumpkin}
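For reference, here is a minimal self-contained sketch of the list-input case. It assumes the column is called Names, as in the question's df, and that each cell holds a real Python list rather than a string; the names df_lists and unique_sets are only illustrative.

```python
import pandas as pd

# Assumption: each cell of 'Names' is an actual Python list, not a string.
df_lists = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'Names': [['broccoli', 'ginger'],
              ['apple', 'grapes', 'dragonfruit'],
              ['pine'],
              ['bottleguord', 'pumpkin'],
              ['mango', 'guava']],
})

# explode: one row per list element; groupby + apply(set): one set per category.
unique_sets = (df_lists.explode('Names')
                       .groupby('category')['Names']
                       .apply(set))

print(unique_sets)
```

Sets are unordered, so the order of the elements in the printed output may differ from run to run.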
Assuming strings as input:
(df.assign(Names=df['Names'].str[1:-1].str.split(', '))
.explode('Names')
.groupby('category')
['Names']
.apply(lambda x: '['+', '.join(set(x))+']')
)
Output:
category
fruit [dragonfruit, guava, grapes, apple, mango]
vegetables [ginger, pine, broccoli, bottleguord, pumpkin]
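If a stable ordering or just the count per category is wanted, the same idea can be sketched as follows. It reuses the df from the question (string cells such as '[broccoli, ginger]'); the names exploded, unique_sorted and unique_counts are illustrative, and sorting the set is an addition not present in the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]', '[pine]',
              '[bottleguord, pumpkin]', '[mango, guava]'],
})

# Strip the surrounding brackets, split on ', ', then give each item its own row.
exploded = (df.assign(Names=df['Names'].str[1:-1].str.split(', '))
              .explode('Names'))

# Unique values per category, sorted so the result is deterministic.
unique_sorted = exploded.groupby('category')['Names'].apply(lambda x: sorted(set(x)))

# Number of distinct values per category.
unique_counts = exploded.groupby('category')['Names'].nunique()

print(unique_sorted)
print(unique_counts)
```

Using nunique avoids building the set by hand when only the count is needed.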