Find unique values for each category

Question

I have two columns:

category     names
vegetables   [broccoli, ginger]
fruit        [apple, grapes, dragonfruit]
vegetables   [pine]
vegetables   [bottleguord, pumpkin]
fruit        [mango, guava]

I need to find the unique values contained in each category. This is how you can create the df:

import numpy as np
import pandas as pd
df = pd.DataFrame({'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
                   'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]', '[pine]', '[bottleguord, pumpkin]', '[mango, guava]']})
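In this DataFrame the Names column holds strings that only look like lists (for example '[broccoli, ginger]'), so neither np.unique nor explode can see the individual items. A minimal sketch, assuming every value is wrapped in square brackets and items are separated by exactly ', ', that converts them into real Python lists:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]', '[pine]',
              '[bottleguord, pumpkin]', '[mango, guava]'],
})

# Strip the '[' and ']' characters from both ends, then split on ', '
# (assumes this exact separator) to get real lists of names
df['Names'] = df['Names'].str.strip('[]').str.split(', ')

print(df['Names'].tolist())
```

After this conversion, df can be used directly with the "lists as input" variant of the answer (using 'Names' as the column name).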

This is what I did:

g = df.groupby('category')['Names'].apply(lambda x: list(np.unique(x)))

Expected output:

index = ['Vegetables', 'Fruits']
new_df = pd.DataFrame(index=index)

This is how I adjusted the code:

print(df.assign(Names=df['Names'].str[1:-1].str.split(', ')).explode('Names').groupby('category')['Names'].apply(lambda x: len(set(x))))

category    len_unique_val
Vegetables   5
Fruits       4
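If only the count is needed, SeriesGroupBy.nunique gives the same number as lambda x: len(set(x)) without a Python-level lambda. A sketch using the question's df; the column name len_unique_val is taken from the table above. On this sample data each category contains five distinct names.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]', '[pine]',
              '[bottleguord, pumpkin]', '[mango, guava]'],
})

counts = (
    df.assign(Names=df['Names'].str[1:-1].str.split(', '))  # strings -> lists
      .explode('Names')                                     # one row per name
      .groupby('category')['Names']
      .nunique()                                            # distinct names per category
      .rename('len_unique_val')
)
print(counts)
```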

Answer 1

You can explode, groupby, and then apply set to collect each category's values into a Python set.

Assuming lists as input:

(df.explode('names')
   .groupby('category')
   ['names']
   .apply(set)
)

Output:

category
fruit             {dragonfruit, guava, grapes, apple, mango}
vegetables    {ginger, pine, broccoli, bottleguord, pumpkin}
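A self-contained version of the list-based approach, for reference. The frame below is a hypothetical one in which the lowercase names column already holds real Python lists; sorted(set(...)) is used instead of a bare set so the result has a predictable order.

```python
import pandas as pd

# Hypothetical frame: 'names' already holds real Python lists
df = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'names': [['broccoli', 'ginger'], ['apple', 'grapes', 'dragonfruit'], ['pine'],
              ['bottleguord', 'pumpkin'], ['mango', 'guava']],
})

unique_names = (
    df.explode('names')                    # one row per list item
      .groupby('category')['names']
      .apply(lambda x: sorted(set(x)))     # dedupe, then sort alphabetically
)
print(unique_names)
```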

Assuming strings as input:

(df.assign(Names=df['Names'].str[1:-1].str.split(', '))
   .explode('Names')
   .groupby('category')
   ['Names']
   .apply(lambda x: '['+', '.join(set(x))+']')
)

Output:

category
fruit             [dragonfruit, guava, grapes, apple, mango]
vegetables    [ginger, pine, broccoli, bottleguord, pumpkin]
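Because Python sets are unordered (and string hashing is randomized per process), the order of names inside the brackets above can change between runs. A variant that swaps set for Series.unique, which keeps the order in which each name first appears; this substitution is a suggestion, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
    'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]', '[pine]',
              '[bottleguord, pumpkin]', '[mango, guava]'],
})

result = (
    df.assign(Names=df['Names'].str[1:-1].str.split(', '))
      .explode('Names')
      .groupby('category')['Names']
      # Series.unique keeps first-appearance order, unlike set()
      .apply(lambda x: '[' + ', '.join(x.unique()) + ']')
)
print(result)
```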
