How to create a new column based on a string formation of a different row in Python pandas

Question

Suppose I have the following df:

test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple', 'Cake Orange', 'Tomato Apple']})
test


       Food
0   Apple Cake
1   Orange Tomato
2   Brocolli Apple
3   Cake Orange
4   Tomato Apple

I want to create a new column that replaces each food with its type:

test1 = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple', 'Cake Orange', 'Tomato Apple'], 'Type' : ['Fruit Dessert', 'Fruit Veggie', 'Veggie Fruit', 'Dessert Fruit', 'Veggie Fruit']})
test1


       Food             Type
0   Apple Cake      Fruit Dessert
1   Orange Tomato   Fruit Veggie
2   Brocolli Apple  Veggie Fruit
3   Cake Orange     Dessert Fruit
4   Tomato Apple    Veggie Fruit

How would I do this? Should I build a dictionary like the following:

{'Fruit' : ['Apple', 'Orange'], 'Veggies': ['Brocolli', 'Tomato'], 'Dessert': 'Cake'}

And then do something with that dictionary? I can't seem to figure it out. Thanks!

Answer 1

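My approach would be:

Invert the dictionary of lists, so that each value becomes a key and its original key becomes the value
Split the strings, stack the pieces into a pd.Series, map them through the inverted dictionary, group by the first index level, and join the results back together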

d = {'Fruit': ['Apple', 'Orange'], 'Veggies': ['Brocolli', 'Tomato'], 'Dessert': 'Cake'}

# Invert the mapping: each food becomes a key pointing at its type.
d_inv = {i: k for k, v in d.items() for i in (v if isinstance(v, list) else [v])}
# {'Apple': 'Fruit', 'Orange': 'Fruit', 'Brocolli': 'Veggies', 'Tomato':
# 'Veggies', 'Cake': 'Dessert'}

test['type'] = (test.Food.str.split(expand=True)
                         .stack()
                         .map(d_inv)
                         .groupby(level=0)
                         .agg(' '.join))

print(test)

             Food           type
0      Apple Cake  Fruit Dessert
1   Orange Tomato  Fruit Veggies
2  Brocolli Apple  Veggies Fruit
3     Cake Orange  Dessert Fruit
4    Tomato Apple  Veggies Fruit
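
One thing to watch with this approach: `.map(d_inv)` yields NaN for any word that is missing from the dictionary, and `' '.join` cannot join a NaN with strings. Below is a minimal sketch of a per-row variant of the same invert-and-map idea that leaves unknown words unchanged. The helper name `to_types` and the sample word 'Pizza' are hypothetical and only there for illustration.

```python
import pandas as pd

d = {'Fruit': ['Apple', 'Orange'], 'Veggies': ['Brocolli', 'Tomato'], 'Dessert': 'Cake'}

# Same inversion as above: food -> type.
d_inv = {i: k for k, v in d.items() for i in (v if isinstance(v, list) else [v])}


def to_types(food, mapping=d_inv):
    """Map each whitespace-separated word to its type.

    Words that are not in the mapping pass through unchanged
    (hypothetical helper, not part of pandas).
    """
    return ' '.join(mapping.get(word, word) for word in food.split())


# 'Pizza' is deliberately absent from the dictionary.
test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Pizza Apple']})
test['type'] = test['Food'].map(to_types)
print(test)
```

This trades the vectorised split/stack for a plain Python function applied per row, which may be slower on large frames but is easier to adapt when the data is messy.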

Answer 2

I don't think you can do it in one pass... but you can do it in several:

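import pandas as pd

test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple', 'Cake Orange', 'Tomato Apple']})

# Named type_patterns rather than dict, to avoid shadowing the built-in.
type_patterns = {'Fruit': [r'Apple', r'Orange'], 'Veggies': [r'Brocolli', r'Tomato'], 'Dessert': [r'Cake']}

test['Type'] = test['Food']
for k in type_patterns:
    # Replace every food matching one of this type's patterns with the type name.
    test['Type'] = test['Type'].replace(regex=type_patterns[k], value=k)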

test

             Food           Type
0      Apple Cake  Fruit Dessert
1   Orange Tomato  Fruit Veggies
2  Brocolli Apple  Veggies Fruit
3     Cake Orange  Dessert Fruit
4    Tomato Apple  Veggies Fruit
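
For what it's worth, a single pass does seem possible if all the food names are combined into one alternation pattern and a callable is passed to `Series.str.replace`. This is only a sketch under the assumption that the food names are plain literals (they are escaped with `re.escape`). The names `type_to_foods`, `food_to_type` and `pattern` are made up for this example.

```python
import re

import pandas as pd

test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple',
                              'Cake Orange', 'Tomato Apple']})

type_to_foods = {'Fruit': ['Apple', 'Orange'],
                 'Veggies': ['Brocolli', 'Tomato'],
                 'Dessert': ['Cake']}

# Invert to food -> type so a matched food can be looked up directly.
food_to_type = {food: kind for kind, foods in type_to_foods.items() for food in foods}

# One alternation covering every known food, e.g. 'Apple|Orange|...'.
pattern = '|'.join(re.escape(food) for food in food_to_type)

# The callable receives each regex match and returns its replacement.
test['Type'] = test['Food'].str.replace(
    pattern, lambda m: food_to_type[m.group(0)], regex=True)

print(test)
```

Because the pattern has no word boundaries, a food name embedded in a longer word would also be replaced, the same as in the loop above; wrapping the alternation in `\b(?:...)\b` would be one way to tighten that.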
