How to create a new column based on a string formation of a different row in python pandas
Suppose I have the following df:
test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple', 'Cake Orange', 'Tomato Apple']})
test
Food
0 Apple Cake
1 Orange Tomato
2 Brocolli Apple
3 Cake Orange
4 Tomato Apple
I want to create a new column that replaces each food with its food type:
test1 = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple', 'Cake Orange', 'Tomato Apple'], 'Type' : ['Fruit Dessert', 'Fruit Veggie', 'Veggie Fruit', 'Dessert Fruit', 'Veggie Fruit']})
test1
Food Type
0 Apple Cake Fruit Dessert
1 Orange Tomato Fruit Veggie
2 Brocolli Apple Veggie Fruit
3 Cake Orange Dessert Fruit
4 Tomato Apple Veggie Fruit
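How would I go about this? Should I build a dictionary like the following:
{'Fruit' : ['Apple', 'Orange'], 'Veggies': ['Brocolli', 'Tomato'], 'Dessert': 'Cake'}
and then do something with that dictionary? I can't seem to figure it out. Thanks!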
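My approach would be:
- Invert the dictionary of lists so that each list element becomes a key and its original key becomes the value
- Split the strings, stack the pieces into a pd.Series, map it with the inverted dictionary, groupby the first index level, and join the words back together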
d = {'Fruit' :['Apple', 'Orange'], 'Veggies':['Brocolli', 'Tomato'], 'Dessert': 'Cake'}
d_inv = {i: k for k,v in d.items() for i in (v if isinstance(v, list) else [v])}
# {'Apple': 'Fruit', 'Orange': 'Fruit', 'Brocolli': 'Veggies', 'Tomato':
# 'Veggies', 'Cake': 'Dessert'}
test['type'] = (test.Food.str.split(expand=True)
                    .stack()
                    .map(d_inv)
                    .groupby(level=0)
                    .agg(' '.join))
print(test)
Food type
0 Apple Cake Fruit Dessert
1 Orange Tomato Fruit Veggies
2 Brocolli Apple Veggies Fruit
3 Cake Orange Dessert Fruit
4 Tomato Apple Veggies Fruit
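One thing to watch with the chain above: a word that is missing from d_inv maps to NaN, which ' '.join cannot concatenate. A minimal sketch of a row-wise variant that leaves unknown words untouched is below. The helper name to_types and the extra 'Kiwi Apple' row are assumptions added for illustration; they are not part of the original question.

```python
import pandas as pd

# 'Kiwi' is deliberately absent from the dictionary (hypothetical extra row).
test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Kiwi Apple']})

d = {'Fruit': ['Apple', 'Orange'], 'Veggies': ['Brocolli', 'Tomato'], 'Dessert': 'Cake'}
# Invert: every food becomes a key, its category becomes the value.
d_inv = {i: k for k, v in d.items() for i in (v if isinstance(v, list) else [v])}


def to_types(food, mapping=d_inv):
    """Replace each whitespace-separated word with its type; keep unknown words."""
    return ' '.join(mapping.get(word, word) for word in food.split())


test['type'] = test['Food'].map(to_types)
print(test)
```

This trades the vectorised stack/groupby steps for a plain Python function applied per row, which is usually easier to read and is fine for small to medium frames.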
I don't think you can do it in one go... but you can do it in several passes:
test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple', 'Cake Orange', 'Tomato Apple']})
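# Avoid naming the variable `dict`, which would shadow the built-in type.
type_patterns = {'Fruit' : [r'Apple', r'Orange'], 'Veggies': [r'Brocolli', r'Tomato'], 'Dessert': [r'Cake']}
test['Type'] = test['Food']
# One replace pass per category: every pattern in the list becomes the category name.
for k, patterns in type_patterns.items():
    test['Type'] = test['Type'].replace(regex=patterns, value=k)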
test
Food Type
0 Apple Cake Fruit Dessert
1 Orange Tomato Fruit Veggies
2 Brocolli Apple Veggies Fruit
3 Cake Orange Dessert Fruit
4 Tomato Apple Veggies Fruit
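For comparison, a single pass also seems possible by combining all the food names into one alternation pattern and passing a callable as the replacement to Series.str.replace. This is only a sketch under the assumption that the food names are plain words; the names d_inv and pattern are chosen here for illustration.

```python
import re

import pandas as pd

test = pd.DataFrame({'Food': ['Apple Cake', 'Orange Tomato', 'Brocolli Apple',
                              'Cake Orange', 'Tomato Apple']})

# Food -> type lookup (the inverted form of the dictionary from the question).
d_inv = {'Apple': 'Fruit', 'Orange': 'Fruit',
         'Brocolli': 'Veggies', 'Tomato': 'Veggies',
         'Cake': 'Dessert'}

# One pattern matching any known food as a whole word.
pattern = r'\b(?:' + '|'.join(map(re.escape, d_inv)) + r')\b'

# The callable receives each regex match and returns its replacement text.
test['Type'] = test['Food'].str.replace(
    pattern, lambda m: d_inv[m.group(0)], regex=True)
print(test)
```

Because each match is looked up exactly once, a replacement can never be rewritten by a later pass, which could otherwise happen in the loop version if a category name also appeared as a pattern.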