Pandas - check if a string column contains a pair of strings
Suppose I have a DataFrame like this:
df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})
           consumption  creature    food
0  squirrel eats apple  squirrel   apple
1    monkey eats apple    badger   apple
2   monkey eats banana    monkey  banana
3   badger eats banana  elephant  banana
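I want to find the rows where that row's 'creature' and 'food' both appear in its 'consumption' value. For example, if apple and squirrel appear together the result is True, but if apple is paired with elephant it is False. Likewise, monkey and banana together is True, but monkey with apple is False.
The approach I tried was something like: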
creature_list = list(df['creature'])
creature_list = '|'.join(map(str, creature_list))
food_list = list(df['food'])
food_list = '|'.join(map(str, food_list))
np.where((df['consumption'].str.contains('('+creature_list+')', case = False))
         & (df['consumption'].str.contains('('+food_list+')', case = False)), 1, 0)
But this doesn't work, because I get True in every case.
How can I check for pairs of strings?
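The attempt above returns True everywhere because each joined pattern is an alternation over every value in the column, so a row matches as soon as it mentions any creature and any food, not specifically its own. A minimal sketch reproducing this (the names creature_pattern, food_pattern and flags are illustrative and not from the original post):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})

# Joining the whole column builds one pattern that matches ANY listed value.
creature_pattern = '|'.join(df['creature'])  # 'squirrel|badger|monkey|elephant'
food_pattern = '|'.join(df['food'])          # 'apple|apple|banana|banana'

# Each row is tested against every creature and every food, not its own pair.
has_any_creature = df['consumption'].str.contains(creature_pattern, case=False)
has_any_food = df['consumption'].str.contains(food_pattern, case=False)

flags = np.where(has_any_creature & has_any_food, 1, 0)
print(flags)
```

Row 1, for instance, passes because 'monkey' and 'apple' both occur somewhere in their columns, even though that row's own creature is 'badger'.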
I'm sure there is a better way to do this, but here is one way.
import pandas as pd
import re
df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple', 'monkey eats banana', 'badger eats banana'], 'food': ['apple', 'apple', 'banana', 'banana'], 'creature': ['squirrel', 'badger', 'monkey', 'elephant']})
test = []
for i in range(len(df)):
    # re.escape keeps any regex metacharacters in the search terms literal
    creature_found = re.search(re.escape(df.creature[i]), df.consumption[i])
    food_found = re.search(re.escape(df.food[i]), df.consumption[i])
    test.append(bool(creature_found) and bool(food_found))
df['test'] = test
Here is one approach that works:
def match_consumption(r):
    # True only when this row's creature and food both occur in its consumption text
    return (r['creature'] in r['consumption']) and (r['food'] in r['consumption'])
df['match'] = df.apply(match_consumption, axis=1)
df
           consumption  creature    food  match
0  squirrel eats apple  squirrel   apple   True
1    monkey eats apple    badger   apple  False
2   monkey eats banana    monkey  banana   True
3   badger eats banana  elephant  banana  False
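The same row-wise check can also be written without apply by zipping the three columns together. This is only a sketch of an alternative, not something from the original answer; it uses plain substring tests, so 'apple' would also match inside 'pineapple'.

```python
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})

# Walk the three columns in parallel and test each row against its own pair.
df['match'] = [
    creature in text and food in text
    for text, creature, food in zip(df['consumption'], df['creature'], df['food'])
]
print(df)
```

On larger frames this usually avoids some of the per-row overhead of df.apply(..., axis=1), though that is worth measuring on your own data.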
Is checking for string equality too simple? You can test whether the string "<creature> eats <food>" equals the corresponding value in the consumption column:
(df.consumption == df.creature + " eats " + df.food)
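To see what that comparison produces, here is a small sketch that stores the result in a column. The names expected and match are illustrative, and the check assumes every consumption value follows the exact "<creature> eats <food>" wording.

```python
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})

# Build the sentence each row should contain, then compare element-wise.
expected = df['creature'] + ' eats ' + df['food']
df['match'] = df['consumption'] == expected
print(df['match'].tolist())  # [True, False, True, False]
```

This is fully vectorized, but any variation in the text (a different verb, extra words, different casing) would make the row compare as False.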