Can I use regular expression search or match on a Python Pandas column where each cell is a list of lists?

import re

import pandas as pd
import numpy as np

cycling = pd.DataFrame({
        'qty' : [1,0,2,1,1],
        'item' : ['frame','frame',np.nan,'order including a saddle and other things','brake'],
        'desc' : [np.nan,['bike','wheel'],['bike',['tire','tube']],['saddle',['seatpost','bag']],['bike','brakes']]
})

Here is the DataFrame


cycling['saddle1'] = [int(bool(re.search(r"saddle",x))) for x in cycling['item'].replace(np.nan,'missing')]

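My original dataset has missing values, which I want to resolve to 0 in the indicator column; otherwise I don't care about them. The code above works well for a column where each cell contains a string (the fourth row is correctly identified), but I can't modify it to work when a cell contains a list or a list of lists, as in the desc column. I tried: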

cycling['saddle2'] = [int(bool(re.search(r"saddle",x))) for y in cycling['desc'].replace(np.nan,'missing') for x in y]


TypeError                                 Traceback (most recent call last)
<ipython-input-45-4c72cdaa87a4> in <module>()
----> 1 cycling['saddle2'] = [int(bool(re.search(r"saddle",x))) for y in cycling['desc'].replace(np.nan,'missing') for x in y]
      2 cycling.head()

1 frames
/usr/lib/python3.6/re.py in search(pattern, string, flags)
    180     """Scan through string looking for a match to the pattern, returning
    181     a match object, or None if no match was found."""
--> 182     return _compile(pattern, flags).search(string)
    184 def sub(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object

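The error arises because the nested loop eventually passes an inner list, such as ['tire','tube'], to re.search, which accepts only strings or bytes-like objects. You can use map instead of running a for loop (which is slow), and you can convert each list to str so that the regular expression can be applied to it. Like this: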

import pandas as pd
import numpy as np
import re

cycling = pd.DataFrame({
        'qty' : [1,0,2,1,1],
        'item' : ['frame','frame',np.nan,'order including a saddle and other things','brake'],
        'desc' : [np.nan,['bike','wheel'],['bike',['tire','tube']],['saddle',['seatpost','bag']],['bike','brakes']]
})
cycling['saddle1'] = cycling['item'].replace(np.nan,'missing').map(lambda x :int(bool(re.search(r"saddle",x))))
cycling['saddle2'] = cycling['desc'].replace(np.nan,'missing').map(lambda x :int(bool(re.search(r"saddle",str(x)))))
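If you would rather not match against the string representation of the whole list, one alternative is to walk the nested lists and test each string on its own. The following is only a sketch under that assumption; the helper names flatten and has_match are made up for this example and are not part of pandas or re. Missing values (NaN) simply yield no strings, so they resolve to 0 without the replace step.

```python
import re

import numpy as np
import pandas as pd


def flatten(cell):
    """Yield every string in a cell, descending into nested lists (hypothetical helper)."""
    if isinstance(cell, str):
        yield cell
    elif isinstance(cell, (list, tuple)):
        for element in cell:
            yield from flatten(element)
    # Anything else (for example a NaN float) yields nothing.


def has_match(cell, pattern=r"saddle"):
    """Return 1 if any string inside the cell matches the pattern, else 0 (hypothetical helper)."""
    return int(any(re.search(pattern, text) for text in flatten(cell)))


cycling = pd.DataFrame({
    'qty': [1, 0, 2, 1, 1],
    'item': ['frame', 'frame', np.nan, 'order including a saddle and other things', 'brake'],
    'desc': [np.nan, ['bike', 'wheel'], ['bike', ['tire', 'tube']],
             ['saddle', ['seatpost', 'bag']], ['bike', 'brakes']],
})

# The same helper handles plain strings, lists, lists of lists and NaN.
cycling['saddle1'] = cycling['item'].map(has_match)
cycling['saddle2'] = cycling['desc'].map(has_match)

print(cycling[['saddle1', 'saddle2']])
# Both columns are [0, 0, 0, 1, 0]
```

The str(x) approach above is shorter and is enough for a plain substring pattern like this one. Testing each string separately mostly matters for anchored patterns such as ^saddle$, which would never match the bracketed, quoted text produced by str on a list.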

