Right way to use eval in a pandas DataFrame map function

I have a pandas DataFrame in which one column, 'organization', holds strings that each contain a list:

data['organization'][0]
Out[6] "['loony tunes']"

data['organization'][1]
Out[7] "['the three stooges']"

I want to replace each string with the list it contains. I tried using map, with eval as the mapped function:

data['organization'] = data['organization'].map(eval)

But what I get is:

Traceback (most recent call last):
  File "C:\Users\xxx\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3035, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-3dbc0abf8c2e>", line 1, in <module>
    data['organization'] = data['organization'].map(eval)
  File "C:\Users\xxx\Anaconda3\lib\site-packages\pandas\core\series.py", line 2015, in map
    mapped = map_f(values, arg)
  File "pandas\src\inference.pyx", line 1046, in pandas.lib.map_infer (pandas\lib.c:56983)
TypeError: eval() arg 1 must be a string, bytes or code object

So I resorted to the following block of code, which is extremely inefficient:

for index, line in data['organization'].iteritems():
    print(index)
    if type(line) != str:
        data['organization'][index] = []
    try:
        data['organization'][index] = eval(data['organization'][index])
    except:
        continue

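What am I doing wrong? How can I use eval (or a vectorized implementation) instead of the clumsy loop above?

I thought the problem might be that some elements of the pd.Series data['organization'] are not strings, so I implemented the following: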

def is_string(x):
    if type(x) != str:
        x = ''

data['organization'] = data['organization'].map(is_string)

But I still get the same error when I try:

data['organization'] = data['organization'].map(eval)

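Thanks in advance.

Using eval is generally frowned upon because it allows arbitrary Python code to be run, so you should try hard to avoid it.

In this case you don't need to evaluate an expression, only to parse a value. That means you can use literal_eval from the ast module: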

In [11]: s = pd.Series(["['loony tunes']", "['the three stooges']"])

In [12]: from ast import literal_eval

In [13]: s.apply(literal_eval)
Out[13]:
0          [loony tunes]
1    [the three stooges]
dtype: object

In [14]: s.apply(literal_eval)[0]  # look, it works!
Out[14]: ['loony tunes']
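
The column in the question apparently also contains entries that are not strings, which is what triggers the TypeError. Note too that the is_string helper in the question has no return statement, so mapping it replaces every value with None and the later eval fails in the same way. A minimal sketch of one way to handle this is below; the helper name parse_list and the choice to fall back to an empty list are assumptions for illustration, not part of the original question or answer:

```python
from ast import literal_eval


def parse_list(value):
    """Parse a string such as "['a', 'b']" into a Python object.

    Hypothetical helper: returns [] for anything that is not a
    parseable string, mirroring the fallback used in the question's loop.
    """
    if not isinstance(value, str):
        return []  # e.g. NaN or None in the column
    try:
        return literal_eval(value)
    except (ValueError, SyntaxError):
        return []  # assumption: unparseable strings are treated as empty


# Demonstrated on a plain list so the sketch runs without pandas installed.
raw = ["['loony tunes']", float("nan"), "['the three stooges']", "not a list"]
parsed = [parse_list(v) for v in raw]
print(parsed)
```

With a DataFrame, the same helper could be passed directly to map, as in data['organization'] = data['organization'].map(parse_list).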

From the docs:

ast.literal_eval(node_or_string)

Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
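
To illustrate the restriction described in the docs, here is a small sketch contrasting a plain literal with an expression that contains a function call. The helper name is_safe_literal is made up for this example:

```python
from ast import literal_eval


def is_safe_literal(text):
    """Return True if text parses as a Python literal, False otherwise."""
    try:
        literal_eval(text)
    except (ValueError, SyntaxError):
        # literal_eval refuses anything that is not a literal structure,
        # such as function calls or attribute access.
        return False
    return True


# A list literal is accepted.
print(is_safe_literal("['loony tunes']"))

# eval would execute this call; literal_eval rejects it instead.
print(is_safe_literal("__import__('os').getcwd()"))
```

The second string is the kind of input that makes plain eval risky on data you do not control: eval would run the call, whereas literal_eval raises an exception without executing anything.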