Pandas DataFrame: Cannot convert string into a float

Question

I have a column Column1 in a pandas DataFrame whose values are of type str and look like this:

import pandas as pd
df = pd.read_table("filename.dat")
type(df["Column1"].iloc[0])   # outputs 'str'
print(df["Column1"].iloc[0])

This prints '1/350', so the value is currently a string. I want to convert it to a float.

I tried this:

df["Column1"] = df["Column1"].astype('float64', raise_on_error = False)

But this did not change the values to floats.

This also failed:

df["Column1"] = df["Column1"].convert_objects(convert_numeric=True)

And this failed:

df["Column1"] = df["Column1"].apply(pd.to_numeric, args=('coerce',))

How can I convert all the values of column "Column1" to floats? Could I somehow use a regular expression to remove the parentheses?

Edit:

The line

df["Meth"] = df["Meth"].apply(eval)

works, but only if I use it twice, i.e.

df["Meth"] = df["Meth"].apply(eval)
df["Meth"] = df["Meth"].apply(eval)

Why would this be?

Answer 1

You can do it by applying eval to the column:

import pandas as pd
data = {'one':['1/20', '2/30']}
df = pd.DataFrame(data)

In [8]: df['one'].apply(eval)
Out[8]:
0    0.050000
1    0.066667
Name: one, dtype: float64
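On the edit in the question (why eval has to be applied twice): one plausible explanation, which is not confirmed anywhere in this thread, is that the stored strings themselves contain quote characters. print() does not add quotes around a str, so seeing '1/350' with quotes suggests the value is really "'1/350'". The sketch below uses made-up sample data to show that, under this assumption, the first eval only unwraps the quotes and the second one performs the division. The same caveats about eval apply here.

```python
import pandas as pd

# Hypothetical sample: each value carries literal quote characters,
# as it might if the file stores the fractions inside quotes.
raw = pd.Series(["'1/350'", "'2/7'"])

once = raw.apply(eval)    # first pass evaluates a string literal -> plain str
twice = once.apply(eval)  # second pass evaluates the division -> float

# Stripping the quotes up front leaves a clean 'a/b' string for any parser.
stripped = raw.str.strip("'\"")

print(type(once[0]), once[0])
print(twice[0])
print(stripped[0])
```

If this is what is happening, removing the quotes first means a single conversion pass is enough.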

Answer 2

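You need to evaluate the expression (e.g. '1/350') in order to get the result, for which you can use Python's eval() function.

By wrapping pandas' apply() function around it, you can then execute eval() on every value in the column. Example: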

df["Column1"].apply(eval)

When interpreting literals, you might also think of the ast.literal_eval function, as noted in the docs. Update: this won't work, because literal_eval() only accepts literals (with + and - allowed only for signed and complex numbers), so it rejects the division in '1/350'.
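A quick way to check this behaviour for yourself is the small sketch below. The helper name try_literal_eval is made up for illustration; it simply reports the exception type instead of letting it propagate.

```python
import ast

def try_literal_eval(text):
    """Return the parsed value, or the name of the exception that was raised."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        return type(exc).__name__

print(try_literal_eval('350'))     # a plain integer literal is accepted
print(try_literal_eval('-1.5'))    # a signed float literal is accepted
print(try_literal_eval('1/350'))   # a division is not a literal and is rejected
```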

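Note: as mentioned in the other answers and comments, using eval() is not without risk, because you are basically executing whatever input is passed in. In other words, if your input contains malicious code, you are giving it a free pass.

Alternative options: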

# Define a custom div function
def div(a,b):
    return int(a)/int(b)

# Split each string and pass the values to div
df_floats = df['col1'].apply(lambda x: div(*x.split('/')))
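Along the same lines as the div helper above, the standard library's fractions.Fraction can parse 'a/b' strings directly, so no splitting is needed. This is a swapped-in alternative rather than something from the original answer, and the sample values in col1 are made up.

```python
from fractions import Fraction

import pandas as pd

# Hypothetical sample data; a bare integer such as '7' is also accepted.
df = pd.DataFrame({'col1': ['1/350', '2/30', '7']})

# Fraction parses the string exactly; float() then converts the result.
df_floats = df['col1'].apply(lambda x: float(Fraction(x)))

print(df_floats)
```

Fraction raises ValueError for strings it cannot parse and ZeroDivisionError for a zero denominator, so bad rows fail loudly rather than being evaluated as code.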

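Second alternative, in case of unclean data:

By using a regular expression, we can remove any non-digits that appear before the numerator and after the denominator.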

# Define a custom div function (unchanged)
def div(a,b):
    return int(a)/int(b)

# We'll import the re module and define a precompiled pattern
import re
regex = re.compile(r'\D*(\d+)/(\d+)\D*')  # raw string avoids invalid-escape warnings

df_floats = df['col1'].apply(lambda x: div(*regex.findall(x)[0]))

We lose a bit of performance, but the upside is that even with input like '!erefdfs?^dfsdf1/350dqsd qsd qs d', we still end up with the value of 1/350.
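One thing to keep in mind is that regex.findall(x)[0] raises IndexError for a row that contains no fraction at all. A possible variation, sketched here with made-up sample data, uses pandas' own Series.str.extract, which leaves NaN in rows that do not match instead of raising.

```python
import math

import pandas as pd

# Hypothetical sample: one clean row, one noisy row, one row with no fraction.
s = pd.Series(['1/350', 'abc 2/4 xyz', 'n/a'])

# Two capture groups -> a DataFrame with columns 0 (numerator) and 1 (denominator).
parts = s.str.extract(r'(\d+)/(\d+)').astype(float)

# Element-wise division; rows without a match stay NaN.
result = parts[0] / parts[1]

print(result)
```

The NaN rows can then be inspected or dropped explicitly, which makes unclean input visible.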

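Performance:

When timing the options on a dataframe with 100,000 rows, the second option (using the user-defined div function) clearly wins:

using eval: 1 loop, best of 3: 1.41 s per loop
using div: 10 loops, best of 3: 159 ms per loop
using re: 1 loop, best of 3: 275 ms per loop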
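To repeat a comparison like this on your own machine, something along the following lines should do. It is only a rough sketch: the synthetic column and the smaller row count are assumptions chosen to keep the run short, and the numbers you get will differ from those quoted above.

```python
import timeit

import pandas as pd

def div(a, b):
    return int(a) / int(b)

# Synthetic column; 10,000 rows keeps the run short (the answer used 100,000).
df = pd.DataFrame({'col1': ['1/350'] * 10_000})

runs = 3
t_eval = timeit.timeit(lambda: df['col1'].apply(eval), number=runs)
t_div = timeit.timeit(
    lambda: df['col1'].apply(lambda x: div(*x.split('/'))), number=runs
)

print(f"eval: {t_eval / runs:.4f} s per run")
print(f"div:  {t_div / runs:.4f} s per run")
```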

Answer 3

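I hate advocating for the use of eval. I didn't want to spend time on this answer, but I was compelled to because I don't want you using eval.

So I wrote this function that works on a pd.Series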

def do_math_in_string(s):
    # '__truediv__' is the Python 3 method name (Python 2 used '__div__')
    op_map = {'/': '__truediv__', '*': '__mul__', '+': '__add__', '-': '__sub__'}
    df = s.str.extract(r'(\d+)(\D+)(\d+)', expand=True)
    df = df.stack().str.strip().unstack()
    df.iloc[:, 0] = pd.to_numeric(df.iloc[:, 0]).astype(float)
    df.iloc[:, 2] = pd.to_numeric(df.iloc[:, 2]).astype(float)
    def do_op(x):
        return getattr(x[0], op_map[x[1]])(x[2])
    return df.T.apply(do_op)

Demonstration

s = pd.Series(['1/2', '3/4', '4/5'])

do_math_in_string(s)

0    0.50
1    0.75
2    0.80
dtype: float64

do_math_in_string(pd.Series(['1/2', '3/4', '4/5', '6+5', '11-7', '9*10']))

0     0.50
1     0.75
2     0.80
3    11.00
4     4.00
5    90.00
dtype: float64

Please don't use eval.
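A possible variation on the same idea looks the operators up in the standard operator module instead of by method name, and parses each string on its own. This is a sketch rather than the answer's method: the names OPS, PATTERN and calc are made up for illustration, and it assumes non-negative integer operands with a single operator.

```python
import operator
import re

import pandas as pd

# Map each supported symbol to the corresponding function.
OPS = {
    '/': operator.truediv,
    '*': operator.mul,
    '+': operator.add,
    '-': operator.sub,
}

# integer, one operator, integer; surrounding whitespace is tolerated.
PATTERN = re.compile(r'\s*(\d+)\s*([-+*/])\s*(\d+)\s*')

def calc(text):
    """Evaluate 'a <op> b'; return NaN when the text does not fit the pattern."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return float('nan')
    left, op, right = match.groups()
    return OPS[op](float(left), float(right))

s = pd.Series(['1/2', '3/4', '4/5', '6+5', '11-7', '9*10'])
result = s.apply(calc)

print(result)
```

Only the four listed operators can ever run, so nothing in the input is executed as code. A zero denominator still raises ZeroDivisionError.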
