Pandas convert strings to numeric if possible; else keep string values

Question

I have a Pandas DataFrame with columns that look like this:

df:

Column0   Column1     Column2
'MSC'       '1'        'R2'
'MIS'       'Tuesday'  '22'
'13'        'Finance'  'Monday'

In general, these columns contain actual strings as well as numeric values (integers) stored as strings.

I found this nice post about the pd.to_numeric and astype() methods, but I can't see whether or how I could use them in my case.

Using:

pd.to_numeric(df, errors = 'ignore')

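just results in skipping the whole column. I don't want to skip the whole column; I only want to skip the strings in that column that can't be converted, move on to the next entry, and try to convert the next string.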
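A small sketch of why the column-level call behaves as all-or-nothing. pd.to_numeric accepts a scalar, list, 1-d array or Series (not a whole DataFrame), so in practice it is applied one column at a time. On a Series, the default errors='raise' rejects the entire column as soon as one value fails to parse, while errors='coerce' turns each unparseable value into NaN. The frame below rebuilds the sample data without the quotes; building it with dtype=object is an assumption made so every cell is a plain Python str regardless of pandas version.

```python
import pandas as pd

# Sample frame from the question; dtype=object (an assumption) keeps
# every cell as a plain Python str regardless of pandas version.
df = pd.DataFrame(
    {
        "Column0": ["MSC", "MIS", "13"],
        "Column1": ["1", "Tuesday", "Finance"],
        "Column2": ["R2", "22", "Monday"],
    },
    dtype=object,
)

col = df["Column0"]

# Default errors="raise": a single bad value rejects the whole column.
try:
    pd.to_numeric(col)
except ValueError as exc:
    print("whole column rejected:", exc)

# errors="coerce": unparseable values become NaN, so the column is float64.
coerced = pd.to_numeric(col, errors="coerce")
print(coerced.tolist())  # [nan, nan, 13.0]
```

The coerced NaN markers are what the answers below use to tell "converted" cells apart from cells that should keep their original string.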

So in the end, my DataFrame would look like this:

df:

Column0   Column1     Column2
'MSC'       1          'R2'
'MIS'      'Tuesday'    22
 13        'Finance'  'Monday'

Is there an efficient way to iterate over these columns and achieve this?

Regards, Jan

EDIT: Thanks for all your suggestions! Since I'm still a Python beginner, @coldspeed's and @sacul's answers are easier for me to understand, so I'll go with one of them!

Answer 1

I agree 100% with the comments: mixing dtypes within a column is a terrible idea, performance-wise.

For reference, though, I would use pd.to_numeric and fillna:

df2 = df.apply(pd.to_numeric, errors='coerce').fillna(df)
print(df2)
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday

The columns are converted to object dtype to prevent coercion. You can see this when you extract the values:

print(df2.values.tolist())
[['MSC', 1.0, 'R2'], ['MIS', 'Tuesday', 22.0], [13.0, 'Finance', 'Monday']]
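As the list above shows, the parsed numbers come back as floats (1.0, 22.0, 13.0), because errors='coerce' introduces NaN and makes each intermediate column float64. If integers are wanted, as in the question's expected output, one possible follow-up step (a sketch, not part of the original answer; whole_float_to_int is a hypothetical helper name) maps whole-number floats back to int. Series.map is used per column because DataFrame.applymap was renamed DataFrame.map in pandas 2.1.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Column0": ["MSC", "MIS", "13"],
        "Column1": ["1", "Tuesday", "Finance"],
        "Column2": ["R2", "22", "Monday"],
    },
    dtype=object,
)

# The answer's approach: parsed values as floats, original strings elsewhere.
df2 = df.apply(pd.to_numeric, errors="coerce").fillna(df)

def whole_float_to_int(x):
    """Hypothetical helper: turn 1.0 into 1; leave strings, NaN and 1.5 alone."""
    # float.is_integer() is False for NaN and for non-whole values.
    if isinstance(x, float) and x.is_integer():
        return int(x)
    return x

df3 = df2.apply(lambda col: col.map(whole_float_to_int))
print(df3.values.tolist())
# [['MSC', 1, 'R2'], ['MIS', 'Tuesday', 22], [13, 'Finance', 'Monday']]
```

This only changes how the numbers are stored; the columns are still object dtype, so the performance caveat above still applies.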

Answer 2

I would apply pd.to_numeric with errors='coerce' and update the original DataFrame with the result (see the caveats in the comments):

# show original string type:
df.loc[0,'Column1']
# '1'

df.update(df.apply(pd.to_numeric, errors='coerce'))

>>> df
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday

# show updated float type:
df.loc[0,'Column1']
# 1.0
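The reason this keeps the non-numeric strings is that DataFrame.update only copies the non-NA values of the frame passed to it, and the coerced frame has NaN exactly where parsing failed. It also works in place and returns None, so its result should not be assigned back to df. A minimal sketch of that behaviour, assuming the sample frame is built with dtype=object:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Column0": ["MSC", "MIS", "13"],
        "Column1": ["1", "Tuesday", "Finance"],
        "Column2": ["R2", "22", "Monday"],
    },
    dtype=object,
)

# NaN wherever a string could not be parsed, float64 values elsewhere.
numeric = df.apply(pd.to_numeric, errors="coerce")

# update() modifies df in place and returns None; NaN cells in `numeric`
# are skipped, so the strings that failed to parse stay as they were.
returned = df.update(numeric)

print(returned)              # None
print(df.loc[0, "Column1"])  # 1.0
print(df.loc[1, "Column1"])  # Tuesday
```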

Answer 3

Alternatively, you can simply use str's isnumeric() method. I like it because the syntax is clear, although, per coldspeed's comment, this becomes very slow on large DataFrames.

df = df.applymap(lambda x: int(x) if x.isnumeric() else x)

Example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([['a','b','c'],['1','1a','c']],columns=['Col1','Col2','Col3'])

In [3]: df
Out[3]:
  Col1 Col2 Col3
0    a    b    c
1    1   1a    c

In [4]: df.Col1.map(lambda x: int(x) if x.isnumeric() else x)
Out[4]:
0    a
1    1
Name: Col1, dtype: object
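One caveat with isnumeric(): it is False for strings such as '-5' or '1.5', and True for some characters that int() cannot parse, such as the superscript '²', where int() raises ValueError. A hedged alternative (my own sketch; to_int_if_possible is a hypothetical name) is to simply try int() and fall back to the original value. Series.map is used per column because DataFrame.applymap was renamed DataFrame.map in pandas 2.1.

```python
import pandas as pd

def to_int_if_possible(x):
    """Hypothetical helper: int(x) for strings int() accepts, else x unchanged."""
    if not isinstance(x, str):
        return x  # leave non-strings alone (avoids int(3.7) -> 3)
    try:
        return int(x)  # accepts '-5' and ' 7 '; rejects '1.5', 'R2', '²'
    except ValueError:
        return x

print("-5".isnumeric(), "²".isnumeric())                  # False True
print(to_int_if_possible("-5"), to_int_if_possible("²"))  # -5 ²

df = pd.DataFrame([["a", "b", "c"], ["1", "1a", "-2"]],
                  columns=["Col1", "Col2", "Col3"], dtype=object)
df2 = df.apply(lambda col: col.map(to_int_if_possible))
print(df2.values.tolist())  # [['a', 'b', 'c'], [1, '1a', -2]]
```

Like the isnumeric() version, this still calls a Python function per cell, so it carries the same speed caveat on large frames.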

Answer 4

Using to_numeric + ignore:

df=df.applymap(lambda x : pd.to_numeric(x,errors='ignore'))
df
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday
df.applymap(type)
                 Column0                Column1                Column2
0          <class 'str'>  <class 'numpy.int64'>          <class 'str'>
1          <class 'str'>          <class 'str'>  <class 'numpy.int64'>
2  <class 'numpy.int64'>          <class 'str'>          <class 'str'>
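Note that errors='ignore' was deprecated for pd.to_numeric in pandas 2.2, with the recommendation to catch the exception explicitly instead. A sketch of the same element-wise idea under that recommendation (to_numeric_or_keep is a hypothetical helper name, and the dtype=object frame is an assumption):

```python
import pandas as pd

def to_numeric_or_keep(x):
    """Hypothetical stand-in for pd.to_numeric(x, errors='ignore') on one value."""
    try:
        return pd.to_numeric(x)
    except (ValueError, TypeError):
        return x  # not parseable: keep the original value

df = pd.DataFrame(
    {
        "Column0": ["MSC", "MIS", "13"],
        "Column1": ["1", "Tuesday", "Finance"],
        "Column2": ["R2", "22", "Monday"],
    },
    dtype=object,
)

# Series.map per column; parsed cells come back as NumPy scalars (e.g. int64).
result = df.apply(lambda col: col.map(to_numeric_or_keep))
print(result.values.tolist())
```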

Tags: python, string, numeric, dataframe, pandas