Efficient way to modify every cell in a dataframe

Question

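I'm working on a Python project and have a dataframe with many columns and rows.

I want to strip everything except the digits from every cell of the dataframe. Can this be done without a loop?

Here is a sample of the data: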

         a       b       c       d       e       f        g      h   
1    att-7   att-3  att-10  att-10   att-15  att-11    att-2  att-7  
2    att-9   att-7  att-12   att-4   att-10   att-4   att-13  att-4  
3   att-10   att-6   att-1   att-1   att-13  att-12    att-9  att-6

I'd like to apply something like this:

def modify_string(cell):
    return cell.str.extract(r'(\d+)')

df_modified = df.apply(lambda x: modify_string(x))

Can a loop be avoided here? Since the data is fairly large, what is the most efficient approach? How would you solve this?

Answer 1

df1
df2 = df1.astype('str').replace('att-', '', regex=True)
df2

Update: if you need to use the values as numbers afterwards, just add the following:

df2 = df2.astype('int64')

index	a	b	c	d	e	f	g	h
1	7	3	10	10	15	11	2	7
2	9	7	12	4	10	4	13	4
3	10	6	1	1	13	12	9	6
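
If the prefix is not always 'att-', the same replace call can take a regex that drops every non-digit character instead. This is only a sketch on a cut-down copy of the question's data (three columns, rebuilt by hand), and it assumes every cell contains at least one digit, otherwise the final integer conversion would fail.

```python
import pandas as pd

# Hand-built sample mirroring the first three columns of the question's data.
df1 = pd.DataFrame(
    {
        "a": ["att-7", "att-9", "att-10"],
        "b": ["att-3", "att-7", "att-6"],
        "c": ["att-10", "att-12", "att-1"],
    },
    index=[1, 2, 3],
)

# \D+ matches any run of non-digit characters; replacing it with ''
# leaves only the digits, which are then converted to integers.
df2 = df1.astype(str).replace(r"\D+", "", regex=True).astype("int64")
print(df2)
```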

Answer 2

The first approach uses applymap, which applies a function element-wise. It relies on the number coming right after a '-'.

df.applymap(lambda x: x.split('-')[-1])
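
One caveat: DataFrame.applymap is deprecated as of pandas 2.1 in favor of DataFrame.map. A small sketch that works on both older and newer versions might look like this (the two-column frame and the last_part helper are made up for illustration):

```python
import pandas as pd

# Hypothetical two-column sample in the same shape as the question's data.
df = pd.DataFrame({"a": ["att-7", "att-9"], "b": ["att-3", "att-7"]}, index=[1, 2])


def last_part(cell):
    """Return whatever follows the last '-' in the cell."""
    return cell.split("-")[-1]


# DataFrame.map exists from pandas 2.1 on; fall back to applymap before that.
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(last_part).astype("int64")
print(result)
```

Either way this still calls a Python function once per cell, so it is not truly vectorized.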

If the cells do not always follow that pattern, you can also use str.extract to pull out the digits.

df.stack().str.extract(r'(\d+)',expand=False).unstack()

Output:

    a  b   c   d   e   f   g  h
1   7  3  10  10  15  11   2  7
2   9  7  12   4  10   4  13  4
3  10  6   1   1  13  12   9  6
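
For reference, here is the stack / extract / unstack route as a self-contained sketch. The sample frame is invented, with one cell ("x9y") that deliberately does not follow the 'att-N' pattern to show that only the first run of digits matters. The integer conversion at the end assumes every cell contains a digit; cells without one would come back as NaN and the cast would fail.

```python
import pandas as pd

# Invented sample; "x9y" deliberately breaks the 'att-N' pattern.
df = pd.DataFrame({"a": ["att-7", "x9y"], "b": ["att-3", "att-7"]}, index=[1, 2])

# Flatten to a single Series so the vectorized .str accessor can be used.
stacked = df.stack()

# Capture the first run of digits in each cell; expand=False keeps a Series.
digits = stacked.str.extract(r"(\d+)", expand=False)

# Restore the original row/column layout and convert to integers.
result = digits.unstack().astype("int64")
print(result)
```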

Answer 3

I would use https://pypi.org/project/pandarallel/ with a simple apply function.

python

dataframe

pandas