How to remove everything but letters, numbers and ! ? . ; , @ ' using regex in python pandas df?

Question

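I am trying to remove everything but letters, numbers and ! ? . ; , @ ' from the text column of my python pandas DataFrame. I have already read some other questions on the topic, but I still cannot get mine to work.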

Here is an example of what I am doing:

import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'text': ['hey+ guys! wuzup',
                            'hello p3ople!What\'s up?',
                            'hey, how-  thing == do##n',
                            'my name is bond, james b0nd']})

Then we have the following table:

id                         text
1              hey+ guys! wuzup
2       hello p3ople!What's up?
3     hey, how-  thing == do##n
4   my name is bond, james b0nd

Now, trying to remove everything but letters, numbers and ! ? . ; , @ '

First attempt:

df.loc[:,'text'] = df['text'].str.replace(r"^(?!(([a-zA-z]|[\!\?\.\;\,\@\'\"]|\d))+)$",' ',regex=True)

Output:

id                         text
1              hey+ guys! wuzup
2       hello p3ople!What's up?
3      hey, how- thing == do##n
4   my name is bond, james b0nd

Second attempt:

df.loc[:,'text'] = df['text'].str.replace(r"(?i)\b(?:(([a-zA-Z\!\?\.\;\,\@\'\"\:\d])))",' ',regex=True)

Output:

id                         text
1                  ey+ uys uzup
2              ello 3ople hat p
3            ey ow- hing == o##
4          y ame s ond ames 0nd

Third attempt:

df.loc[:,'text'] = df['text'].str.replace(r'(?i)(?<!\w)(?:[a-zA-Z\!\?\.\;\,\@\'\"\:\d])',' ',regex=True)

Output:

id                         text
1                 ey+ uys! uzup
2           ello 3ople! hat' p?
3           ey, ow- hing == o##
4         y ame s ond, ames 0nd
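One possible reading of the output above, as a hedged sketch using only the standard `re` module: the third pattern is a positive character class, so it matches characters from the keep-set (those not preceded by a word character) and overwrites them with a space, rather than matching the unwanted characters. The variable names are illustrative only.

```python
import re

text = "hey+ guys! wuzup"

# The third attempt's pattern, copied unchanged: a POSITIVE character class
# guarded by a negative lookbehind for a word character.
positive = r"(?i)(?<!\w)(?:[a-zA-Z\!\?\.\;\,\@\'\"\:\d])"

# Which characters does it actually select? The first letter of each word.
matched = re.findall(positive, text)
print(matched)  # ['h', 'g', 'w']

# Replacing those matches with a space damages the words and leaves '+' in place.
result = re.sub(positive, " ", text)
print(repr(result))  # ' ey+  uys!  uzup'
```

The printed table collapses the repeated spaces, which is why it shows `ey+ uys! uzup`. Matching the characters to drop, instead of the characters to keep, would need a negated class (`[^...]`).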

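Afterwards, I also tried the re.sub() function with the same regex patterns, but still did not get the expected result. The expected result is the following: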

id                         text
1               hey guys! wuzup
2       hello p3ople!What's up?
3          hey, how-  thing don
4   my name is bond, james b0nd

Can anyone help me with this?

Links that I have seen on the topic:

https://stackabuse.com/using-regex-for-text-manipulation-in-python/

Answer 1

Is this what you are looking for?

df.text.str.replace(r"(?i)[^0-9a-z!?.;,@' -]", '', regex=True)
Out: 
0                hey guys! wuzup
1        hello p3ople!What's up?
2          hey, how-  thing  don
3    my name is bond, james b0nd
Name: text, dtype: object
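For reference, the same idea can be sketched with plain `re.sub()`, which the question mentions trying. This is a minimal sketch, not a definitive implementation: the names `DISALLOWED` and `clean_text` are hypothetical, and the keep-set is assumed to include the space and the hyphen, as in the pattern above.

```python
import re

# Negated character class: matches any single character that is NOT in the
# keep-set (letters, digits, ! ? . ; , @ ' plus space and hyphen).
# The hyphen is placed last so it is read literally, not as a range.
DISALLOWED = re.compile(r"[^0-9a-z!?.;,@' -]", flags=re.IGNORECASE)


def clean_text(text: str) -> str:
    """Delete every character outside the keep-set (hypothetical helper)."""
    return DISALLOWED.sub("", text)


samples = [
    "hey+ guys! wuzup",
    "hello p3ople!What's up?",
    "hey, how-  thing == do##n",
    "my name is bond, james b0nd",
]

cleaned = [clean_text(s) for s in samples]
for row in cleaned:
    print(row)
```

On the DataFrame from the question, the helper could be applied with `df['text'].map(clean_text)`. Note that the third row keeps two spaces between `thing` and `don`, because the removed `==` sat between two spaces; collapsing them would need a separate step.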

regex

string

text-mining

python-3.x

pandas
