How to drop and keep only certain non-alphanumeric characters?

Question

I have a df that looks like this:

email                                    id
{'email': ['test@test.com']}           {'id': ['123abc_d456_789_fgh']}

When I remove the non-alphanumeric characters like this:

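# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df.email = df.email.str.replace('[^a-zA-Z0-9]', '', regex=True)
df.email = df.email.str.replace('email', '')

df.id = df.id.str.replace('[^a-zA-Z0-9]', '', regex=True)
df.id = df.id.str.replace('id', '')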

the columns look like this:

email                    id
testtestcom              123abcd456789fgh

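How can I tell the code not to remove anything inside the square brackets, while still removing all non-alphanumeric characters outside the brackets?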

The new df should look like this:

email                        id
test@test.com                123abc_d456_789_fgh

Answer 1

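Based on the comments, what you might do is capture what is between the square brackets in a capturing group.

In the replacement, use the first capturing group.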

\{'[^']+':\s*\['([^][]+)'\]}

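That will match

\{ matches {
'[^']+' matches ', then 1+ characters other than ', then '
: matches literally
\s*\[' matches 0+ whitespace characters, then ['
([^][]+) capture group 1, matches 1+ characters other than [ or ]
'\] matches ']
} matches literally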
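
As a minimal sketch of the replacement described above, the pattern can be tried with Python's standard `re` module on the two sample strings from the question. The helper name `unwrap` is made up for illustration, and the sketch assumes each cell is a string of exactly this shape.

```python
import re

# Pattern from this answer: group 1 captures the text between [' and '].
PATTERN = re.compile(r"\{'[^']+':\s*\['([^][]+)'\]}")

def unwrap(cell: str) -> str:
    """Replace the whole match with group 1; non-matching strings are left unchanged."""
    return PATTERN.sub(r"\1", cell)

# Sample strings copied from the question's DataFrame cells.
email = unwrap("{'email': ['test@test.com']}")
ident = unwrap("{'id': ['123abc_d456_789_fgh']}")
print(email)  # test@test.com
print(ident)  # 123abc_d456_789_fgh
```

In pandas, the same pattern and replacement should carry over to the question's code as `df.email.str.replace(pattern, r'\1', regex=True)`.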


Answer 2

This is hard-coded, but it works:

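# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df.email = df.email.str.replace(r".+\['|'].+", '', regex=True)
df.id = df.id.str.replace(r".+\['|'].+", '', regex=True)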

>>> 'test@test.com'
>>> '123abc_d456_789_fgh'
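
To see what "hard-coded" means here, the same pattern can be run with the standard `re` module. This is only a sketch: the helper name `trim` and the two-value cell are hypothetical and do not appear in the question.

```python
import re

# Pattern from this answer: delete everything up to and including [',
# and everything from '] to the end of the string.
TRIM = re.compile(r".+\['|'].+")

def trim(cell: str) -> str:
    """Strip the dict/list wrapper around a single quoted value."""
    return TRIM.sub("", cell)

email = trim("{'email': ['test@test.com']}")
ident = trim("{'id': ['123abc_d456_789_fgh']}")
print(email)  # test@test.com
print(ident)  # 123abc_d456_789_fgh

# Hypothetical cell with two list items (an assumption, not from the question):
# the quotes and comma between the items are left in place.
two_items = trim("{'id': ['a', 'b']}")
print(two_items)  # a', 'b
```

The pattern relies on each cell containing exactly one `['...']` pair, so it fits the data shown in the question but would need adjusting for lists with several items.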
