Remove white spaces and punctuation (except parentheses) from the start and end of one column in Python

Question

Given a small dataset as follows:

df = pd.DataFrame({'text':[' a..b?!??', '%hgh&12','abc123(bj)!!!', '$$$1234（gz）']})
df

Output:

            text
0       a..b?!??
1        %hgh&12
2  abc123(bj)!!!
3    $$$1234（gz）

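I need to remove white spaces and punctuation from the left and right ends of the text column, except for parentheses (both the ASCII and the full-width Chinese forms).

Expected result: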

            text
0           a..b
1         hgh&12
2     abc123(bj)
3       1234（gz）

How can I do this in Python?

My code:

df['text'] = df['text'].str.replace(r'[^\w\s]', '', regex=True)

Output:

0          ab
1       hgh12
2    abc123bj
3      1234gz
Name: text, dtype: object

Thanks.

Answer 1

I think you need Series.str.strip with all characters from string.punctuation except the parentheses, plus an added space:

df['text'] = df['text'].str.strip('!"#$%&*+,-./:;<=>?@[\\]^_`{|}~ ' + "'")
print(df)
         text
0        a..b
1      hgh&12
2  abc123(bj)
3    1234（gz）
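
The reason this works is that strip treats its argument as a set of characters rather than as a substring: it removes characters from each end until it reaches one that is not in the set. The following is a minimal plain-Python sketch of that behavior; the variable name chars is only illustrative and the character set is built the same way as in the answer.

```python
import string

# Characters to strip: ASCII punctuation minus the two parentheses, plus a space.
chars = string.punctuation.replace('(', '').replace(')', '') + ' '

# strip() keeps removing characters from each end until it meets one
# that is outside the set, so anything in the interior is left alone.
print('abc123(bj)!!!'.strip(chars))  # the ')' stops the right-hand stripping
print(' a..b?!??'.strip(chars))      # the interior '..' is untouched
print('%hgh&12'.strip(chars))        # the interior '&' is untouched
```

Because the parentheses are not in the set, stripping halts as soon as one is reached at either end.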

A dynamic solution would be:

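import string

rem = ['(', ')']   # punctuation to keep
add = [' ']        # extra characters to strip
# Characters to strip: all ASCII punctuation plus the additions, minus the kept ones
a = set(list(string.punctuation) + add) - set(rem)

df['text'] = df['text'].str.strip(''.join(a))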
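
For reference, here is a self-contained sketch of the dynamic approach run end to end. The sample data is modeled on the question, and the names keep, extra and strip_chars are illustrative rather than part of the answer above.

```python
import string

import pandas as pd

# Sample data modeled on the question (the last row uses full-width parentheses).
df = pd.DataFrame({'text': [' a..b?!??', '%hgh&12', 'abc123(bj)!!!', '$$$1234（gz）']})

keep = set('()')   # characters that must survive at the edges
extra = set(' ')   # additional characters to strip
strip_chars = ''.join((set(string.punctuation) | extra) - keep)

# Full-width parentheses are not part of string.punctuation,
# so they are never stripped either.
df['text'] = df['text'].str.strip(strip_chars)
print(df['text'].tolist())
```

Since string.punctuation covers only ASCII punctuation, any non-ASCII punctuation that should be removed would have to be added to extra explicitly.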

Answer 2

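Use the strip function. A small example is below. Note that strip() with no argument removes only leading and trailing whitespace; to remove punctuation as well, pass the characters to strip as its argument.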

df['text'] = df['text'].apply(lambda x: x.strip())
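
A hedged sketch of how the apply approach could be extended to punctuation: the helper below passes an explicit character set to strip and leaves non-string values alone. The names strip_edges and STRIP_CHARS are hypothetical, and the character set (ASCII punctuation except parentheses, plus whitespace) is an assumption based on the question.

```python
import string

# Assumed character set: ASCII punctuation except parentheses, plus whitespace.
STRIP_CHARS = ''.join(set(string.punctuation) - set('()')) + string.whitespace


def strip_edges(value):
    """Strip STRIP_CHARS from both ends of a string; return other values unchanged."""
    if isinstance(value, str):
        return value.strip(STRIP_CHARS)
    return value


print(repr(' a..b?!??'.strip()))       # no argument: only whitespace is removed
print(repr(strip_edges(' a..b?!??')))  # explicit character set: punctuation goes too
print(strip_edges(None))               # non-strings pass through untouched
```

With pandas, this helper could be used as df['text'].apply(strip_edges); the isinstance check avoids an AttributeError on missing values.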

str-replace

python-3.x

pandas