How do I apply a regex substitution in a string column of a DataFrame?

Question

I have a DataFrame named "Animals" that looks like this:

 Words
 The Black Cat
 The Red Dog

I want to add a plus sign in front of every word, so that it looks like this:

 Words
 +The +Black +Cat
 +The +Red +Dog

I tried to do this with a regex, but it did not work:

 df = re.sub(r'([a-z]+)', r'+', Animals)

Answer 1

You can use str.replace with the following regex to change every row of the column:

df.Words = df.Words.str.replace(r'(\b\S)', r'+\1', regex=True)

The DataFrame then looks like this:

>>> df
              Words
0  +The +Black +Cat
1    +The +Red +Dog
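
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It assumes the frame has a single string column called Words, as in the question, and that your pandas version accepts the regex keyword on str.replace. The variable names pattern and same are only illustrative. The group (\b\S) captures the first character of each word, and \1 in the replacement writes that character back after the plus sign. The last lines do the same thing with plain re.sub, one string at a time, since re.sub works on a string rather than on a whole DataFrame.

```python
import re

import pandas as pd

# Rebuild the example frame from the question (called df here, as in the answer).
df = pd.DataFrame({"Words": ["The Black Cat", "The Red Dog"]})

# \b\S matches the first non-space character at each word boundary;
# the parentheses capture it so the replacement can refer to it as \1.
pattern = r"(\b\S)"

# regex=True asks pandas to treat the pattern as a regular expression.
df["Words"] = df["Words"].str.replace(pattern, r"+\1", regex=True)
print(df)

# Equivalent substitution with the re module, applied string by string.
same = [re.sub(pattern, r"+\1", s) for s in ["The Black Cat", "The Red Dog"]]
```

One caveat: \b also fires next to punctuation inside a token, so a word like don't would pick up extra plus signs. If your words are simply whitespace-separated, the pattern r'(\S+)' with the same replacement might be a safer choice.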
