Regular expression x.group()

Question

Please explain the steps that lead to this result, including the questions below. Thanks!

df['text'].str.replace(r'(\w+day\b)', lambda x: x.groups()[0][:3], regex=True)

What does the Series.str transformation do? I can't inspect it.
What is x in x.groups(), and what does groups() do?
Why the [0] in x.groups()[0][:3]?

Given the following DataFrame, df:

0   Monday: The doctor's appointment is at 2:45pm.
1   Tuesday: The dentist's appointment is at 11:30...
2   Wednesday: At 7:00pm, there is a basketball game!
3   Thursday: Be back home by 11:15 pm at the latest.
4   Friday: Take the train at 08:10 am, arrive at ...

the code above transforms it into:

0          Mon: The doctor's appointment is at 2:45pm.
1       Tue: The dentist's appointment is at 11:30 am.
2          Wed: At 7:00pm, there is a basketball game!
3         Thu: Be back home by 11:15 pm at the latest.
4    Fri: Take the train at 08:10 am, arrive at 09:...
Name: text, dtype: object
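A minimal stdlib-only sketch of what the pandas call does, assuming the object-dtype behaviour of calling a compiled pattern's sub() on each string in turn. The list texts is hypothetical sample data made of the complete rows above (row 4 is truncated in the question, so it is left out), and shorten is just a named version of the question's lambda:

```python
import re

# Hypothetical sample data: the complete rows from the question's DataFrame.
texts = [
    "Monday: The doctor's appointment is at 2:45pm.",
    "Tuesday: The dentist's appointment is at 11:30 am.",
    "Wednesday: At 7:00pm, there is a basketball game!",
    "Thursday: Be back home by 11:15 pm at the latest.",
]

PATTERN = re.compile(r'(\w+day\b)')

def shorten(x):
    # Same logic as the question's lambda: x is a re.Match object,
    # x.groups() is the tuple of captured groups, [0] picks the first
    # (and only) group, and [:3] keeps its first three characters.
    return x.groups()[0][:3]

# Roughly what Series.str.replace(..., regex=True) does element by element.
result = [PATTERN.sub(shorten, s) for s in texts]

for line in result:
    print(line)
```

In pandas itself, a callable replacement is only accepted together with regex=True; since pandas 2.0 the default is regex=False, so the flag has to be passed explicitly.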

Answer 1

As a complement to @AnuragDabas's comment, here is a step-by-step breakdown using Python's re module:

>>> import re
>>> s = "Monday: The doctor's appointment is at 2:45pm."

>>> re.search(r'(\w+day\b)', s) # find any word ending in "day"
<re.Match object; span=(0, 6), match='Monday'>

>>> re.search(r'(\w+day\b)', s).groups() # get the matching groups
('Monday',)

>>> re.search(r'(\w+day\b)', s).groups()[0] # take the first element
'Monday'

>>> re.search(r'(\w+day\b)', s).groups()[0][:3] # get the first 3 characters
'Mon'
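Since the title asks about x.group(): for this pattern, which has exactly one capturing group, x.groups()[0] is the same string as x.group(1) (and as the indexing shorthand x[1]). group(0) is the whole match, which here is also 'Monday' because the group wraps the entire pattern. A short sketch comparing the accessors:

```python
import re

s = "Monday: The doctor's appointment is at 2:45pm."
m = re.search(r'(\w+day\b)', s)

all_groups = m.groups()    # tuple of every capturing group: ('Monday',)
first = m.groups()[0]      # first element of that tuple: 'Monday'
by_number = m.group(1)     # same group, accessed by its number: 'Monday'
by_index = m[1]            # indexing shorthand for m.group(1) (Python 3.6+)
whole = m.group(0)         # entire match; equals group 1 for this pattern
abbrev = m.group(1)[:3]    # first three characters: 'Mon'

print(all_groups, first, by_number, by_index, whole, abbrev)
```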

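When used with pandas.Series.str.replace, the lambda is passed on to re.sub (as described in the documentation). For each match, re.sub calls it with the re.Match object (this is the x in the lambda) and uses the returned string as the replacement for that match (so "ABCDEFday" is replaced with "ABC").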
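To see what x actually is inside the callable, a small sketch can record the type of every argument re.sub passes in. The helper name show_and_shorten and the list seen are illustrative only:

```python
import re

seen = []  # records the type of each argument re.sub passes to the callable

def show_and_shorten(x):
    # x is the re.Match object for the current match
    seen.append(type(x))
    # The callable must return a str; it replaces the matched text.
    return x.group(1)[:3]

result = re.sub(r'(\w+day\b)', show_and_shorten, "ABCDEFday and Monday")
print(result)  # ABC and Mon
print(all(t is re.Match for t in seen), len(seen))
```

The callable runs once per match, which is why "ABCDEFday" and "Monday" are shortened independently.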

Documentation of the second parameter of .str.replace:

repl: str or callable

    Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().

Note: the regex has a flaw: it matches any word ending in "day", not just weekday names. So if a row contains, for example, Saturday: this is my birthday and not a workday!, the result is Sat: this is my bir and not a wor!
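One possible way around this (an alternative pattern, not part of the original answer) is to list the seven weekday stems explicitly, so that words such as "birthday" and "workday" no longer match. A sketch, assuming capitalized English weekday names; WEEKDAY and abbreviate_weekdays are hypothetical names:

```python
import re

# Alternative pattern (an assumption, not from the original answer):
# only the seven capitalized English weekday names match, as whole words.
WEEKDAY = re.compile(r'\b((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)\b')

def abbreviate_weekdays(text):
    # Replace each full weekday name with its first three letters.
    return WEEKDAY.sub(lambda x: x.group(1)[:3], text)

fixed = abbreviate_weekdays("Saturday: this is my birthday and not a workday!")
print(fixed)  # Sat: this is my birthday and not a workday!
```

The same pattern string can be passed to df['text'].str.replace(..., regex=True) in place of r'(\w+day\b)'.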
