Regular expression x.group()

Please explain the steps that lead to the result, including the questions below. Thanks!

df['text'].str.replace(r'(\w+day\b)', lambda x: x.groups()[0][:3])

  1. What does Series.str transform the data into? I can't inspect it.
  2. What is x in x.groups(), and what does groups() do?
  3. Why the [0] in x.groups()[0][:3]?

Given the following data frame, df:

0   Monday: The doctor's appointment is at 2:45pm.
1   Tuesday: The dentist's appointment is at 11:30...
2   Wednesday: At 7:00pm, there is a basketball game!
3   Thursday: Be back home by 11:15 pm at the latest.
4   Friday: Take the train at 08:10 am, arrive at ...

the code above converts it to:

0          Mon: The doctor's appointment is at 2:45pm.
1       Tue: The dentist's appointment is at 11:30 am.
2          Wed: At 7:00pm, there is a basketball game!
3         Thu: Be back home by 11:15 pm at the latest.
4    Fri: Take the train at 08:10 am, arrive at 09:...
Name: text, dtype: object

As a supplement to @AnuragDabas's comment, here is a step-by-step breakdown using Python's re module:

>>> import re
>>> s = "Monday: The doctor's appointment is at 2:45pm."

>>> re.search(r'(\w+day\b)', s) # find any word ending in "day"
<re.Match object; span=(0, 6), match='Monday'>

>>> re.search(r'(\w+day\b)', s).groups() # get the matching groups
('Monday',)

>>> re.search(r'(\w+day\b)', s).groups()[0] # take the first element
'Monday'

>>> re.search(r'(\w+day\b)', s).groups()[0][:3] # get the first 3 characters
'Mon'
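
To connect this with the x.group() in the title: the x handed to the lambda is the same kind of re.Match object as the one returned by re.search above. The sketch below (variable names are only illustrative) compares group() and groups() on that object. Since the pattern has exactly one pair of parentheses, groups() is a one-element tuple, which is why the [0] is needed before slicing.

```python
import re

s = "Monday: The doctor's appointment is at 2:45pm."
m = re.search(r'(\w+day\b)', s)   # m plays the role of x in the lambda

whole = m.group(0)        # the entire matched text
first = m.group(1)        # text captured by the first (...) group
all_groups = m.groups()   # tuple holding every capture group

print(whole, first, all_groups)

# Both spellings pick out the same three characters
print(all_groups[0][:3], m.group(1)[:3])
```

Here m.group(1)[:3] would be an equivalent, slightly shorter way to write x.groups()[0][:3].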

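When used in the context of pandas.Series.str.replace, the lambda is passed on to re.sub (as described in the documentation). It is called once per match with the match object, and its return value is used as the replacement for that match (so "ABCDEFday" is replaced with "ABC").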
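
As a rough illustration of that hand-off, the sketch below applies re.sub with a callable to a plain Python list standing in for the text column. The list comprehension is only an approximation of what .str.replace does element by element, and the helper name shorten is made up for this example. Note that in recent pandas releases you may need to pass regex=True to .str.replace explicitly for the pattern to be treated as a regular expression.

```python
import re

# A plain list standing in for df['text'] (assumption for illustration)
rows = [
    "Monday: The doctor's appointment is at 2:45pm.",
    "Wednesday: At 7:00pm, there is a basketball game!",
]

def shorten(match):
    # match is an re.Match object; groups()[0] is the text
    # captured by the single (...) group in the pattern
    return match.groups()[0][:3]

pattern = re.compile(r'(\w+day\b)')

# re.sub calls shorten() once per match and splices in its return value
shortened = [pattern.sub(shorten, row) for row in rows]
print(shortened)
```

The named function does the same job as the lambda in the question; it is written out only so the argument it receives is easier to see.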

Description of the second parameter of .str.replace:

repl: str or callable

    Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().

Note: the regex is flawed in that it matches any word ending in "day", not just weekday names. So if a row contains, for example, Saturday: this is my birthday and not a workday!, this gives Sat: this is my bir and not a wor!
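
One possible way around this, offered as a sketch rather than as part of the original code, is to list the seven weekday names explicitly so that other words ending in "day" are left alone. The pattern and the names below are assumptions for illustration.

```python
import re

# Only the seven weekday names, anchored at word boundaries
weekday = re.compile(r'\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b')

s = "Saturday: this is my birthday and not a workday!"

loose = re.sub(r'(\w+day\b)', lambda x: x.groups()[0][:3], s)
strict = weekday.sub(lambda x: x.group(0)[:3], s)

print(loose)   # every word ending in "day" is shortened
print(strict)  # only the weekday name is shortened
```

Because the stricter pattern uses a non-capturing group, the lambda reads the whole match with group(0) instead of groups()[0].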