Regular expression x.group()
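Please explain the steps that lead to the result, including the questions below. Thanks!
df['text'].str.replace(r'(\w+day\b)', lambda x: x.groups()[0][:3])
- What transformation does Series.str perform? I cannot inspect it.
- What is x in x.groups(), and what does groups() do?
- Why the [0] in x.groups()[0][:3]?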
Given the following data frame, df:
0 Monday: The doctor's appointment is at 2:45pm.
1 Tuesday: The dentist's appointment is at 11:30...
2 Wednesday: At 7:00pm, there is a basketball game!
3 Thursday: Be back home by 11:15 pm at the latest.
4 Friday: Take the train at 08:10 am, arrive at ...
the code above converts it to:
0 Mon: The doctor's appointment is at 2:45pm.
1 Tue: The dentist's appointment is at 11:30 am.
2 Wed: At 7:00pm, there is a basketball game!
3 Thu: Be back home by 11:15 pm at the latest.
4 Fri: Take the train at 08:10 am, arrive at 09:...
Name: text, dtype: object
To complement @AnuragDabas's comment, here is a step-by-step breakdown of the processing using Python's re module:
>>> import re
>>> s = "Monday: The doctor's appointment is at 2:45pm."
>>> re.search(r'(\w+day\b)', s) # find any word ending in "day"
<re.Match object; span=(0, 6), match='Monday'>
>>> re.search(r'(\w+day\b)', s).groups() # get the matching groups
('Monday',)
>>> re.search(r'(\w+day\b)', s).groups()[0] # take the first element
'Monday'
>>> re.search(r'(\w+day\b)', s).groups()[0][:3] # get the first 3 characters
'Mon'
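To see why the [0] is needed, it may help to compare groups() with group(). This is only a small sketch: the two-group pattern is made up for illustration and is not part of the original code.

```python
import re

s = "Monday: The doctor's appointment is at 2:45pm."

# One capturing group: groups() is still a tuple, here with a single element.
m = re.search(r'(\w+day\b)', s)
one_group = m.groups()        # tuple holding the text of each capturing group
first = m.groups()[0]         # first (and only) captured group
same_as_first = m.group(1)    # group(1) is an equivalent way to get it
whole_match = m.group(0)      # group(0) is the entire match

# Hypothetical pattern with two groups, only to show that groups()
# returns one entry per capturing group.
m2 = re.search(r'(\w+)(day\b)', s)
two_groups = m2.groups()

print(one_group, first, same_as_first, whole_match, two_groups)
```

Because the pattern in the question has exactly one capturing group, x.groups()[0], x.group(1) and x.group(0) all return the same string here.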
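When used with pandas.Series.str.replace, the lambda is passed on to the re.sub function (as described in the documentation), which calls it with each match object and uses the returned string as the replacement for that match (so "ABCDEFday" is replaced with "ABC").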
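The same mechanism can be reproduced without pandas by calling re.sub directly on each string. This is a rough sketch; the function name shorten and the list rows are illustrative names, not anything from pandas.

```python
import re

def shorten(match):
    # match is an re.Match object; groups()[0] is the text of the first group.
    return match.groups()[0][:3]

pattern = r'(\w+day\b)'
rows = [
    "Monday: The doctor's appointment is at 2:45pm.",
    "Wednesday: At 7:00pm, there is a basketball game!",
    "ABCDEFday",
]

# re.sub calls shorten once per match and splices in its return value.
result = [re.sub(pattern, shorten, row) for row in rows]
for line in result:
    print(line)
```

Depending on the pandas version, regex=True may need to be passed explicitly to .str.replace for a callable replacement to be accepted.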
The documentation describes the second parameter of .str.replace as follows:
repl: str or callable
Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
Note: the regular expression has a flaw, because it matches any word ending in "day". If a row contains, for example, "Saturday: this is my birthday and not a workday!", the result is "Sat: this is my bir and not a wor!".
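One possible way around this, assuming only the seven English weekday names should be shortened, is to list them explicitly in the pattern. The pattern below is a suggestion and not part of the original answer.

```python
import re

s = "Saturday: this is my birthday and not a workday!"

# Original pattern: shortens every word ending in "day".
loose = re.sub(r'(\w+day\b)', lambda m: m.groups()[0][:3], s)

# Assumed alternative: match only the seven weekday names as whole words.
weekday = r'\b((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)\b'
strict = re.sub(weekday, lambda m: m.group(1)[:3], s)

print(loose)
print(strict)
```

The stricter pattern leaves "birthday" and "workday" untouched. It is case-sensitive as written.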