Matching and replacing month and day in a pandas dataframe

I am trying to find abbreviated month and day names in a pandas dataframe and replace them with the full names, but the strings do not change.

Code:

import pandas as pd
data = {'text':['event mon and nov', 'no event on friday', 'december is good', 'welcome jan again']}
df = pd.DataFrame(data)
month = {"jan":"january","feb":"february","mar":"march","apr":"april","may":"may","jun":"june",
        "jul":"july","aug":"august","sep":"september","oct":"october","nov":"november","dec":"december"}
day = {"sun":"sunday","mon":"monday","tue":"tuesday","wed":"wednesday","thu":"thursday","fri":"friday",
      "sat":"saturday"}
df["text_new"] = df["text"].apply(lambda x : re.compile(r"(?:(?:(?:j|J)an)|(?:(?:f|F)eb)| \
(?:(?:m|M)ar)|(?:(?:a|A)pr)|(?:(?:m|M)ay)|(?:(?:j|J)un)|(?:(?:j|J)ul)|(?:(?:a|A)ug)|(?:(?:s|S)ep)|(?:(?:o|O)ct)| \
(?:(?:n|N)ov)|(?:(?:d|D)ec))(?:\s)".join(month)).sub(lambda m: month.get(m.group()), x))

Expected output dataframe:

0   event monday and november
1   no event on friday
2   december is good
3   welcome january again

Thanks in advance.

You should use pandas' regex support through str.replace:

import re

dm = month.copy()
dm.update(day)
# or for python ≥3.9
# dm = day|month

regex = fr'\b({"|".join(dm)})\b'

                                        # if matched, look up the lowercased match in the dict
df["text_new"] = df['text'].str.replace(regex, lambda x: dm.get(x.group().lower(), x.group()),
                                        regex=True, # use regex
                                        flags=re.I) # case insensitive

Output:

                 text                   text_new
0   event mon and nov  event monday and november
1  no event on friday         no event on friday
2    december is good           december is good
3   welcome jan again      welcome january again
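
For reference, the same word-boundary idea can be sketched with a precompiled pattern and a small helper. The names abbreviations and expand_abbreviations are made up for this illustration, and the sketch assumes that every replacement should come out lowercase regardless of how the abbreviation was written:

```python
import re
import pandas as pd

month = {"jan": "january", "feb": "february", "mar": "march", "apr": "april",
         "may": "may", "jun": "june", "jul": "july", "aug": "august",
         "sep": "september", "oct": "october", "nov": "november", "dec": "december"}
day = {"sun": "sunday", "mon": "monday", "tue": "tuesday", "wed": "wednesday",
       "thu": "thursday", "fri": "friday", "sat": "saturday"}

# Hypothetical name: one combined lookup table (works on any Python 3 version).
abbreviations = {**month, **day}

# \b on both sides so that "fri" inside "friday" or "dec" inside "december" is not matched.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, abbreviations)) + r")\b",
                     flags=re.IGNORECASE)

def expand_abbreviations(text: str) -> str:
    """Replace whole-word abbreviations; the lookup key is the lowercased match."""
    return pattern.sub(lambda m: abbreviations[m.group().lower()], text)

df = pd.DataFrame({"text": ["event mon and nov", "no event on friday",
                            "december is good", "welcome jan again"]})
df["text_new"] = df["text"].map(expand_abbreviations)
print(df)
```

Keep in mind that some abbreviations are ordinary English words ("sat", "sun", "mar", "may"), so a sentence such as "he sat down" would also be rewritten. Whether that is acceptable depends on your data.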

Use str.split to get the words, then replace them with the correct values and rejoin your rows:

# month | day merges the two dicts (Python ≥3.9); the raw string avoids an invalid-escape warning
df['text_new'] = df['text'].str.split(r'\s+').explode().replace(month | day) \
                           .groupby(level=0).agg(' '.join)
print(df)

# Output
                 text                   text_new
0   event mon and nov  event monday and november
1  no event on friday         no event on friday
2    december is good           december is good
3   welcome jan again      welcome january again
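
One thing to be aware of with the split approach: it only replaces tokens that are exactly equal to a dictionary key, so capitalised abbreviations or ones with attached punctuation are left alone. A small sketch of that behaviour, using a made-up second row and a shortened, hypothetical mapping dict:

```python
import pandas as pd

# Shortened, hypothetical mapping just for this illustration.
mapping = {"mon": "monday", "nov": "november"}

s = pd.Series(["event mon and nov", "see you in Nov, ok"])

out = (
    s.str.split(r"\s+")       # one list of tokens per row
     .explode()               # one token per row, original index repeated
     .replace(mapping)        # exact, case-sensitive token match
     .groupby(level=0)        # regroup tokens by original row
     .agg(" ".join)           # rebuild each sentence
)
print(out.tolist())
# ['event monday and november', 'see you in Nov, ok']
```

Here "Nov," is neither lowercase nor a bare token, so it is not replaced. If your text contains such cases, the regex approach with word boundaries and a case-insensitive flag is likely the better fit.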