My regex to remove RT is not working for some reason

Question

The head of my DataFrame looks like the tweet column shown further down. This is the code I use to clean it:

import re

for i in df.index:
    txt = df.loc[i, "tweet"]
    txt = re.sub(r'@[A-Z0-9a-z_:]+', '', txt)  # remove username tags
    txt = re.sub(r'^[RT]+', '', txt)  # remove RT tags (this is the line that does not work)
    txt = re.sub(r'https?://[A-Za-z0-9./]+', '', txt)  # remove URLs
    txt = re.sub(r"[^a-zA-Z]", " ", txt)  # replace every non-letter character (hashtags, digits, punctuation) with a space
    df.at[i, "tweet"] = txt

However, running this does not remove the 'RT' tag. I would also like to remove the 'b' tag.

Resulting tweet column:

b Yal suppose you would people waiting for a tub of paint and garden furniture the league is gone and any that thinks anything else is a complete tool of a human who really needs to get down off that cloud lucky to have it back for
b RT watching porn aftern normal people is like no turn it off they don xe x x t love each other
b RT If not now when nn
b Used red wine as a chaser for Captain Morgan xe x x s Fun times
b RT shackattack Hold the front page s Lockdown property project sent me up the walls

Answer 1

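Your regex does not work because the ^ sign means "at the start of the string", but the two characters you want to remove are not at the start: every tweet begins with "b " instead.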

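If you change r'^[RT]+' to r'[RT]+', the two letters will be removed. But be careful: [RT] is a character class, so every other uppercase R or T anywhere in the tweet will be removed as well.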
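Both behaviours can be seen in a small sketch that uses one of the sample tweets from the question. The variable names and the "Totally Random Tweet" string are made up for illustration.

```python
import re

tweet = "b RT If not now when nn"

# Anchored pattern: the string starts with "b", not with R or T,
# so nothing matches and the tweet comes back unchanged.
anchored = re.sub(r'^[RT]+', '', tweet)

# Unanchored pattern: every run of uppercase R or T is removed,
# wherever it occurs in the string.
unanchored = re.sub(r'[RT]+', '', tweet)

# Hypothetical tweet showing the collateral damage of the unanchored pattern.
collateral = re.sub(r'[RT]+', '', "b Totally Random Tweet")

print(anchored)    # b RT If not now when nn
print(unanchored)  # b  If not now when nn
print(collateral)  # b otally andom weet
```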

If you also want to remove the letter b, try r'^b\s([RT]+)?'.
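As a rough sketch, this is how the suggested pattern behaves on two of the sample tweets. The third tweet is hypothetical and only shows an edge case: because [RT]+ matches any leading R or T, a tweet that starts with such a letter loses it. The `stricter` pattern is not from the answer. It is one possible variant, under the assumption that only the literal word "RT" should be dropped.

```python
import re

tweets = [
    "b RT If not now when nn",
    "b Used red wine as a chaser for Captain Morgan xe x x s Fun times",
    "b Totally made up example",  # hypothetical tweet, not from the question
]

# Pattern suggested in the answer: a leading "b", one whitespace
# character, then an optional run of R/T characters.
suggested = re.compile(r'^b\s([RT]+)?')

# Assumed stricter variant: a leading "b", whitespace, then optionally
# the whole word "RT" (the \b stops it from eating the start of a longer word).
stricter = re.compile(r'^b\s+(?:RT\b\s*)?')

# strip() removes the space left behind after "b RT".
cleaned_suggested = [suggested.sub('', t).strip() for t in tweets]
cleaned_stricter = [stricter.sub('', t) for t in tweets]

for before, a, b in zip(tweets, cleaned_suggested, cleaned_stricter):
    print(repr(before), '->', repr(a), '|', repr(b))
```

In the loop from the question, either pattern would take the place of the r'^[RT]+' line. With pandas, the same substitution can probably be applied to the whole column at once via df["tweet"].str.replace(pattern, "", regex=True).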

I suggest trying it out yourself at https://regex101.com/

python

dataframe

pandas

python-re