My regex to remove RT is not working for some reason
I clean the tweet column of my dataframe with the following loop:
import re

for i in df.index:
    txt = df.loc[i]["tweet"]
    txt = re.sub(r'@[A-Z0-9a-z_:]+', '', txt)  # remove username tags
    txt = re.sub(r'^[RT]+', '', txt)  # remove RT tags
    txt = re.sub('https?://[A-Za-z0-9./]+', '', txt)  # remove URLs
    txt = re.sub("[^a-zA-Z]", " ", txt)  # replace non-letter characters (e.g. hashtags) with spaces
    df.at[i, "tweet"] = txt
However, running this does not remove the 'RT' tag. I would also like to remove the 'b' tag.
The resulting tweet column:
b Yal suppose you would people waiting for a tub of paint and garden furniture the league is gone and any that thinks anything else is a complete tool of a human who really needs to get down off that cloud lucky to have it back for
b RT watching porn aftern normal people is like no turn it off they don xe x x t love each other
b RT If not now when nn
b Used red wine as a chaser for Captain Morgan xe x x s Fun times
b RT shackattack Hold the front page s Lockdown property project sent me up the walls
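Your regex does not work because the ^ sign means "at the start of the string", and the two characters you want to remove are not at the start: every tweet begins with 'b ' instead.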
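Change r'^[RT]+' to r'[RT]+' and the two letters will be removed. But be careful: [RT]+ is a character class, so it matches any run of uppercase R or T anywhere in the text, and all of those other matches will be removed as well.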
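The difference between the two patterns can be checked with a small sketch. The first string is taken from the tweet column above; the second ("b RT Tuesday Report") is a made-up example, used only to show the over-matching.

```python
import re

sample = "b RT If not now when nn"          # from the tweet column above
hypothetical = "b RT Tuesday Report"        # invented string, for illustration only

# '^' anchors the match at position 0, where the text starts with 'b', not 'RT',
# so nothing is removed.
anchored = re.sub(r'^[RT]+', '', sample)

# Without the anchor, the character class matches every run of 'R' or 'T',
# including the capital letters inside ordinary words.
unanchored = re.sub(r'[RT]+', '', hypothetical)

print(repr(anchored))    # 'b RT If not now when nn'
print(repr(unanchored))  # 'b  uesday eport'
```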
If you also want to remove the letter b, try r'^b\s([RT]+)?'.
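As a rough sketch, here is that pattern applied to two rows from the tweet column above. The second pattern, named strict here, is not part of the answer; it is a hypothetical variant that only accepts the literal word RT and also swallows the space after it.

```python
import re

tweets = [
    "b RT If not now when nn",
    "b Used red wine as a chaser for Captain Morgan xe x x s Fun times",
]

# Pattern from the answer: leading 'b', one whitespace character,
# then an optional run of 'R'/'T' characters.
suggested = re.compile(r'^b\s([RT]+)?')

# Hypothetical stricter variant (an assumption, not from the answer):
# leading 'b', whitespace, then optionally the whole word 'RT' and trailing spaces.
strict = re.compile(r'^b\s+(?:RT\b\s*)?')

cleaned = [suggested.sub('', t) for t in tweets]
cleaned_strict = [strict.sub('', t) for t in tweets]

print(cleaned)
print(cleaned_strict)
```

Note that the answer's pattern leaves the space that followed RT in place, so a final .strip() may still be wanted. To avoid the row-by-row loop, something like df["tweet"].str.replace(r'^b\s+(?:RT\b\s*)?', '', regex=True) should do the same job on the whole column, assuming the column holds plain strings.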
I suggest you try it out yourself at https://regex101.com/