How to overwrite an HTML file in R

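I am trying to replace the email addresses in an HTML file with an anti-spam format and then export the result as a nospam.html file. I tried to do this with the gsub() function, but it does not seem to work. What is the problem? Thanks!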

datei <- scan("https://isor.univie.ac.at/about-us/People.html", sep = "\n", what= "character")
#pattern.email <- "[a-z]+[.]+[a-z]+?[@]+[a-z]+"
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>" #works

stelle.email <-gregexpr(reg.email, datei, ignore.case = TRUE) #works

unlist(stelle.email)
res.email<- regmatches(datei, stelle.email)

datei2<-gsub(reg.email, "vornameDOTnameNO-SPAMunivieDOTacDOTat", x = datei)

write(datei2, file = "nospam.html")

Note that regmatches (for extracting matched substrings) also has a companion regmatches<- function (for replacing matched substrings). See ?regmatches.
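
To make the extract/replace pairing concrete, here is a minimal offline sketch. The sample lines and the obfuscate() helper are hypothetical (they are not from the question), and the helper simply assumes that "." should become "DOT" and "@" should become "NO-SPAM", mirroring the placeholder string used in the question.

```r
# Hypothetical sample lines standing in for the downloaded HTML
html <- c('<a href="mailto:Jane.Doe@univie.ac.at">Jane.Doe@univie.ac.at</a>',
          '<p>No address on this line.</p>')

reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"
m <- gregexpr(reg.email, html, ignore.case = TRUE)

# regmatches() extracts: a list with one character vector per input line
found <- regmatches(html, m)

# Hypothetical helper: rewrite each address individually
obfuscate <- function(addr) {
  addr <- gsub(".", "DOT", addr, fixed = TRUE)
  sub("@", "NO-SPAM", addr, fixed = TRUE)
}

# `regmatches<-` replaces: assign a list of the same shape back
regmatches(html, m) <- lapply(found, obfuscate)
print(html)
```

Because the replacement is built from the extracted matches, every address keeps its own name instead of collapsing to one fixed string. Lines without a match are left untouched.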

So there is no need for gsub; just:

datei <- scan("https://isor.univie.ac.at/about-us/People.html", sep = "\n", what= "character")
# Read 481 items
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>" #works
stelle.email <- gregexpr(reg.email, datei, ignore.case = TRUE) #works

# for proof, first look at a substring with a "known" email:
substr(datei[268], 236, 281)

### the only new/different line of code, remove your gsub
regmatches(datei, stelle.email) <- "vornameDOTnameNO-SPAMunivieDOTacDOTat"

# now look at the same portion of that one substring, now updated
substr(datei[268], 236, 281)

write(...)
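
As a side note, one plausible reason the original gsub() call appeared to do nothing: the pattern only lists upper-case letters, and the gregexpr() call passes ignore.case = TRUE while the gsub() call does not. The sketch below checks this on a made-up line (the address is hypothetical) and writes the result to a temporary file rather than to nospam.html.

```r
# Hypothetical sample line standing in for one line of the page
line <- "Contact: max.mustermann@univie.ac.at"
reg.email <- "\\<[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\>"
replacement <- "vornameDOTnameNO-SPAMunivieDOTacDOTat"

# Without ignore.case, the upper-case-only classes do not match
# a lower-case address, so the line comes back unchanged
unchanged <- gsub(reg.email, replacement, line)

# With ignore.case = TRUE the same pattern matches and is replaced
replaced <- gsub(reg.email, replacement, line, ignore.case = TRUE)

# Write to a temporary file here; the question writes to "nospam.html"
out <- tempfile(fileext = ".html")
writeLines(replaced, out)
print(readLines(out))
```

If that is indeed the cause, adding ignore.case = TRUE to the gsub() call should give the same result as the regmatches<- assignment above, since both replace every match with the same fixed string.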