HTML encode text in R
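I am working with Twitter data that I then feed into an HTML document. The text often contains special characters, such as emoji, that are not properly encoded for HTML. For example, the tweet: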
If both #AvengersEndgame and #Joker are nominated for Best Picture, it will be Marvel vs DC for the first time in a Best Picture race. I think both films deserve the nod, but the Twitter discourse leading up to the ceremony will be 🔥 🔥 🔥
would become:
If both #AvengersEndgame and #Joker are nominated for Best Picture, it will be Marvel vs DC for the first time in a Best Picture race. I think both films deserve the nod, but the Twitter discourse leading up to the ceremony will be ðŸ”¥ ðŸ”¥ ðŸ”¥
when fed into an HTML document.
Working manually, I can use a tool such as https://www.textfixer.com/html/html-character-encoding.php to encode the tweet as:
If both #AvengersEndgame and #Joker are nominated for Best Picture, it will be Marvel vs DC for the first time in a Best Picture race. I think both films deserve the nod, but the Twitter discourse leading up to the ceremony will be &#55357;&#56613; &#55357;&#56613; &#55357;&#56613;
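which I can then feed into the HTML document so that the emoji display. Is there a package or function in R that takes text and HTML-encodes it the way the web tool above does?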
Here is a function that encodes non-ASCII characters as HTML entities.
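entity_encode <- function(x) {
  # Unicode code points of the (single) input string
  cp <- utf8ToInt(x)
  rr <- vector("character", length(cp))
  # TRUE for non-ASCII code points (ASCII is 0..127)
  ucp <- cp > 127
  # Non-ASCII characters become decimal numeric entities, e.g. &#233;
  rr[ucp] <- paste0("&#", cp[ucp], ";")
  # ASCII characters are copied through unchanged
  rr[!ucp] <- intToUtf8(cp[!ucp], multiple = TRUE)
  paste0(rr, collapse = "")
}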
This returns
[1] "If both #AvengersEndgame and #Joker are nominated for Best Picture, it will be Marvel vs DC for the first time in a Best Picture race. I think both films deserve the nod, but the Twitter discourse leading up to the ceremony will be &#128293; &#128293; &#128293;"
for your input, which is not the same entity sequence the web tool produces, but the two encodings appear to be equivalent.
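In case it helps, here is a rough sketch of two follow-ups: applying the encoder to a whole vector of tweets, and checking the "equivalent encodings" remark. The function is repeated so the block runs on its own. The names entity_encode_all and surrogate_to_codepoint are made up for this example, and the surrogate-pair check assumes the web tool emits the two UTF-16 surrogate halves of each emoji (55357 and 56613 for the fire emoji), which I have not verified against the tool itself. It also assumes the input has no NA values.

```r
# Encoder in the spirit of the function above: non-ASCII code points
# become decimal numeric entities, ASCII passes through unchanged.
entity_encode <- function(x) {
  cp <- utf8ToInt(x)                      # code points of one string
  out <- intToUtf8(cp, multiple = TRUE)   # one element per character
  non_ascii <- cp > 127
  out[non_ascii] <- paste0("&#", cp[non_ascii], ";")
  paste0(out, collapse = "")
}

# Hypothetical helper: encode every element of a character vector.
# enc2utf8() makes sure utf8ToInt() receives UTF-8 input.
entity_encode_all <- function(x) {
  vapply(enc2utf8(x), entity_encode, character(1), USE.NAMES = FALSE)
}

tweets <- c("will be \U0001F525", "caf\u00e9", "plain ASCII")
entity_encode_all(tweets)
# [1] "will be &#128293;" "caf&#233;"         "plain ASCII"

# Hypothetical helper: combine a UTF-16 surrogate pair (high, low)
# into the single code point it represents.
surrogate_to_codepoint <- function(hi, lo) {
  0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)
}

surrogate_to_codepoint(55357, 56613) == utf8ToInt("\U0001F525")
# [1] TRUE
```

Under that assumption, the pair 55357/56613 and the single value 128293 name the same character, U+1F525. Note that this sketch, like the function above, does not escape &, < or >, so handle those separately if the tweets can contain them.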