gsub different phrases of graphical characters
I have a data frame with several rows of character strings, for example:
hello my name is sam <U+ab93>
hi i love fast cars <U+e>
my favourite colour is yellow <U+E><U+c><U+60>
How can I remove all of the meaningless terms (the parts in angle brackets) from this data frame?
I tried apply(document, 1, function(x) gsub("<[:graph:]>", "", x))
but it does not work.
Given
document = c("hello my name is sam <U+ab93>",
"hi i love fast cars <U+e>",
"my favourite colour is yellow <U+E><U+c><U+60>")
the call would be
gsub("<[[:graph:]]+>", "", document )
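[:graph:] on its own is not a valid POSIX character class. The class name has to sit inside a bracket expression, as in [[:graph:]]; a bare [:graph:] is read as the set of literal characters ":", "g", "r", "a", "p" and "h". Also, gsub is vectorised, so there is no need to wrap it in apply.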
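To make the difference concrete, here is a small sketch comparing the two patterns on the sample strings. The variable names `unchanged` and `cleaned` and the final `trimws` step are illustrative additions, not part of the original answer.

```r
document <- c("hello my name is sam <U+ab93>",
              "hi i love fast cars <U+e>",
              "my favourite colour is yellow <U+E><U+c><U+60>")

# A bare "[:graph:]" is a bracket list of the literal characters
# ':', 'g', 'r', 'a', 'p', 'h', so "<[:graph:]>" only matches things like "<a>".
unchanged <- gsub("<[:graph:]>", "", document)

# Inside a bracket list, [[:graph:]] is the POSIX class of visible characters;
# "+" lets it cover the whole run between "<" and ">".
cleaned <- gsub("<[[:graph:]]+>", "", document)

# Optional (assumption: the trailing space is unwanted): strip the space
# left behind in front of each removed tag.
cleaned <- trimws(cleaned)
```

With the sample data, the first call leaves every string as it was, which is why the original attempt appeared to do nothing.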
Alternatively, you can try
gsub("<[^>]*>", "", document)
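Both patterns above remove anything in angle brackets. If the data might contain other bracketed text that should be kept, a narrower pattern could target only the <U+...> escapes. The following is a sketch under that assumption; the vector `x` and the name `only_escapes` are made up for illustration.

```r
x <- c("see <note> here <U+ab93>",
       "my favourite colour is yellow <U+E><U+c><U+60>")

# "\\+" matches a literal plus sign; [[:xdigit:]]+ matches one or more
# hexadecimal digits, so only tags of the form <U+hex> are removed.
only_escapes <- gsub("<U\\+[[:xdigit:]]+>", "", x)
```

Here "<note>" survives because it does not start with "U+".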
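Using gsub on a data frame column:
text <- c("hello my name is sam <U+ab93>", "hi i love fast cars <U+e>",
          "my favourite colour is yellow <U+E><U+c><U+60>")
df <- data.frame(DOC = text, stringsAsFactors = FALSE)
# "<[^>]*>" stops at the first ">", whereas a greedy "<.*>" would also
# swallow any text sitting between two tags on the same line
df$DOC <- gsub(pattern = "<[^>]*>", replacement = "", x = df$DOC)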
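One thing to watch with a greedy pattern such as "<.*>" is that it spans from the first "<" to the last ">" in a string. That is harmless when the tags sit at the end of the line, as in the sample data, but it drops real text when words appear between two tags. A minimal sketch, using a made-up string `x`:

```r
x <- "a <U+e> b <U+c> c"

# Greedy: one match covering "<U+e> b <U+c>", so the "b" is lost.
greedy <- gsub("<.*>", "", x)

# Negated class: each match ends at the nearest ">", so "b" is kept.
per_tag <- gsub("<[^>]*>", "", x)
```

The doubled spaces that remain can be collapsed afterwards if needed, for example with gsub(" +", " ", per_tag).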