Combine lists of strings of different lengths into a data frame
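I have text data in which I need to correct English spelling mistakes.
I would like a table as output, with the mistakes in the first column and all the suggested corrections in the second.
For example: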
sentence <- "This is a word but thhis isn't and this onne as well. I need hellp"
library(hunspell)
mistakesList <- hunspell(sentence)[[1]]
suggestionsList <- hunspell_suggest(mistakesList)
I tried:
do.call(rbind, Map(data.frame, A=mistakesList, B=suggestionsList))
but it returns:
A B
thhis thhis this
onne.1 onne none
onne.2 onne one
onne.3 onne tonne
onne.4 onne Donne
onne.5 onne once
onne.6 onne Anne
onne.7 onne Yvonne
hellp.1 hellp hello
hellp.2 hellp hell
hellp.3 hellp help
hellp.4 hellp hell p
I would like a data frame that looks like this instead:
mistakes suggestions
thhis this
onne none one tonne Donne once Anne Yvonne
hellp hello hell help hell p
We can keep mistakesList as it is and use toString to convert each element of suggestionsList into a comma-separated string:
data.frame(mistakes = mistakesList, suggestions = sapply(suggestionsList, toString))
# mistakes suggestions
#1 thhis this
#2 onne none, one, tonne, Donne, once, Anne, neon
#3 hellp hello, hell, help, hell p
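For readers without hunspell installed, the same idea can be sketched with hand-written inputs. The two objects below are hypothetical stand-ins for mistakesList and suggestionsList, with values copied from the output shown in the question, and vapply is swapped in for sapply only to make the return type explicit; this is an illustration under those assumptions, not a replacement for the answer above.

```r
# Hypothetical stand-ins for the hunspell results (values copied from the question)
mistakesList <- c("thhis", "onne", "hellp")
suggestionsList <- list(
  "this",
  c("none", "one", "tonne", "Donne", "once", "Anne", "Yvonne"),
  c("hello", "hell", "help", "hell p")
)

# toString() collapses each character vector into one comma-separated string,
# so every mistake ends up with exactly one value in the suggestions column
suggestions <- vapply(suggestionsList, toString, character(1))

result <- data.frame(
  mistakes = mistakesList,
  suggestions = suggestions,
  stringsAsFactors = FALSE
)
print(result)
```

Because each list element is reduced to a single string, both columns have the same length and data.frame no longer needs to recycle the mistake across several rows.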
This works:
X1 <- do.call(rbind, Map(data.frame, mistakes = mistakesList, suggestions = suggestionsList))
X1
library(plyr)
X2 <- ddply(X1, .(mistakes), summarize,
            suggestions = paste(suggestions, collapse = ", "))
X2
mistakes suggestions
1 thhis this
2 onne none, one, tonne, Donne, once, Anne, Yvonne
3 hellp hello, hell, help, hell p
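If adding plyr as a dependency is not desirable, the same long-then-collapse approach can be sketched in base R with aggregate. The inputs below are again hypothetical stand-ins copied from the question, and the names long and collapsed are made up for this illustration.

```r
# Hypothetical stand-ins for the hunspell results (values copied from the question)
mistakesList <- c("thhis", "onne", "hellp")
suggestionsList <- list(
  "this",
  c("none", "one", "tonne", "Donne", "once", "Anne", "Yvonne"),
  c("hello", "hell", "help", "hell p")
)

# Long format: one row per (mistake, suggestion) pair
long <- data.frame(
  mistakes = rep(mistakesList, lengths(suggestionsList)),
  suggestions = unlist(suggestionsList),
  stringsAsFactors = FALSE
)

# Collapse all suggestions that belong to the same mistake into one string;
# collapse = ", " is passed through to paste()
collapsed <- aggregate(suggestions ~ mistakes, data = long,
                       FUN = paste, collapse = ", ")
print(collapsed)
```

The rows come back grouped by mistakes, so their order may differ from the order in which the mistakes appeared in the text.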