ngram text as separate columns in R
I have a list of several texts produced by ngram and want to add it as columns to the original data table.
> prep_test
prep_test
1: Women Athletic,Athletic Apparel,Apparel Pants,Pants Tights,Tights Leggings
2: Beauty Makeup,Makeup Face
3: Beauty Makeup,Makeup Face
4: Electronics Cell,Cell Phones,Phones Accessories,Accessories Cases,Cases Covers,Covers Skins
5: Women Shoes,Shoes Boots
6: Men Men,Men s,s Accessories,Accessories Belts
7: Electronics Cell,Cell Phones,Phones Accessories,Accessories Cell,Cell Phones,Phones Smartphones
8: Women Tops,Tops Blouses,Blouses Other
9: Women Athletic,Athletic Apparel,Apparel Pants,Pants Tights,Tights Leggings
10: Home Home,Home DÃ,DÃ cor,cor Home,Home Fragrance
str(prep_test)
Classes ‘data.table’ and 'data.frame': 10 obs. of 1 variable:
$ prep_test:List of 10
..$ : chr "Women Athletic" "Athletic Apparel" "Apparel Pants" "Pants Tights" ...
..$ : chr "Beauty Makeup" "Makeup Face"
..$ : chr "Beauty Makeup" "Makeup Face"
..$ : chr "Electronics Cell" "Cell Phones" "Phones Accessories" "Accessories Cases" ...
..$ : chr "Women Shoes" "Shoes Boots"
..$ : chr "Men Men" "Men s" "s Accessories" "Accessories Belts"
..$ : chr "Electronics Cell" "Cell Phones" "Phones Accessories" "Accessories Cell" ...
..$ : chr "Women Tops" "Tops Blouses" "Blouses Other"
..$ : chr "Women Athletic" "Athletic Apparel" "Apparel Pants" "Pants Tights" ...
..$ : chr "Home Home" "Home DÃ" "DÃ cor" "cor Home" ...
- attr(*, ".internal.selfref")=<externalptr>
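Current code that generates the n-grams for the column:
library(ngram)       # provides ngram_asweka()
library(data.table)

bigram_fun <- function(y){
  # collapse punctuation and blanks into single spaces
  y <- gsub("[[:punct:][:blank:]]+", " ", y)
  # build bigrams (n-grams with exactly 2 words)
  y <- ngram_asweka(y, min=2, max=2)
  #y <- str_split_fixed(y, ",", n=Inf)
  #y <- unlist(y)
  return(y)
}
prep_test <- all[1:10, 9]
prep_test <- apply(prep_test, 1, bigram_fun)
prep_test <- data.table(prep_test)
prep_test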
Here is the dput output:
> dput(prep_test)
list(c("Women Athletic", "Athletic Apparel", "Apparel Pants",
"Pants Tights", "Tights Leggings"), c("Beauty Makeup", "Makeup Face"
), c("Beauty Makeup", "Makeup Face"), c("Electronics Cell", "Cell Phones",
"Phones Accessories", "Accessories Cases", "Cases Covers", "Covers Skins"
), c("Women Shoes", "Shoes Boots"), c("Men Men", "Men s", "s Accessories",
"Accessories Belts"), c("Electronics Cell", "Cell Phones", "Phones Accessories",
"Accessories Cell", "Cell Phones", "Phones Smartphones"), c("Women Tops",
"Tops Blouses", "Blouses Other"), c("Women Athletic", "Athletic Apparel",
"Apparel Pants", "Pants Tights", "Tights Leggings"), c("Home Home",
"Home DÃ", "DÃ cor", "cor Home", "Home Fragrance"))
Desired result:
Bigram 1 Bigram 2 Bigram 3 Bigram 4 ...
"Women Athletic" "Athletic Apparel" "Apparel Pants" "Pants Tights"...
"Beauty Makeup" "Makeup Face" NA NA ...
"Beauty Makeup" "Makeup Face" NA NA ...
"Electronics Cell" "Cell Phones" "Phones Accessories" "Accessories Cases"
"Women Shoes" "Shoes Boots" NA NA
Thanks for any answers, and sorry for the poorly asked question; I am new here.
This should work:
library(plyr)
# mylist is the list of bigram vectors from the dput() output in the question
df <- rbind.fill(lapply(mylist, function(x) as.data.frame(t(x), stringsAsFactors = FALSE)))
colnames(df) <- paste0("Bigram ", seq_len(ncol(df)))
Output:
Bigram 1 Bigram 2 Bigram 3 Bigram 4 Bigram 5 Bigram 6
1 Women Athletic Athletic Apparel Apparel Pants Pants Tights Tights Leggings <NA>
2 Beauty Makeup Makeup Face <NA> <NA> <NA> <NA>
3 Beauty Makeup Makeup Face <NA> <NA> <NA> <NA>
4 Electronics Cell Cell Phones Phones Accessories Accessories Cases Cases Covers Covers Skins
5 Women Shoes Shoes Boots <NA> <NA> <NA> <NA>
6 Men Men Men s s Accessories Accessories Belts <NA> <NA>
7 Electronics Cell Cell Phones Phones Accessories Accessories Cell Cell Phones Phones Smartphones
8 Women Tops Tops Blouses Blouses Other <NA> <NA> <NA>
9 Women Athletic Athletic Apparel Apparel Pants Pants Tights Tights Leggings <NA>
10 Home Home Home DÃ DÃ cor cor Home Home Fragrance <NA>
Hope this helps!
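For comparison, the same padding idea can be sketched in base R without plyr. This is only an illustration on a shortened three-row sample; the names bigrams, padded and wide are made up for the example and are not from the question.

```r
# Sketch in base R: pad every bigram vector with NA up to the longest length,
# then stack the padded vectors as rows.
bigrams <- list(
  c("Women Athletic", "Athletic Apparel", "Apparel Pants"),
  c("Beauty Makeup", "Makeup Face"),
  c("Women Shoes", "Shoes Boots")
)

max_len <- max(lengths(bigrams))

# `length<-` extends a vector to the requested length, filling with NA
padded <- lapply(bigrams, `length<-`, max_len)

# one row per original list element, one column per bigram position
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
colnames(wide) <- paste0("Bigram ", seq_len(ncol(wide)))
wide
```

Rows shorter than the longest one end up with NA in the trailing columns, which matches the layout asked for in the question.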
We can convert the bigrams to data frames, bind them into one melted (long) data frame, and then cast that to a wide-format tidy data set, as follows.
theBigrams <- list(c("Women Athletic", "Athletic Apparel", "Apparel Pants",
"Pants Tights", "Tights Leggings"), c("Beauty Makeup", "Makeup Face"),
c("Beauty Makeup", "Makeup Face"), c("Electronics Cell", "Cell Phones",
"Phones Accessories", "Accessories Cases", "Cases Covers", "Covers Skins"
), c("Women Shoes", "Shoes Boots"), c("Men Men", "Men s", "s Accessories",
"Accessories Belts"), c("Electronics Cell", "Cell Phones", "Phones Accessories",
"Accessories Cell", "Cell Phones", "Phones Smartphones"), c("Women Tops",
"Tops Blouses", "Blouses Other"), c("Women Athletic", "Athletic Apparel",
"Apparel Pants", "Pants Tights", "Tights Leggings"), c("Home Home",
"Home DÃ", "DÃ cor", "cor Home", "Home Fragrance"))
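meltedBigrams <- do.call(rbind, lapply(seq_along(theBigrams), function(i) {
  x <- theBigrams[[i]]
  bigram <- seq_along(x)      # position of each bigram within its row
  id <- rep(i, length(x))     # row identifier
  data.frame(id, bigram, value = x, stringsAsFactors = FALSE)
}))
library(reshape2)
# name value.var explicitly so dcast does not have to guess the value column
castData <- dcast(meltedBigrams, id ~ bigram, value.var = "value")
castData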
...and the output:
> castData
id 1 2 3 4 5 6
1 1 Women Athletic Athletic Apparel Apparel Pants Pants Tights Tights Leggings <NA>
2 2 Beauty Makeup Makeup Face <NA> <NA> <NA> <NA>
3 3 Beauty Makeup Makeup Face <NA> <NA> <NA> <NA>
4 4 Electronics Cell Cell Phones Phones Accessories Accessories Cases Cases Covers Covers Skins
5 5 Women Shoes Shoes Boots <NA> <NA> <NA> <NA>
6 6 Men Men Men s s Accessories Accessories Belts <NA> <NA>
7 7 Electronics Cell Cell Phones Phones Accessories Accessories Cell Cell Phones Phones Smartphones
8 8 Women Tops Tops Blouses Blouses Other <NA> <NA> <NA>
9 9 Women Athletic Athletic Apparel Apparel Pants Pants Tights Tights Leggings <NA>
10 10 Home Home Home DÃ DÃ cor cor Home Home Fragrance <NA>
>
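Since the question works with data.table and wants the bigrams attached to the original table, here is a rough sketch of one way that might be done with data.table's transpose(). The table items and its item_id column are hypothetical stand-ins for the original data, and the two-row sample is shortened for illustration.

```r
library(data.table)

bigrams <- list(
  c("Women Shoes", "Shoes Boots"),
  c("Women Tops", "Tops Blouses", "Blouses Other")
)

# hypothetical stand-in for the original data table (one row per bigram vector)
items <- data.table(item_id = 1:2)

# transpose() turns the list of per-row vectors into a list of per-column
# vectors, padding the shorter rows with NA
cols <- transpose(bigrams, fill = NA)

# add one new column per bigram position, by reference
new_names <- paste0("Bigram ", seq_along(cols))
items[, (new_names) := cols]
items
```

This assumes the list is in the same row order as the table it is attached to.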