How to convert frequency into text using R?
I have a data frame like this (an ID, followed by the frequencies of A, B, C, D and E):
ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1
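I want to convert this data frame into a text-based document like the one below, where each ID is followed by the letters A to E repeated according to their frequencies, as words in a single column. I may then use the LDA algorithm to identify hot topics for each ID.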
ID Text
1 "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
2 "A" "A" "A" "B" "B" "C" "C" "D"
3 "A" "A" "A" "A" "B" "B" "C" "D" "E"
You can use apply and rep like this:
apply(df[-1], 1, function(i) rep(names(df)[-1], i))
For each row, apply passes rep the number of times to repeat each variable name. It returns a list of vectors:
[[1]]
[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
[[2]]
[1] "A" "A" "A" "B" "B" "C" "C" "D"
[[3]]
[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
where each list element corresponds to one row of the data.frame.
Data
df <- read.table(header=TRUE, text="ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1")
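One caveat with the apply approach: when every row happens to expand to the same number of words, apply simplifies its result to a matrix instead of a list. The following is a minimal sketch of a variant that always returns a list; the name words is only illustrative and does not come from the original answer.

```r
# Same sample data as in the question
df <- read.table(header = TRUE, text = "ID A B C D E
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1")

# lapply over the row indices always returns a list, even if all rows
# expand to the same number of words
words <- lapply(seq_len(nrow(df)), function(i) {
  # unlist() turns the one-row data frame of counts into a plain vector
  rep(names(df)[-1], times = unlist(df[i, -1]))
})
names(words) <- df$ID

# Sanity check: each vector has as many words as the row's total count
stopifnot(all(lengths(words) == rowSums(df[-1])))
```

Naming the list by ID keeps the link between each word vector and its row, which apply does not do for this data.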
We can use data.table (here the data frame is named df1):
library(data.table)
DT <- setDT(df1)[,.(list(rep(names(df1)[-1], unlist(.SD)))) ,ID]
DT$V1
#[[1]]
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#[[2]]
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#[[3]]
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
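Or a base R option is split (this assumes df1 is still a plain data.frame; after setDT, df1[-1] would drop the first row rather than the first column):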
lst <- lapply(split(df1[-1], df1$ID), rep, x=names(df1)[-1])
lst
#$`1`
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
#$`2`
#[1] "A" "A" "A" "B" "B" "C" "C" "D"
#$`3`
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
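If we want to write 'lst' to a csv file, one option is to convert the list to a data.frame by appending NA at the end of the shorter elements so that all lengths are equal (a data.frame is a list whose elements, the columns, have equal length):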
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
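To make the padding step easier to follow, here is a small sketch on a hypothetical two-element list (ragged, padded and m are illustrative names, not part of the original answer). It shows that `length<-` extends a vector and fills the new positions with NA.

```r
# Hypothetical ragged list standing in for 'lst'
ragged <- list(a = c("A", "A", "B"), b = "A")

# `length<-`(x, n) extends x to length n; new slots are filled with NA
padded <- lapply(ragged, `length<-`, max(lengths(ragged)))

# All elements now share one length, so rbind gives a regular matrix
m <- do.call(rbind, padded)
print(m)
```

The result is a character matrix with one row per list element, which write.csv accepts directly.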
Or use the convenience function from stringi:
library(stringi)
res <- stri_list2matrix(lst, byrow=TRUE)
Then use write.csv:
write.csv(res, "yourdata.csv", quote=FALSE, row.names = FALSE)
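If the goal is the two-column ID / Text layout shown in the question, rather than one word per column, each vector could instead be collapsed into a single string. This is only a sketch under that assumption: the small lst below is a hypothetical stand-in for the real list, and docs and out are illustrative names.

```r
# Hypothetical list of word vectors keyed by ID, shaped like 'lst' above
lst <- list(`1` = c("A", "A", "B"), `2` = c("A", "C"))

# Collapse each vector into one space-separated string per ID
docs <- data.frame(
  ID   = names(lst),
  Text = vapply(lst, paste, character(1), collapse = " "),
  stringsAsFactors = FALSE
)

# Write to a temporary file (the path is chosen only for illustration)
out <- tempfile(fileext = ".csv")
write.csv(docs, out, row.names = FALSE)
```

Here the default quoting of write.csv is kept so that each text stays in a single field when the file is read back.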