Frequency of strings and their IDs in a data frame using R
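The goal is to compute the frequency of each string in a text variable and to associate the corresponding IDs with it.
Suppose Sample is a data frame like this: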
Sample <- data.frame(ID = c('1', '2', '3', '4', '5', '6'),
                     Var = c('How are you',
                             'Do not go',
                             'How are you',
                             'Please go',
                             'How are you',
                             'Do not go'))
The following command generates the frequencies of the strings in the Var column:
as.data.frame(table(unlist(strsplit(tolower(Sample$Var), ', '))))
Is there a way to also get the associated IDs in the same table?
Try this:
library(dplyr)
#Code
New <- Sample %>%
  group_by(Var) %>%
  summarise(Freq = n(), IDS = toString(ID))
Output:
# A tibble: 3 x 3
  Var          Freq IDS
  <chr>       <int> <chr>
1 Do not go       2 2, 6
2 How are you     3 1, 3, 5
3 Please go       1 4
Here is another option, if you use data.table:
library(data.table)
setDT(Sample)[, .(Freq = .N, ID.asso = list(ID)), keyby = Var]
           Var Freq ID.asso
1:   Do not go    2     2,6
2: How are you    3   1,3,5
3:   Please go    1       4
We can use dplyr and stringr:
library(dplyr)
library(stringr)
Sample %>%
  group_by(Var) %>%
  summarise(Freq = n(), IDS = str_c(ID, collapse = ", "))
Base R solution:
data.frame(
  do.call(rbind, lapply(with(Sample, split(Sample, Var)), function(x) {
    with(x, data.frame(Var = unique(Var), Freq = nrow(x), ID = toString(ID)))
  })),
  row.names = NULL, stringsAsFactors = FALSE
)
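A more compact base R variant is also possible. It builds on the table() call from the question and needs no packages. This is only a sketch: the names freq and ids are illustrative and do not come from the answers above, and it assumes one ID per row, as in the sample data.

```r
# Recreate the sample data so the snippet runs on its own
Sample <- data.frame(ID = c('1', '2', '3', '4', '5', '6'),
                     Var = c('How are you', 'Do not go', 'How are you',
                             'Please go', 'How are you', 'Do not go'),
                     stringsAsFactors = FALSE)

# Count how often each distinct string occurs (columns: Var, Freq)
freq <- as.data.frame(table(Var = Sample$Var), stringsAsFactors = FALSE)

# Collapse the IDs belonging to each string into one comma-separated value
ids <- tapply(Sample$ID, Sample$Var, toString)

# Look the IDs up by string so they line up with the rows of freq
freq$IDS <- unname(ids[freq$Var])
freq
```

Looking the IDs up by name (ids[freq$Var]) rather than by position keeps the two results aligned even if their orderings were to differ.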