Finding the Frequency of Values Across Character Strings

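I have three character vectors of different lengths. Some values overlap between vectors and some are unique to one vector. Each value also occurs a different number of times in each vector. For example,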

A <- c("A", "A", "B")
B <- c("A", "B", "C", "D")
C <- c("B", "A", "C", "E", "F")

I would like to know:

How can I do this? I couldn't find a stringr command that does it, and I'm new to working with strings.

#Unique items
> unique(A)
[1] "A" "B"

#count of unique items
> length(unique(A))
[1] 2

#frequency of each unique value
df_A <- data.frame(A = A) #data frame prepared

> dplyr::mutate(dplyr::group_by(df_A, A), freq = dplyr::n())
# A tibble: 3 x 2
# Groups:   A [2]
  A      freq
  <chr> <int>
1 A         2
2 A         2
3 B         1

#filter: keep the values that occur fewer than 2 times
df_A <- dplyr::mutate(dplyr::group_by(df_A, A), freq = dplyr::n())

> df_A$A[df_A$freq < 2]
[1] "B"
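
If you'd rather not depend on dplyr, the same frequency-and-filter step can be sketched in base R with table(). This is only a minimal sketch; the names freq_A and rare_A are made up for illustration, and the cutoff of 2 is the same one used above.

```r
A <- c("A", "A", "B")

# Count how often each distinct value occurs in A
freq_A <- table(A)

# Keep the values that occur fewer than 2 times
rare_A <- names(freq_A)[as.vector(freq_A) < 2]

print(freq_A)
print(rare_A)
```

Because table() returns a named object, names() gives the values back as plain character strings.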

Edit

#unique items across all lists
> unique(c(A, B, C))
[1] "A" "B" "C" "D" "E" "F"

#Freq across all lists
> tabulate(as.factor(c(A,B,C)))
[1] 4 3 2 1 1 1

#OR

> table(c(A, B, C))

A B C D E F 
4 3 2 1 1 1 
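
The combined table above loses track of which vector each value came from. If that matters, one possible approach (a sketch, not the only way) is to build a two-way table of value by vector. The names vecs, values, origin, counts and only_one are hypothetical helpers introduced here.

```r
A <- c("A", "A", "B")
B <- c("A", "B", "C", "D")
C <- c("B", "A", "C", "E", "F")

# Put the vectors in a named list so each one can be tracked
vecs <- list(A = A, B = B, C = C)

# Flatten the values and record which vector each value came from
values <- unlist(vecs, use.names = FALSE)
origin <- rep(names(vecs), lengths(vecs))

# Rows are values, columns are vectors, cells are counts
counts <- table(value = values, vector = origin)
print(counts)

# Values that appear in exactly one of the three vectors
only_one <- names(which(rowSums(counts > 0) == 1))
print(only_one)
```

Here rowSums(counts) reproduces the overall frequencies, while each column gives the frequencies within one vector.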

You can use the following steps:

Find the unique elements:

uq <- unique(A)

Frequency of each element (here after recoding the labels as numbers):

library(car) # provides recode()
A1 <- recode(A, "'A' = 1; 'B' = 2") # relabel "A" as 1 and "B" as 2
tab <- sort(table(A1)) # frequencies of all elements, sorted in ascending order
names(which(tab == max(tab))) # the most frequent element(s)

Total number of unique values:

length(unique(A1))
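
Since there are three vectors, it may be easier to wrap these steps in a small function and apply it to each one. The following is only a sketch in base R; summarise_vector and res are hypothetical names, and it skips the recoding step because table() works directly on character values.

```r
A <- c("A", "A", "B")
B <- c("A", "B", "C", "D")
C <- c("B", "A", "C", "E", "F")

# Hypothetical helper: collects the per-vector summaries in one list
summarise_vector <- function(x) {
  freq <- table(x)
  list(
    unique_values = unique(x),              # distinct values, in order of appearance
    n_unique      = length(unique(x)),      # how many distinct values there are
    freq          = freq,                   # count of each distinct value
    most_frequent = names(freq)[as.vector(freq) == max(freq)]  # ties are all returned
  )
}

# Apply the helper to all three vectors at once
res <- lapply(list(A = A, B = B, C = C), summarise_vector)

print(res$A$n_unique)
print(res$A$most_frequent)
```

Each element of res can then be inspected by name, for example res$C$freq.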