Unique string count in a sequence

Question

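I am trying to count the unique strings in each sequence, where a sequence is a string of states separated by hyphens.

For example: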

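 library(stringr)
 # Each element of A is one sequence of states separated by "-"
 A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')
 # B counts the separators, i.e. the number of transitions in each sequence
 B <- str_count(A, "-")
 df <- data.frame(A, B)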

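I expect the output to be as follows:

Here C is the total diversity, that is, the number of distinct states in each sequence. Any ideas or suggestions? I have looked around but could not find a reasonable solution.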

Answer 1

I would use unique:

# Split each sequence on "-" and count the distinct states it contains
df$res <- sapply(str_split(A, "-"), function(x) length(unique(x)))
df
                        A B res
1         CCE-CRE-DEE-DEE 3   3
2 FOE-FOE-GOE-GOE-GOE-ISE 5   3
3                 ISE-PCE 1   2
4                     ISE 0   1

I think the value you actually expect for CCE-CRE-DEE-DEE is 3.
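
For what it's worth, the same idea can be sketched in base R without stringr. This is only an illustration under a couple of assumptions: the states are always separated by a literal "-", and the names parts, n_sep and n_unique are made up here rather than taken from the question. vapply is used in place of sapply so that the result is guaranteed to be an integer vector.

```r
# Same input as in the question
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# Split every sequence on a literal hyphen (fixed = TRUE avoids regex handling)
parts <- strsplit(A, "-", fixed = TRUE)

# Number of separators: number of states minus one
n_sep <- lengths(parts) - 1L

# Number of distinct states per sequence
n_unique <- vapply(parts, function(x) length(unique(x)), integer(1))

# Column C here is the hypothetical "diversity" column from the question
df <- data.frame(A, B = n_sep, C = n_unique)
print(df)
```

If dplyr is already loaded, n_distinct(x) could probably replace length(unique(x)) inside the function, but the base version above has no package dependencies.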

r

stringr

dplyr