Count the frequency of a variable depending on another variable in an R data frame

Question

df <- data.frame(samples = c('45fe.K2','45fe.K2','45fe.K2','45hi.K1','45hi.K1'),source = c('f','f','o','o','f'))
df
  samples source
1 45fe.K2      f
2 45fe.K2      f
3 45fe.K2      o
4 45hi.K1      o
5 45hi.K1      f

I want to count, for each value of `samples`, how many rows come from source `f` and how many from source `o`.

The result should look like this:

  samples source count
1 45fe.K2      f     2
3 45fe.K2      o     1
4 45hi.K1      o     1
5 45hi.K1      f     1

I tried:

df <- df  %>%
  group_by(source) %>%
  mutate(count = n_distinct(samples)) %>%
  ungroup()

df <- within(df, { count <- ave(source, samples, FUN=function(x) length(unique(x)))})

df$count <- ave(as.integer(df$samples), df$source, FUN = function(x) length(unique(x)))

df$count <- with(df, ave(samples,source, FUN = function(x) length(unique(x))))

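All of these only count the unique samples (i.e. 2) or the number of unique sources (i.e. 2). What I want is, for each unique sample, how many rows come from each source.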
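Each of the attempts above groups by a single column, so it can only count distinct values of the other one. A minimal base R sketch, assuming the same `df` as in the question (columns `samples` and `source`), instead passes both columns as grouping vectors to `ave()` and uses `length()` as the summary function; `duplicated()` then keeps one row per combination. The name `res` and the use of `seq_along()` as the vector being summarised are illustrative choices, not part of the original post.

```r
# Example data from the question
df <- data.frame(
  samples = c('45fe.K2', '45fe.K2', '45fe.K2', '45hi.K1', '45hi.K1'),
  source  = c('f', 'f', 'o', 'o', 'f')
)

# ave() splits its first argument by the interaction of ALL grouping vectors
# and replaces every element with FUN applied to its group, so length()
# yields the number of rows in each samples/source combination.
df$count <- ave(seq_along(df$samples), df$samples, df$source, FUN = length)

# Keep only the first row of each samples/source combination
res <- df[!duplicated(df[c("samples", "source")]), ]
res
```

Because the subset keeps the original row names (1, 3, 4, 5), the result has the same shape as the expected output shown in the question.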

Answer 1

Try this dplyr solution with `summarise()` and `n()`:

library(dplyr)
df %>% group_by(samples,source) %>% summarise(N=n())

Output:

# A tibble: 4 x 3
# Groups:   samples [2]
  samples source     N
  <chr>   <chr>  <int>
1 45fe.K2 f          2
2 45fe.K2 o          1
3 45hi.K1 f          1
4 45hi.K1 o          1

A base R solution is to create an indicator variable `N` and then use `aggregate()`:

# Data: add an indicator column of 1s
df$N <- 1
# Code: sum the indicator within each samples/source combination
aggregate(N~samples+source,df,sum)

Output:

  samples source N
1 45fe.K2      f 2
2 45hi.K1      f 1
3 45fe.K2      o 1
4 45hi.K1      o 1
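Another base R option, sketched here as an alternative that is not part of the original answer, is to cross-tabulate the two columns with `table()` and turn the result back into a long data frame with `as.data.frame()`, which needs no helper column. Unlike `aggregate()`, `table()` also lists combinations that never occur (with `Freq` equal to 0), so the sketch filters those out. The name `counts` is illustrative.

```r
# Example data from the question
df <- data.frame(
  samples = c('45fe.K2', '45fe.K2', '45fe.K2', '45hi.K1', '45hi.K1'),
  source  = c('f', 'f', 'o', 'o', 'f')
)

# Named arguments become the column names of the resulting data frame;
# the count column is called Freq.
counts <- as.data.frame(table(samples = df$samples, source = df$source))

# table() includes every combination of levels, even unseen ones; drop zeros
counts <- counts[counts$Freq > 0, ]
counts
```

With this data every combination occurs, so all four rows remain, in the same order as the `aggregate()` output (the first factor varies fastest).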
