Count unique combinations of a set of columns in R

I have a data frame that looks like this:

   FieldID X2009 X2010 X2011 X2012 X2013 X2014
1     H003     1     1     1     1     1     1
2     H001    NA     1     1     1     1     1
3     H005    NA     1     1     1     1     1
4     H006    NA     1     1     1     1     1
5     H009    NA     1     1     1    NA     1
6     H010    NA     1     1     1    NA     1
7     H002    NA     1     1     1    NA    NA
8     H007    NA     1     1     1    NA    NA
9     H008    NA     1     1    NA     1    NA
10    H004    NA     1     1    NA    NA     1

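I am trying to count the number of rows that fall into each unique combination of the columns X2009 to X2014. The resulting data frame should look like this: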

   FieldID X2009 X2010 X2011 X2012 X2013 X2014 row
1     H003     1     1     1     1     1     1 1
2     H001    NA     1     1     1     1     1 3
5     H009    NA     1     1     1    NA     1 2
7     H002    NA     1     1     1    NA    NA 2
9     H008    NA     1     1    NA     1    NA 1
10    H004    NA     1     1    NA    NA     1 1

I tried the following:

  tt %>%
    gather(Year, value, X2009:X2014) %>%
    mutate(value = ifelse(is.na(value), 0, 1)) %>%
    tidyr::spread(Year, value) %>%
    group_by(X2009, X2010, X2011, X2012, X2013, X2014) %>%
    summarise(row = n())

This gives me the error:

> Error in n() : This function should not be called directly

Replacing n() with length() or NROW() did not help. How can I do this?

Here is one option:

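library(dplyr)

grps <- names(DF)[-1]          # get the grouping columns (everything except FieldID)

DF %>% 
  group_by_(.dots = grps) %>%  # group by every year column
  mutate(row = n()) %>%        # attach the group size to each row
  distinct()                   # you could add %>% ungroup() if required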

#Source: local data frame [6 x 8]
#Groups: X2009, X2010, X2011, X2012, X2013, X2014
#
#  FieldID X2009 X2010 X2011 X2012 X2013 X2014 row
#1    H003     1     1     1     1     1     1   1
#2    H001    NA     1     1     1     1     1   3
#3    H009    NA     1     1     1    NA     1   2
#4    H002    NA     1     1     1    NA    NA   2
#5    H008    NA     1     1    NA     1    NA   1
#6    H004    NA     1     1    NA    NA     1   1
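
For readers on a newer dplyr, here is a hedged sketch of the same idea. `group_by_()` was deprecated in dplyr 0.7, and in recent versions `distinct()` called without arguments compares every column (including FieldID), so it would no longer collapse the groups. The version below assumes dplyr 1.0 or later and rebuilds the question's data as `DF`. Keeping the first row of each group with `filter(row_number() == 1)` is a substitution for the `distinct()` call, not part of the answer above.

```r
library(dplyr)

# Rebuild the example data from the question.
DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
FieldID X2009 X2010 X2011 X2012 X2013 X2014
H003     1     1     1     1     1     1
H001    NA     1     1     1     1     1
H005    NA     1     1     1     1     1
H006    NA     1     1     1     1     1
H009    NA     1     1     1    NA     1
H010    NA     1     1     1    NA     1
H002    NA     1     1     1    NA    NA
H007    NA     1     1     1    NA    NA
H008    NA     1     1    NA     1    NA
H004    NA     1     1    NA    NA     1
")

grps <- names(DF)[-1]                    # every column except FieldID

res <- DF %>%
  group_by(across(all_of(grps))) %>%     # assumes dplyr >= 1.0
  mutate(row = n()) %>%                  # group size attached to every row
  filter(row_number() == 1) %>%          # keep the first row of each combination
  ungroup()

res
```

NA is treated as an ordinary grouping value here, so rows that share the same pattern of missing years end up in the same group.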

Edit:

Or without the intermediate variable:

DF %>% 
    group_by_(.dots = names(.)[-1]) %>%
    mutate(row = n()) %>%
    distinct()
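
If the FieldID column is not needed, the counts alone can be obtained by summarising per combination, which is what the attempt in the question was aiming for. This is a minimal sketch, assuming dplyr 1.0 or later and the same data rebuilt as `DF`. The `dplyr::` prefixes are a precaution: the "should not be called directly" error is commonly reported when plyr is attached after dplyr and its `summarise()` masks dplyr's. Whether that was the case in the asker's session is an assumption.

```r
library(dplyr)

# Rebuild the example data from the question.
DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
FieldID X2009 X2010 X2011 X2012 X2013 X2014
H003     1     1     1     1     1     1
H001    NA     1     1     1     1     1
H005    NA     1     1     1     1     1
H006    NA     1     1     1     1     1
H009    NA     1     1     1    NA     1
H010    NA     1     1     1    NA     1
H002    NA     1     1     1    NA    NA
H007    NA     1     1     1    NA    NA
H008    NA     1     1    NA     1    NA
H004    NA     1     1    NA    NA     1
")

year_cols <- names(DF)[-1]               # every column except FieldID

counts <- DF %>%
  dplyr::group_by(across(all_of(year_cols))) %>%
  # Explicit namespace so a masking summarise() from another package is not picked up.
  dplyr::summarise(row = dplyr::n(), .groups = "drop")

counts
```

The result has one row per distinct combination of the year columns plus the `row` count; the order of the rows follows the grouping keys rather than the original data.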