Count unique sets of columns in R
I have a data frame that looks like this:
FieldID X2009 X2010 X2011 X2012 X2013 X2014
1 H003 1 1 1 1 1 1
2 H001 NA 1 1 1 1 1
3 H005 NA 1 1 1 1 1
4 H006 NA 1 1 1 1 1
5 H009 NA 1 1 1 NA 1
6 H010 NA 1 1 1 NA 1
7 H002 NA 1 1 1 NA NA
8 H007 NA 1 1 1 NA NA
9 H008 NA 1 1 NA 1 NA
10 H004 NA 1 1 NA NA 1
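I am trying to count the number of rows that fall into each unique combination of values in X2009 to X2014.
So the resulting data frame would look like: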
FieldID X2009 X2010 X2011 X2012 X2013 X2014 row
1 H003 1 1 1 1 1 1 1
2 H001 NA 1 1 1 1 1 3
5 H009 NA 1 1 1 NA 1 2
7 H002 NA 1 1 1 NA NA 2
9 H008 NA 1 1 NA 1 NA 1
10 H004 NA 1 1 NA NA 1 1
I tried the following:
tt %>%
  gather(., Year, value, X2009:X2014) %>%
  mutate(value = ifelse(is.na(value), 0, 1)) %>%
  tidyr::spread(., Year, value) %>%
  group_by(X2009, X2010, X2011, X2012, X2013, X2014) %>%
  summarise(row = n())
This gives me the error:
> Error in n() : This function should not be called directly
Replacing n() with length() or NROW() did not help. How can I do this?
Here is one option:
grps <- names(DF)[-1] # get the grouping columns
DF %>%
group_by_(.dots = grps) %>%
mutate(row = n()) %>%
distinct() # you could add %>% ungroup() if required
#Source: local data frame [6 x 8]
#Groups: X2009, X2010, X2011, X2012, X2013, X2014
#
# FieldID X2009 X2010 X2011 X2012 X2013 X2014 row
#1 H003 1 1 1 1 1 1 1
#2 H001 NA 1 1 1 1 1 3
#3 H009 NA 1 1 1 NA 1 2
#4 H002 NA 1 1 1 NA NA 2
#5 H008 NA 1 1 NA 1 NA 1
#6 H004 NA 1 1 NA NA 1 1
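The snippet above was written for an older dplyr. In current releases group_by_() is deprecated, and, as far as I understand it, distinct() with no arguments compares every column (FieldID included), so it would no longer collapse the result to six rows. Below is a minimal sketch of the same idea for dplyr 1.0 or later. The data frame is rebuilt from the question so the example is self-contained; year_cols and result are names I made up for illustration.

```r
library(dplyr)

# Data frame rebuilt from the question
DF <- data.frame(
  FieldID = c("H003", "H001", "H005", "H006", "H009",
              "H010", "H002", "H007", "H008", "H004"),
  X2009 = c(1, rep(NA, 9)),
  X2010 = rep(1, 10),
  X2011 = rep(1, 10),
  X2012 = c(rep(1, 8), NA, NA),
  X2013 = c(1, 1, 1, 1, NA, NA, NA, NA, 1, NA),
  X2014 = c(1, 1, 1, 1, 1, 1, NA, NA, NA, 1)
)

# Every column except the ID is a grouping column (hypothetical helper name)
year_cols <- setdiff(names(DF), "FieldID")

result <- DF %>%
  group_by(across(all_of(year_cols))) %>%  # NA counts as its own group value
  mutate(row = n()) %>%                    # group size, repeated on each row
  ungroup() %>%
  # keep the first row of each year pattern, with its FieldID and count
  distinct(across(all_of(year_cols)), .keep_all = TRUE)

print(result)
```

Because distinct() keeps the first occurrence of each pattern, the FieldID shown for each group is the first one in the original row order, which matches the desired output in the question.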
Edit:
Or without the intermediate variable:
DF %>%
group_by_(.dots = names(.)[-1]) %>%
mutate(row = n()) %>%
distinct()
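As for the original error: a commonly reported cause of "This function should not be called directly" is that plyr was attached after dplyr, so that plyr::summarise() masks dplyr::summarise() and n() is evaluated outside a dplyr verb. I cannot confirm that this is what happened in the question, so treat the following as a sketch under that assumption. It namespaces the dplyr calls explicitly and returns one row per year pattern (without FieldID); the .groups argument assumes dplyr 1.0 or later, and counts is a name chosen for illustration.

```r
library(dplyr)

# Data frame rebuilt from the question
DF <- data.frame(
  FieldID = c("H003", "H001", "H005", "H006", "H009",
              "H010", "H002", "H007", "H008", "H004"),
  X2009 = c(1, rep(NA, 9)),
  X2010 = rep(1, 10),
  X2011 = rep(1, 10),
  X2012 = c(rep(1, 8), NA, NA),
  X2013 = c(1, 1, 1, 1, NA, NA, NA, NA, 1, NA),
  X2014 = c(1, 1, 1, 1, 1, 1, NA, NA, NA, 1)
)

# Explicit dplyr:: prefixes sidestep masking by another attached package
counts <- DF %>%
  dplyr::group_by(X2009, X2010, X2011, X2012, X2013, X2014) %>%
  dplyr::summarise(row = dplyr::n(), .groups = "drop")

print(counts)
```

If the masking theory applies, loading plyr before dplyr (or not attaching plyr at all) should have the same effect as the explicit prefixes.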