Observations becoming NA when ordering levels of factors in R with ordered()
Hi, I have a longitudinal data frame p with 5 variables that looks like this:
> head(p)
date.1 County.x providers beds price
1 Jan/2011 essex 258 5545 251593.4
2 Jan/2011 greater manchester 108 3259 152987.7
3 Jan/2011 kent 301 7191 231985.7
4 Jan/2011 tyne and wear 103 2649 143196.6
5 Jan/2011 west midlands 262 6819 149323.9
6 Jan/2012 essex 2 27 231398.5
The structure of my variables is the following:
'data.frame': 259 obs. of 5 variables:
$ date.1 : Factor w/ 66 levels "Apr/2011","Apr/2012",..: 23 23 23 23 23 24 24 24 25 25 ...
$ County.x : Factor w/ 73 levels "avon","bedfordshire",..: 22 24 32 65 67 22 32 67 22 32 ...
$ providers: int 258 108 301 103 262 2 9 2 1 1 ...
$ beds : int 5545 3259 7191 2649 6819 27 185 24 70 13 ...
$ price : num 251593 152988 231986 143197 149324 ...
I would like to order the levels of date.1 chronologically. Before applying ordered(), this variable does not contain any NA observations.
> summary(is.na(p$date.1))
Mode FALSE NA's
logical 259 0
However, once I apply my call to order the levels of date.1:
p$date.1 = with(p, ordered(date.1, levels = c("Jun/2010", "Jul/2010",
"Aug/2010", "Sep/2010", "Oct/2010", "Nov/2010", "Dec/2010", "Jan/2011", "Feb/2011",
"Mar/2011","Apr/2011", "May/2011", "Jun/2011", "Jul/2011", "Aug/2011", "Sep/2011",
"Oct/2011", "Nov/2011", "Dec/2011" ,"Jan/2012", "Feb/2012" ,"Mar/2012" ,"Apr/2012",
"May/2012", "Jun/2012", "Jul/2012", "Aug/2012", "Sep/2012", "Oct/2012", "Nov/2012",
"Dec/2012", "Jan/2013", "Feb/2013", "Mar/2013", "Apr/2013", "May/2013",
"Jun/2013", "Jul/2013", "Aug/2013", "Sep/2013", "Oct/2013", "Nov/2013",
"Dec/2013", "Jan/2014",
"Feb/2014", "Mar/2014", "Apr/2014", "May/2014", "Jun/2014", "Jul/2014" ,"Aug/2014",
"Sep/2014", "Oct/2014", "Nov/2014", "Dec/2014", "Jan/2015", "Feb/2015", "Mar/2015",
"Apr/2015","May/2015", "Jun/2015" ,"Jul/2015" ,"Aug/2015", "Sep/2015", "Oct/2015",
"Nov/2015")))
it seems that I lose some observations: 9 values are now NA.
> summary(is.na(p$date.1))
Mode FALSE TRUE NA's
logical 250 9 0
Has anyone come across this problem when using ordered()? Alternatively, is there any other way to group my observations chronologically?
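Your p$date.1 probably contains at least one value that does not match any of the levels you supplied; ordered() turns every such value into NA. Try this ord.mon as the levels instead.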
ord.mon <- do.call(paste, c(expand.grid(month.abb, 2010:2015), sep = "/"))
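Then you can check whether there is any mismatch between the two (run this on the original column, before overwriting it with ordered()):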
p$date.1 %in% ord.mon
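To see which labels fail to match rather than just how many, something along these lines may help. This is only a sketch on a toy vector: x is a hypothetical stand-in for p$date.1, its values are made up for illustration, and the year range is shortened to keep the example small.

```r
# Toy data standing in for p$date.1 (hypothetical values, not the asker's data)
x <- factor(c("Jan/2011", "Feb/2011", "May/2010", "Jan/2011"))

# Candidate levels in chronological order: expand.grid() varies its first
# argument fastest, so the months cycle within each year.
ord.mon <- do.call(paste, c(expand.grid(month.abb, 2011:2012), sep = "/"))

# Values that are absent from `levels` silently become NA
y <- ordered(x, levels = ord.mon)
sum(is.na(y))

# List the labels that caused the NAs
unmatched <- setdiff(levels(x), ord.mon)
unmatched
```

With the real data, setdiff(levels(p$date.1), your_levels) should name the labels behind the 9 NA values, assuming p$date.1 is still the original factor.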
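Finally, you can also sort the data frame after converting the date.1 column to Date (note that you have to add a day of the month first, because as.Date() needs a complete date):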
p <- p[order(as.Date(paste0("01/", p$date.1), "%d/%b/%Y")), ]
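If you still want an ordered factor rather than a sorted data frame, one option is to derive the levels from the data itself, so that no label can be left out by hand. The sketch below assumes the labels always have the form "Mon/YYYY" with English month abbreviations; x is again a hypothetical stand-in for as.character(p$date.1). It maps months with match() instead of the "%b" format, because "%b" is interpreted in the current locale and can return NA in a non-English session.

```r
# Toy values standing in for as.character(p$date.1) (hypothetical data)
x <- c("Jan/2012", "Jun/2010", "Jan/2011", "Jun/2010")

# Split "Mon/YYYY" and map the month abbreviation to its number with match(),
# which does not depend on the session locale.
parts <- strsplit(x, "/", fixed = TRUE)
mon <- match(vapply(parts, `[`, character(1), 1), month.abb)
yr <- as.integer(vapply(parts, `[`, character(1), 2))
d <- as.Date(sprintf("%d-%02d-01", yr, mon))

# Take the levels from the data, in chronological order
lev <- unique(x[order(d)])
f <- factor(x, levels = lev, ordered = TRUE)
levels(f)
```

Because every level comes from x, f contains no NA unless a label fails to parse, in which case d is NA for that element and the label is worth inspecting.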