How to expand with multiple variables or dimensions

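I want to expand along three dimensions in R. I want to merge daily county-level information for three years into one data frame that contains all counties for all years, including all months and all days (i.e. up to 31). The problem is that not every county#day observation can be found in the using data, because the event did not occur in that county on that day. For my purposes, those missing combinations are zero observations.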

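To create my master file, I have a list of all counties. I then want to expand it so that I have exactly one observation for each county#year#month#day combination.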

My code is below. I have a data.frame containing the counties, and I want to generate year, month and day. So far I have used expand from the tidyverse.

Edit:

library(tidyverse)

# This is my list of all counties from an official source
counties <- data.frame("county" = c("A", "B" ,"c"))

# This is what I have, the data includes counties (not all),
# for year (not all),
# months (not all)
# and days (not all)

using <- data.frame("county"  = c("A", "A", "A", "B", "B", "B", "B"),
                    "year"  = c(2015,2016,2017,2015,2016,2017,2018),
                    "month" = c(1,2,7,2,3,2,4),
                    "day" = c(1,2,22,3,21,14,5))

# This is my attempt to get at least all county-month combinations
county.month <- expand(counties, county, 1:12)

# But I wish I could get all county#year#month#day combinations

Best,

Daniel

I'm not sure what you want as output, but I think you want tidyr's complete function rather than expand?

For example:

using %>% 
    complete(month, nesting(county, year))


# A tibble: 35 x 4
   month county  year   day
   <dbl> <fct>  <dbl> <dbl>
 1     1 A       2015     1
 2     1 A       2016    NA
 3     1 A       2017    NA
 4     1 B       2015    NA
 5     1 B       2016    NA
 6     1 B       2017    NA
 7     1 B       2018    NA
 8     2 A       2015    NA
 9     2 A       2016     2
10     2 A       2017    NA
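Note that the call above only fills in months and county/year pairs that already occur in the data. As a minimal sketch, complete() can also be given the full vector of values for each variable, so that counties, months and days missing from the data are created too. The names all_counties, events and full_grid are made up for illustration, and the grid assumes 31 days in every month, so it contains impossible dates such as 31 February.

```r
library(dplyr)
library(tidyr)

# Example data from the question
using <- tibble(county = c("A", "A", "A", "B", "B", "B", "B"),
                year   = c(2015, 2016, 2017, 2015, 2016, 2017, 2018),
                month  = c(1, 2, 7, 2, 3, 2, 4),
                day    = c(1, 2, 22, 3, 21, 14, 5))

# Hypothetical name: the official list of all counties
all_counties <- c("A", "B", "c")

full_grid <- using %>%
  mutate(events = 1L) %>%                       # mark rows that were observed
  complete(county = all_counties,
           year   = 2015:2018,
           month  = 1:12,
           day    = 1:31,
           fill   = list(events = 0L))          # unobserved combinations become 0

nrow(full_grid)
```

The fill argument is what turns the missing combinations into the zero observations the question asks for, instead of NA.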

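This approach should do what you want: a tibble of all possible county/year/month/day combinations (assuming every month has 31 days... ;)). The key is to use factors, so that fct_expand() can add the levels that are missing from the data before expand() crosses them.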

library(tidyverse)
counties <- data.frame("county" = c("A", "B" ,"c"), stringsAsFactors = FALSE)
using <- tibble("county"  = c("A", "A", "A", "B", "B", "B", "B"),
                    "year"  = c(2015,2016,2017,2015,2016,2017,2018),
                    "month" = c(1,2,7,2,3,2,4),
                    "day" = c(1,2,22,3,21,14,5))

using %>% 
  mutate_if(is.character, as_factor) %>%
  mutate_if(is.numeric, as.ordered) %>%
  mutate(county = fct_expand(county, counties$county),
         month = fct_expand(month, as.character(1:12)),
         day = fct_expand(day, as.character(1:31))) %>%
  expand(county, year, month, day) %>%
  arrange(year, month, day)

# A tibble: 4,464 x 4
   county year  month day  
   <fct>  <ord> <ord> <ord>
 1 A      2015  1     1    
 2 B      2015  1     1    
 3 c      2015  1     1    
 4 A      2015  1     2    
 5 B      2015  1     2    
 6 c      2015  1     2    
 7 A      2015  1     3    
 8 B      2015  1     3    
 9 c      2015  1     3    
10 A      2015  1     5    
# … with 4,454 more rows
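Because the grid above assumes 31 days in every month, it includes dates that do not exist. One possible way to drop them, sketched here with plain integer columns instead of factors, is to build a date from the three parts and keep only the rows where that succeeds. The names grid and valid are made up for illustration; ISOdate() from base R returns NA for an impossible date.

```r
library(dplyr)
library(tidyr)

# All county/year/month/day combinations, 31 days per month
grid <- crossing(county = c("A", "B", "c"),
                 year   = 2015:2018,
                 month  = 1:12,
                 day    = 1:31)

# Keep only the combinations that form a real calendar date
valid <- grid %>%
  mutate(date = as.Date(ISOdate(year, month, day))) %>%
  filter(!is.na(date))

nrow(grid)
nrow(valid)
```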

Perhaps what you want is every date in the years covered by your data. If so, use the seq() function with by = "1 day".

library(tidyverse)
library(lubridate)
counties <- data.frame("county" = c("A", "B" ,"c"), stringsAsFactors = FALSE)

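# First and last day of the period to cover
start_date <- as_date("2015-01-01")
end_date <- as_date("2018-12-31")

# One entry per calendar day, so only real dates are generated
all_dates <- seq(start_date, end_date, by = "1 day")

# Every county paired with every date
allcounties_alldates <- crossing(counties, all_dates)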
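
To get from the county-by-date grid to the zero observations described in the question, one option is to count the events per county and date and join the counts onto the grid. This is only a sketch under assumptions: the names observed, panel and events are invented for illustration, and each row of the using data is treated as one event.

```r
library(dplyr)
library(tidyr)

counties <- data.frame(county = c("A", "B", "c"), stringsAsFactors = FALSE)
using <- tibble(county = c("A", "A", "A", "B", "B", "B", "B"),
                year   = c(2015, 2016, 2017, 2015, 2016, 2017, 2018),
                month  = c(1, 2, 7, 2, 3, 2, 4),
                day    = c(1, 2, 22, 3, 21, 14, 5))

all_dates <- seq(as.Date("2015-01-01"), as.Date("2018-12-31"), by = "1 day")

# Number of events per county and calendar date in the observed data
observed <- using %>%
  mutate(date = as.Date(ISOdate(year, month, day))) %>%
  count(county, date, name = "events")

# Full county-by-date panel; combinations without a match get 0 events
panel <- crossing(county = counties$county, date = all_dates) %>%
  left_join(observed, by = c("county", "date")) %>%
  mutate(events = replace_na(events, 0L))

nrow(panel)
```

If separate year, month and day columns are still needed afterwards, they can be derived from the date column, for example with format() or the lubridate accessors.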