Purrr-fused about arranging a date list column

I am trying to arrange a list column using purrr. But just creating a toy example has me completely confused:

library(dplyr)  # tibble(), mutate(), %>%
library(purrr)  # map()

s <- tibble(b = as.integer(runif(
  n = 10, min = 0, max = 20
)))
s$e <-
  map(s$b,  ~ sample(seq(
    as.Date('1990/01/01'), as.Date('2010/01/01'), by = "day"
  ), size = .))

I thought I could do something like this:

s2 <- s %>% map('b') %>% 
  mutate(e = map(~ sample(seq(as.Date('1990/01/01'),
                              as.Date('2010/01/01'), by = "day"),
                          size = .)))

However, this does not work. What am I missing here?

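Now, I want to arrange the dates in the list column in ascending order and extract the first and last date. How would I do this the purrr way? I tried different variations of
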
s %>% map('e') %>% map_df(~arrange(.))

But obviously I am missing something here...

My desired output is a new list column in data frame s, s$new_arranged_dates, in which the unarranged dates from list column s$e are arranged in ascending order.

> s
# A tibble: 10 × 3
       b           e       new_arranged_dates    
   <int>      <list>            <list>    
1     15 <date [15]>           <date [15]>
2      0  <date [0]>           <date [0]>
3      7  <date [7]>             etc
4      6  <date [6]>
5      3  <date [3]>
6     14 <date [14]>
7     15 <date [15]>
8     13 <date [13]>
9     13 <date [13]>
10    11 <date [11]>

Edit 290817:

s2 <- s %>% 
  mutate(e = map(b,~ sample(seq(as.Date('1990/01/01'),
                              as.Date('2010/01/01'), by = "day"),
                          size = .))) %>% mutate(new_arranged_dates =map(e,~.[order(.)]))

gets me what I want. However, I do not understand why

s2 <- s %>% 
  mutate(e = map(b,~ sample(seq(as.Date('1990/01/01'),
                              as.Date('2010/01/01'), by = "day"),
                          size = .))) %>% mutate(new_arranged_dates=map(e,~arrange(.)))

results in

Error in mutate_(.data, .dots = lazyeval::lazy_dots(...)) : 
  argument ".data" is missing, with no default

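So, the basic mistake here is that arrange expects a data frame and will not sort a vector. Coercing each list element being looped over to a data_frame solved the problem, but it took me a while to figure out that the column of the resulting data_frame is also named ".".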

So this works:

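  library(dplyr)
  library(purrr)  # provides map()

  s <- tibble(b = as.integer(runif(
    n = 10, min = 0, max = 20
  )))
  s <-
    s %>% mutate(e = map(b,  ~ sample(seq(
      as.Date('1990/01/01'), as.Date('2010/01/01'), by = "day"
    ), size = .)))

  # Wrap each vector in a one-column data frame (the column is named "."),
  # then arrange by that column.
  s <- s %>% mutate(arranged = map(e,  ~ arrange(data_frame(.), .)))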

Tip: writing a new function that contains a browser() statement and calling it from map helps a lot, and may be helpful to others as well.
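
A minimal sketch of that debugging approach, assuming a small hand-made list of Date vectors. The helper name sort_with_debug and the interactive() guard are illustrative additions, not part of the original tip; the guard only keeps the sketch from pausing when it is run as a script.

```r
library(purrr)

# Hypothetical helper: pauses inside each map() iteration in an interactive
# session so the current element can be inspected (e.g. class(x), str(x)).
sort_with_debug <- function(x) {
  if (interactive()) browser()  # type c at the Browse prompt to continue
  sort(x)
}

# Small hand-made input instead of the random data from the question.
dates <- list(
  as.Date(c("2006-12-16", "1998-04-20")),
  as.Date("2007-04-15")
)

sorted <- map(dates, sort_with_debug)
```

Stepping into the function this way shows that each element handed over by map is a plain Date vector and not a data frame, which is the situation that makes arrange fail.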

This is an old question by now, but all you need here is sort:

s <- s %>% mutate(new_arranged_dates = map(e, sort))

str(s)

## Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    10 obs. of  3 variables:
##  $ b                 : int  5 16 3 14 16 5 14 1 1 5
##  $ e                 :List of 10
##   ..$ : Date, format: "1991-09-28" "2006-09-12" "1993-03-04" ...
##   ..$ : Date, format: "2000-04-30" "2002-05-16" "1991-10-01" ...
##   ..$ : Date, format: "1998-04-20" "2006-12-16" "2000-10-15"
##   ..$ : Date, format: "2000-02-14" "1993-01-20" "1998-03-26" ...
##   ..$ : Date, format: "1992-07-06" "1995-08-18" "2005-01-24" ...
##   ..$ : Date, format: "1996-05-01" "1993-03-01" "2001-10-11" ...
##   ..$ : Date, format: "2006-04-24" "2008-03-26" "2007-12-08" ...
##   ..$ : Date, format: "2007-04-15"
##   ..$ : Date, format: "1998-07-16"
##   ..$ : Date, format: "2004-04-25" "1994-12-01" "1998-12-21" ...
##  $ new_arranged_dates:List of 10
##   ..$ : Date, format: "1991-09-28" "1993-03-04" "2005-02-15" ...
##   ..$ : Date, format: "1990-08-19" "1991-10-01" "1992-12-15" ...
##   ..$ : Date, format: "1998-04-20" "2000-10-15" "2006-12-16"
##   ..$ : Date, format: "1990-01-21" "1990-12-29" "1992-06-09" ...
##   ..$ : Date, format: "1992-02-12" "1992-07-06" "1993-04-30" ...
##   ..$ : Date, format: "1991-07-30" "1993-03-01" "1996-05-01" ...
##   ..$ : Date, format: "1990-12-05" "1993-08-23" "1994-12-09" ...
##   ..$ : Date, format: "2007-04-15"
##   ..$ : Date, format: "1998-07-16"
##   ..$ : Date, format: "1994-12-01" "1998-12-21" "2004-04-25" ...
##  - attr(*, "vars")= chr 
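
For comparison with the ~.[order(.)] version from the question's edit, here is a small sketch on a hand-made list (the name e_demo is hypothetical) suggesting that both approaches give the same result for Date vectors, including a zero-length one.

```r
library(purrr)

# Hypothetical toy list column: three dates, one date, and no dates.
e_demo <- list(
  as.Date(c("2006-12-16", "1998-04-20", "2000-10-15")),
  as.Date("2007-04-15"),
  as.Date(character(0))
)

by_sort  <- map(e_demo, sort)               # pass the function directly
by_order <- map(e_demo, ~ .x[order(.x)])    # index by the ordering permutation

# Both lists hold the same ascending Date vectors.
same <- identical(by_sort, by_order)
```

Passing sort directly is shorter and needs no lambda, since sort already takes the vector as its first argument.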

To extract the earliest and latest dates, map min and max:

s %>% mutate(earliest = map(e, min), 
             latest = map(e, max)) %>% 
    unnest(earliest, latest, .drop = FALSE)

## # A tibble: 10 × 5
##        b           e new_arranged_dates   earliest     latest
##    <int>      <list>             <list>     <date>     <date>
## 1      5  <date [5]>         <date [5]> 1991-09-28 2007-07-19
## 2     16 <date [16]>        <date [16]> 1990-08-19 2007-10-08
## 3      3  <date [3]>         <date [3]> 1998-04-20 2006-12-16
## 4     14 <date [14]>        <date [14]> 1990-01-21 2006-06-11
## 5     16 <date [16]>        <date [16]> 1992-02-12 2008-12-18
## 6      5  <date [5]>         <date [5]> 1991-07-30 2007-10-23
## 7     14 <date [14]>        <date [14]> 1990-12-05 2009-04-11
## 8      1  <date [1]>         <date [1]> 2007-04-15 2007-04-15
## 9      1  <date [1]>         <date [1]> 1998-07-16 1998-07-16
## 10     5  <date [5]>         <date [5]> 1994-12-01 2008-01-10

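There is no map_date variant that automatically simplifies to Date, so you have to use unnest to simplify. .drop = FALSE specifies that the other list columns are kept.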
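
Note that the unnest() interface changed in tidyr 1.0.0: the columns to unnest are passed through cols, and .drop is deprecated because other list columns are now kept by default. A minimal sketch under that assumption, using a small hypothetical tibble s_demo with fixed dates instead of the random data above:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical toy data with fixed dates so the result is reproducible.
s_demo <- tibble(
  b = c(3L, 1L),
  e = list(
    as.Date(c("2006-12-16", "1998-04-20", "2000-10-15")),
    as.Date("2007-04-15")
  )
)

result <- s_demo %>%
  mutate(earliest = map(e, min),
         latest   = map(e, max)) %>%
  # tidyr >= 1.0.0: name the list columns to simplify via cols;
  # the remaining list column e is preserved without .drop = FALSE.
  unnest(cols = c(earliest, latest))
```

Here earliest and latest come out as regular Date columns, while e stays a list column. Rows whose date vector is empty are left out of the sketch, because min and max on an empty vector warn and do not return a usable date.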