Match ISO 8601 week-of-year numbers to month-of-year numbers on Windows with German locale

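This is directly related to another question of mine.

In this question, however, I want to ask specifically how to map ISO 8601 week numbers to month-of-year numbers.

To me this seems impossible and/or requires some non-intuitive hacks (and even those hacks do not work reliably), so IMO it should be treated as something that needs fixing in base R. Correct me if I'm wrong, though.

Edit: it seems this problem is closely tied to running on Windows and/or to the locale you are in (standard German in my case).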

posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

ISO 8601

(yw <- format(posix, "%Y-%V"))
# [1] "2015-52" "2015-53" "2016-53" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# [1] "2015-01-12 CET" "2015-01-12 CET" "2016-01-12 CET" "2016-01-12 CET"
# -> utterly wrong!!!

ywd <- sprintf("%s-4", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# -> still wrong -> the day of the week is not the reason

# -> no way to use ISO 8601 convention to map week of the year to month of the year

For due diligence: it does not work with the US or UK conventions either:

US convention

(yw <- format(posix, "%Y-%U"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
# -> does not work for week 00
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# The day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year

UK convention

(yw <- format(posix, "%Y-%W"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
# -> does not work for week 00
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# The day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year

Session info

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=German_Germany.1252     LC_CTYPE=German_Germany.1252       LC_MONETARY=German_Germany.1252   
[4] LC_NUMERIC=C                       LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fva_0.1.0       digest_0.6.10   readxl_0.1.1    dplyr_0.5.0     plyr_1.8.4      magrittr_1.5   
 [7] memoise_1.0.0   testthat_1.0.2  roxygen2_5.0.1  devtools_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8     lubridate_1.6.0 assertthat_0.1  packrat_0.4.8-1 crayon_1.3.2    withr_1.0.2    
 [7] R6_2.2.0        DBI_0.5-1       stringi_1.1.2   rstudioapi_0.6  tools_3.3.2     stringr_1.1.0  
[13] tibble_1.2     

> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language en                          
 collate  German_Germany.1252         
 tz       Europe/Berlin               
 date     2017-01-12                  

Packages ---------------------------------------------------------------------------------------------------
 package    * version date       source        
 assertthat   0.1     2013-12-06 CRAN (R 3.3.2)
 crayon       1.3.2   2016-06-28 CRAN (R 3.3.2)
 DBI          0.5-1   2016-09-10 CRAN (R 3.3.2)
 devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.2)
 digest     * 0.6.10  2016-08-02 CRAN (R 3.3.2)
 dplyr      * 0.5.0   2016-06-24 CRAN (R 3.3.2)
 fva        * 0.1.0   <NA>       local         
 lubridate    1.6.0   2016-09-13 CRAN (R 3.3.2)
 magrittr   * 1.5     2014-11-22 CRAN (R 3.3.2)
 memoise    * 1.0.0   2016-01-29 CRAN (R 3.3.2)
 packrat      0.4.8-1 2016-09-07 CRAN (R 3.3.2)
 plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.2)
 R6           2.2.0   2016-10-05 CRAN (R 3.3.2)
 Rcpp         0.12.8  2016-11-17 CRAN (R 3.3.2)
 readxl     * 0.1.1   2016-03-28 CRAN (R 3.3.2)
 roxygen2   * 5.0.1   2015-11-11 CRAN (R 3.3.2)
 stringi      1.1.2   2016-10-01 CRAN (R 3.3.2)
 stringr      1.1.0   2016-08-19 CRAN (R 3.3.2)
 testthat   * 1.0.2   2016-04-23 CRAN (R 3.3.2)
 tibble       1.2     2016-08-26 CRAN (R 3.3.2)
 withr        1.0.2   2016-06-20 CRAN (R 3.3.2)

Pretty sure something other than base R needs to change (see the Note at the end):

some_dates <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

(year_week <- format(some_dates, "%Y %U"))
## [1] "2015 51" "2015 52" "2016 00" "2016 01"

(year_week_day <- sprintf("%s 1", year_week))
## [1] "2015 51 1" "2015 52 1" "2016 00 1" "2016 01 1"

(as.POSIXct(year_week_day, format = "%Y %U %u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

It works with dashes too:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

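Also, although dashes are the ISO format, they can confuse readers when the individual values are not > 12 or < 0, because a week number can then be mistaken for a month or a day.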

Note

Since the comment thread suggests this is the behavior on Windows:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 PST" "2015-12-28 PST" NA               "2016-01-04 PST"

(Windows 10 64-bit, R 3.3.2 for me/this example)
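To see what the week 00 case would have to mean, the date can be computed arithmetically instead of through strptime(). The following is a minimal sketch: us_week_to_date() is a hypothetical helper written for this illustration, not a base R function. It assumes the %U convention (weeks start on Sunday, and days before the first Sunday form week 00) with the weekday given as in %u (1 = Monday, 7 = Sunday).

```r
# Hypothetical helper (not part of base R): resolve a %U-style week number
# plus a %u-style weekday (1 = Monday ... 7 = Sunday) by date arithmetic.
us_week_to_date <- function(year, week, weekday) {
  jan1 <- as.Date(sprintf("%04d-01-01", as.integer(year)))
  # %w gives the weekday as 0 (Sunday) to 6 (Saturday)
  jan1_wday <- as.integer(format(jan1, "%w"))
  # First Sunday of the year = start of week 01 under the %U convention
  first_sunday <- jan1 + (7L - jan1_wday) %% 7L
  # Within a Sunday-based week, Sunday (%u = 7) has offset 0
  offset <- as.integer(weekday) %% 7L
  first_sunday + (as.integer(week) - 1L) * 7L + offset
}

us_week_to_date(c(2015, 2015, 2016, 2016), c(51, 52, 0, 1), 1)
# 2015-12-21, 2015-12-28, 2015-12-28, 2016-01-04
```

Under these assumptions, week 00 of 2016 with weekday 1 resolves to 2015-12-28, a Monday in the previous calendar year and the same date as 2015 week 52. The input therefore has no valid date inside 2016, which may be why the platforms disagree on it; the sketch does not show what either strptime() implementation does internally.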

The documentation of R's date-time format specifications, ?strptime, states that "%V" is accepted but ignored on input.
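Because the week number cannot be parsed this way, one base R alternative is to compute the date directly. The sketch below relies on the ISO 8601 rule that 4 January always lies in week 01; iso_week_to_date() is a hypothetical helper name, and it performs no validation (for example, it does not check whether a given year actually has a week 53).

```r
# Hypothetical helper (not part of base R): convert an ISO 8601 week-based
# year, week number and weekday (1 = Monday ... 7 = Sunday) to a Date.
iso_week_to_date <- function(iso_year, iso_week, weekday = 1L) {
  # 4 January is always in ISO week 01
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(iso_year)))
  # %u gives the ISO weekday on output: 1 (Monday) to 7 (Sunday)
  jan4_wday <- as.integer(format(jan4, "%u"))
  week1_monday <- jan4 - (jan4_wday - 1L)
  week1_monday + (as.integer(iso_week) - 1L) * 7L + (as.integer(weekday) - 1L)
}

iso_week_to_date(c(2015, 2015, 2016), c(52, 53, 1), 1)
# 2015-12-21, 2015-12-28, 2016-01-04
```

The function is vectorized over its arguments, so it can be applied to a whole column of year and week values at once.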

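Disclosure: I created the ISOweek package to handle ISO 8601 week-based dates.

The question has several flaws:

  1. An ISO 8601 week-based year is not the same as a calendar year.
  2. Converting year-week to year-month is ambiguous unless the day of the week is specified.

Week-based year vs. calendar year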

The OP created the sample data using
posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))
(yw <- format(posix, "%Y-%V"))
[1] "2015-52" "2015-53" "2016-53" "2016-01"

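The format specification %Y returns the calendar year, which is clearly wrong for the third element.

Using the correct format specification %G, we get instead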

(yw <- format(posix, "%G-%V"))
[1] "2015-52" "2015-53" "2015-53" "2016-01"
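The difference between the two specifications only shows up near a year boundary. As a small illustration (the dates are chosen for this example, and it assumes format() supports %G and %V on output, as in the session above), the following compares them side by side using Date objects to keep time zones out of the picture:

```r
# Dates around the 2015/2016 boundary, chosen for illustration
dates <- as.Date(c("2015-12-31", "2016-01-01", "2016-01-03", "2016-01-04"))

boundary <- data.frame(
  date          = dates,
  calendar_year = format(dates, "%Y"),  # calendar year
  iso_year      = format(dates, "%G"),  # ISO 8601 week-based year
  iso_week      = format(dates, "%V"),  # ISO 8601 week number
  stringsAsFactors = FALSE
)
# Flag the rows where pairing %Y with %V would give a wrong year-week label
boundary$years_differ <- boundary$calendar_year != boundary$iso_year
print(boundary)
```

The rows for 2016-01-01 and 2016-01-03 are flagged: both dates belong to week 53 of the week-based year 2015, while 2016-01-04 is the Monday that starts week 01 of 2016.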

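Week of the year to month of the year

Providing only the ISO week-based year and the week number, without the day of the week, yields ambiguous results.

This can be demonstrated with the (corrected) sample data, which now contain three consecutive weeks in the OP's own (non-standard) year-week format: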

(yw <- unique(yw))
[1] "2015-52" "2015-53" "2016-01"

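With the help of the ISOweek2date() function from the ISOweek package, the data are converted to calendar dates. Note that ISOweek2date() requires a complete ISO 8601 week-based date in the format yyyy-Www-d, including the day of the week. If we choose the first day of the week (Monday), we get: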

library(ISOweek)
library(magrittr)
yw %>% 
  # insert "W" to conform with ISO 8601 format
  sub("-", "-W", .) %>% 
  # append day of week
  paste0("-1") %>%
  # convert to class Date and print as yyyy-mm 
  ISOweek2date() %>% 
  format("%Y-%m")
[1] "2015-12" "2015-12" "2016-01"

Now we repeat this using the last day of the week (Sunday):

yw %>% 
  sub("-", "-W", .) %>% 
  paste0("-7") %>% 
  ISOweek2date() %>% 
  format("%Y-%m")
[1] "2015-12" "2016-01" "2016-01"

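Note that the second element now refers to January 2016 instead of December 2015, because the Sunday of week 53 falls in January while the Monday of that week is still in December.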
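If a single month per week is needed, some convention has to be picked. One possible choice, offered here only as an assumption and not as part of ISO 8601 or the ISOweek package, is to use the week's Thursday: the month containing the Thursday always holds at least four of the week's seven days, which mirrors how ISO 8601 assigns weeks to years. The sketch below uses plain base R arithmetic; iso_week_month() is a hypothetical helper name.

```r
# Hypothetical helper: assign each ISO week to the month that contains its
# Thursday (a "majority of days" convention chosen for this illustration).
iso_week_month <- function(iso_year, iso_week) {
  # 4 January is always in ISO week 01
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(iso_year)))
  # %u gives the ISO weekday on output: 1 (Monday) to 7 (Sunday)
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1L)
  # Thursday is three days after the Monday of the requested week
  thursday <- week1_monday + (as.integer(iso_week) - 1L) * 7L + 3L
  format(thursday, "%Y-%m")
}

iso_week_month(c(2015, 2015, 2016), c(52, 53, 1))
# "2015-12" "2015-12" "2016-01"
```

With this convention, week 53 of 2015 is assigned to December 2015, because four of its days (Monday to Thursday) fall in December. Other conventions, such as always using Monday or Sunday, are equally defensible and give the results shown above.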