gnuplot: How to get correct week numbers?

While working on a related question, I found that the week numbers gnuplot produces with the time specifiers %W and %U are wrong in some cases.

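Apparently, there are different definitions of week numbers. There are also different definitions of when a week starts, e.g. on Sunday or on Monday. One commonly used definition of week numbers (although not in the US and some other countries) is the one in ISO 8601.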
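
To make "different definitions" concrete, here is a small sketch in Python rather than gnuplot, used only as an independent reference. It compares the C library conventions behind %U (Sunday-based) and %W (Monday-based), where all days before the first Sunday or Monday of the year fall into week 0, with the ISO 8601 week. The helper name week_conventions is made up for this illustration.

```python
from datetime import date

def week_conventions(d):
    """Return (Sunday-based %U, Monday-based %W, ISO 8601 week) for date d."""
    sunday_based = int(d.strftime("%U"))   # week 0 = days before the first Sunday
    monday_based = int(d.strftime("%W"))   # week 0 = days before the first Monday
    iso_week = d.isocalendar()[1]          # week 1 = week containing the first Thursday
    return sunday_based, monday_based, iso_week

for d in (date(2021, 1, 1), date(2021, 1, 3), date(2021, 1, 4)):
    print(d.isoformat(), week_conventions(d))
```

For Friday 01.01.2021 this gives (0, 0, 53): week 0 in both C conventions, but week 53 (of 2020) according to ISO 8601.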

Code: (illustrating the wrong week numbers)

### wrong week numbering in gnuplot with %W and %U
reset session

StartDate = "24.12.2020"
myTimeFmt = "%d.%m.%Y"
SecondsPerDay = 3600*24

print "      date   %a  %w  %d   %j  %W  %U"
print "===================================="
do for [i=0:20] {
    t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
    myDate = strftime(myTimeFmt."  %a  %w  %d  %j  %W  %U", t)
    print sprintf("%s", myDate)
}
### end of code

gnuplot time specifiers:

%a abbreviated name of day of the week
%w day of the week, 0–6 (Sunday = 0)
%d day of the month, 01–31
%j day of the year, 1–366 
%W week of the year (week starts on Monday)
%U week of the year (week starts on Sunday)

Result:

      date   %a  %w  %d   %j  %W  %U
====================================
24.12.2020  Thu  04  24  359  52  52
25.12.2020  Fri  05  25  360  52  52
26.12.2020  Sat  06  26  361  52  52
27.12.2020  Sun  00  27  362  52  53
28.12.2020  Mon  01  28  363  53  53
29.12.2020  Tue  02  29  364  53  53
30.12.2020  Wed  03  30  365  53  53
31.12.2020  Thu  04  31  366  53  53
01.01.2021  Fri  05  01  001  01  01   ???
02.01.2021  Sat  06  02  002  01  01   ???
03.01.2021  Sun  00  03  003  00  01   ???
04.01.2021  Mon  01  04  004  01  01
05.01.2021  Tue  02  05  005  01  01
06.01.2021  Wed  03  06  006  01  01
07.01.2021  Thu  04  07  007  01  01
08.01.2021  Fri  05  08  008  01  01
09.01.2021  Sat  06  09  009  01  01
10.01.2021  Sun  00  10  010  01  02
11.01.2021  Mon  01  11  011  02  02
12.01.2021  Tue  02  12  012  02  02
13.01.2021  Wed  03  13  013  02  02

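Question: Is there a workaround to fix this?

According to the description here: https://en.wikipedia.org/wiki/ISO_week_date, I guess the essence of the ISO 8601 definition is:

  1. a week starts on Monday
  2. week 01 is the week containing the first Thursday of the year
  3. a week belongs to the year in which the majority of its days fall
  4. years starting or ending on a Thursday have 53 weeks, all other years have 52 weeks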
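
As a cross-check of the four points above, here is a minimal sketch in Python (not gnuplot) that derives the ISO year and week directly from these rules, so the result can be compared against Python's built-in date.isocalendar(). The function names weeks_in_year and iso_year_week are made up for this illustration.

```python
from datetime import date

def weeks_in_year(y):
    """53 if 1 January or 31 December of year y is a Thursday, else 52."""
    starts_thu = date(y, 1, 1).isoweekday() == 4
    ends_thu = date(y, 12, 31).isoweekday() == 4
    return 53 if starts_thu or ends_thu else 52

def iso_year_week(d):
    """Return (ISO year, ISO week) of date d, computed from the rules above."""
    dow = d.isoweekday()              # 1 = Monday, ..., 7 = Sunday
    yday = d.timetuple().tm_yday      # day of year, 1..366
    week = (10 + yday - dow) // 7     # "raw" week, can be 0 or one too many
    if week < 1:                      # belongs to the last week of the previous year
        return d.year - 1, weeks_in_year(d.year - 1)
    if week > weeks_in_year(d.year):  # belongs to week 1 of the next year
        return d.year + 1, 1
    return d.year, week

print(iso_year_week(date(2021, 1, 1)))   # (2020, 53)
print(iso_year_week(date(2021, 1, 4)))   # (2021, 1)
```

The raw-week formula is the same as in the gnuplot code that follows, except that the constant is 10 instead of 11 because Python's day of the year starts at 1.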

Code:

### correct week number according to ISO 8601
reset session

dow(t)      = int(tm_wday(t)) ? tm_wday(t) : 7                               # day of week 1=Mon, ..., 7=Sun
week(t)     = int((11 + tm_yday(t) - dow(t))/7)                              # "raw" week of year (tm_yday is 0-based, hence 11 and not 10)
wday(d,m,y) = tm_wday(strptime("%d.%m.%Y",sprintf("%02d.%02d.%04d",d,m,y)))  # week day of certain date
wpy(y)      = wday(1,1,y)==4 || wday(31,12,y)==4 ? 53 : 52                   # weeks per year
woy(t)      = week(t) < 1 ? wpy(tm_year(t)-1) : \
              week(t) > wpy(tm_year(t)) ? 1 : week(t)                        # week of year
yow(t)      = int(week(t) < 1 ? tm_year(t)-1 : week(t) > wpy(tm_year(t)) ? \
              tm_year(t)+1 : tm_year(t))                                     # year of week (could be previous, current or next)

StartDate = "24.12.2020"
myTimeFmt = "%d.%m.%Y"
SecondsPerDay = 3600*24

print "      date   %a DoW  %d   %j   YoW WoY"
print "======================================"
do for [i=0:20] {
    t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
    myDate = strftime(myTimeFmt."  %a", t)
    myDate2 = strftime("%d  %j", t)
    print sprintf("%s  %02d  %s  %04d-W%02d", myDate, dow(t), myDate2, yow(t), woy(t))
}
### end of code

Result:

      date   %a DoW  %d   %j   YoW WoY
======================================
24.12.2020  Thu  04  24  359  2020-W52
25.12.2020  Fri  05  25  360  2020-W52
26.12.2020  Sat  06  26  361  2020-W52
27.12.2020  Sun  07  27  362  2020-W52
28.12.2020  Mon  01  28  363  2020-W53
29.12.2020  Tue  02  29  364  2020-W53
30.12.2020  Wed  03  30  365  2020-W53
31.12.2020  Thu  04  31  366  2020-W53
01.01.2021  Fri  05  01  001  2020-W53
02.01.2021  Sat  06  02  002  2020-W53
03.01.2021  Sun  07  03  003  2020-W53
04.01.2021  Mon  01  04  004  2021-W01
05.01.2021  Tue  02  05  005  2021-W01
06.01.2021  Wed  03  06  006  2021-W01
07.01.2021  Thu  04  07  007  2021-W01
08.01.2021  Fri  05  08  008  2021-W01
09.01.2021  Sat  06  09  009  2021-W01
10.01.2021  Sun  07  10  010  2021-W01
11.01.2021  Mon  01  11  011  2021-W02
12.01.2021  Tue  02  12  012  2021-W02
13.01.2021  Wed  03  13  013  2021-W02

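To use week numbers, e.g. as time axis labels, it would be good to have this implemented for %W. Coincidentally, there was a recent bug report about this on SourceForge, so I assume it will be fixed soon in the next version.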

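Given the ongoing pandemic and the consequent interest in plotting epidemiological data from all sources, it seemed expedient to clean up and extend gnuplot's support for week date formats. The "New features" section of the gnuplot documentation now lists: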

• Time specifier format %W has been brought into accord with the ISO 8601 week date standard. 
• Time specifier format %U has been brought into accord with the CDC/MMWR week date standard. 
• New function tm_week(time, std) returns ISO or CDC standard week of year. 
• New function weekdate_iso(year, week, day) converts ISO standard week date to calendar time. 
• New function weekdate_cdc(year, week, day) converts CDC standard week date to calendar time.
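
To illustrate how the two standards named in this list differ, here is a hedged Python sketch (not gnuplot code, and not gnuplot's implementation). It assumes the usual description of the CDC/MMWR convention: weeks run from Sunday to Saturday, and a week belongs to the year in which at least four of its days fall, i.e. the year containing its Wednesday. cdc_year_week is a hypothetical helper name.

```python
from datetime import date, timedelta

def cdc_year_week(d):
    """Return (year, week) of date d under the assumed CDC/MMWR convention."""
    days_since_sunday = d.isoweekday() % 7               # Sunday -> 0, ..., Saturday -> 6
    wednesday = d + timedelta(days=3 - days_since_sunday)  # middle day of the Sun..Sat week
    # The week number is the count of Wednesdays in that year up to this one.
    week = (wednesday.timetuple().tm_yday - 1) // 7 + 1
    return wednesday.year, week

for d in (date(2021, 1, 2), date(2021, 1, 3), date(2021, 1, 4)):
    iso = tuple(d.isocalendar())[:2]                     # (ISO year, ISO week)
    print(d.isoformat(), "ISO", iso, "CDC", cdc_year_week(d))
```

Under these assumptions, Sunday 03.01.2021 is still in ISO week 2020-W53 but already in CDC week 1 of 2021, because the CDC week starts one day earlier.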

Here is an example (from the online demo set) that converts data given in ISO 8601 week date format into standard calendar dates so that it can be plotted along a gnuplot time axis.

#                   Epidemiological data
#
# Plot from data file that encodes date as an ISO 8601 "week date".
# Example:  week date 2004-W01-1 is calendar date 29 December 2003
# The data is from the European Centre for Disease Prevention and Control
# https://www.ecdc.europa.eu/

# The ECDC data file uses fields containing week date as "YYYY-WW".
# First we define a function that extracts the integer year and week
# from this string and converts it to standard time representation.

calendar(date) = weekdate_iso( int(date[1:4]), int(date[6:7]) )

set datafile separator comma
set style data lines
set key Left left reverse box samplen 2 width 2
set grid x lt 1 lw .75 lc "gray"
set tics nomirror
set border 3
set xtics time format "%b\n%Y"
set ytics format " %4.0f"

data1 = '< grep "Denmark.*cases" ECDC-weekly-national-COVID.csv'
data2 = '< grep "Sweden.*cases" ECDC-weekly-national-COVID.csv'
data3 = '< grep "Norway.*cases" ECDC-weekly-national-COVID.csv'
data4 = '< grep "Finland.*cases" ECDC-weekly-national-COVID.csv'
data5 = '< grep "Iceland.*cases" ECDC-weekly-national-COVID.csv'

set title "weekly COVID-19 cases per 100,000 people" font "/Bold,15"

plot data1 using (calendar(strcol(7))) : (1.e5*/) lw 2 title "Denmark", \
     data2 using (calendar(strcol(7))) : (1.e5*/) lw 2 title "Sweden", \
     data3 using (calendar(strcol(7))) : (1.e5*/) lw 2 title "Norway", \
     data4 using (calendar(strcol(7))) : (1.e5*/) lw 2 title "Finland", \
     data5 using (calendar(strcol(7))) : (1.e5*/) lw 2 lt 6 title "Iceland"
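
The string handling in calendar(date) can be cross-checked outside gnuplot. The following is a minimal Python sketch (Python 3.8 or later, for date.fromisocalendar). It assumes the week field has the form "YYYY-WW" as described in the comments above, and that the Monday of the given week is the date wanted. week_string_to_date is a hypothetical name, not part of gnuplot or of the demo.

```python
from datetime import date

def week_string_to_date(year_week):
    """Convert an ISO week string such as "2020-53" to the Monday of that week."""
    year = int(year_week[0:4])   # characters 1-4 in gnuplot's 1-based date[1:4]
    week = int(year_week[5:7])   # characters 6-7 in gnuplot's 1-based date[6:7]
    return date.fromisocalendar(year, week, 1)   # day 1 = Monday

print(week_string_to_date("2020-53"))   # 2020-12-28
print(week_string_to_date("2021-01"))   # 2021-01-04
```

Both dates agree with the YoW/WoY table further up, where 28.12.2020 starts 2020-W53 and 04.01.2021 starts 2021-W01.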