Labeling extrema with stat_peaks/stat_valleys produces duplicate labels
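I've extracted some longitudinal temperature data from a .nc weather dataset (ncdf4 package) and want to plot it with ggplot2 and its extension ggpmisc, including stat_peaks/stat_valleys. Oddly, all of the labels read the same: "Dec 1969".
I thought the most likely culprit was that the data I use for the x-axis is not properly formatted as Date, but the x-axis displays correctly and I checked the class of the input data to confirm. I also tried applying group=1, with no change in the result. I admit I'm new to R and ggplot2 (more familiar with Python/Pandas) and don't fully understand what group=1 does, although it was necessary to get the line to display correctly. Perhaps that is related to the error?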
ggplot(df_denver, aes(x=Date, y=Temp..C., group=1)) +
geom_line() +
scale_x_date(date_labels="%b %Y", date_breaks = "10 years", expand=c(0,0)) +
stat_peaks(span=24, ignore_threshold = 0.80, color="red") +
stat_peaks(geom="text", span=24, ignore_threshold = 0.80, x.label.fmt = "%b %Y", color="red", angle=90, hjust=-0.1) +
stat_valleys(span=24, ignore_threshold = 0.55, color="blue") +
stat_valleys(geom="text", span=24, ignore_threshold = 0.55, x.label.fmt = "%b %Y", color="blue", angle=90, hjust=1.1) +
labs(x="Date", y="Temp (C)", title="Monthly Air Surface Temp for Denver from 1880 on")
Here are the first 100 rows of my dataset, which produce 3 peaks and 3 valleys to illustrate:
Date Temp..C.
1 1880-01-01 2.91287017
2 1880-02-01 -2.73586297
3 1880-03-01 -2.04185677
4 1880-04-01 0.37948364
5 1880-05-01 0.78548384
6 1880-06-01 0.44176754
7 1880-07-01 -1.06966007
8 1880-08-01 -0.53162575
9 1880-09-01 -0.29665694
10 1880-10-01 -2.08401608
11 1880-11-01 -9.46955109
12 1880-12-01 -1.52052176
13 1881-01-01 -2.53366208
14 1881-02-01 -1.88263988
15 1881-03-01 -0.06864686
16 1881-04-01 3.32321167
17 1881-05-01 1.75613177
18 1881-06-01 2.82765651
19 1881-07-01 1.76543093
20 1881-08-01 1.39409852
21 1881-09-01 -0.98141575
22 1881-10-01 -0.63346595
23 1881-11-01 -1.95676208
24 1881-12-01 3.28983855
25 1882-01-01 -0.64792717
26 1882-02-01 2.15854502
27 1882-03-01 2.91465187
28 1882-04-01 0.56616443
29 1882-05-01 -1.89441001
30 1882-06-01 -0.63149375
31 1882-07-01 -0.64883423
32 1882-08-01 0.82802373
33 1882-09-01 0.66150969
34 1882-10-01 -0.54113626
35 1882-11-01 -1.21310496
36 1882-12-01 1.30559540
37 1883-01-01 -1.41802752
38 1883-02-01 -6.39232874
39 1883-03-01 2.96320987
40 1883-04-01 -0.48122203
41 1883-05-01 -0.99614143
42 1883-06-01 -0.67229420
43 1883-07-01 -0.56595141
44 1883-08-01 0.52161294
45 1883-09-01 0.09190032
46 1883-10-01 -2.65115738
47 1883-11-01 1.88332438
48 1883-12-01 -0.19942272
49 1884-01-01 -0.34669495
50 1884-02-01 -2.21085262
51 1884-03-01 0.55254096
52 1884-04-01 -1.21859336
53 1884-05-01 -0.40969065
54 1884-06-01 0.44454563
55 1884-07-01 1.28881764
56 1884-08-01 -1.09331822
57 1884-09-01 1.52377772
58 1884-10-01 1.76569140
59 1884-11-01 0.72411090
60 1884-12-01 -4.64927006
61 1885-01-01 -1.03242493
62 1885-02-01 -0.79325873
63 1885-03-01 0.65910935
64 1885-04-01 -0.10181000
65 1885-05-01 -1.50702798
66 1885-06-01 -1.25801849
67 1885-07-01 -0.88433135
68 1885-08-01 -1.18410277
69 1885-09-01 0.15284735
70 1885-10-01 -0.91721576
71 1885-11-01 1.82403481
72 1885-12-01 1.68553519
73 1886-01-01 -4.21202993
74 1886-02-01 2.43953681
75 1886-03-01 -2.24947429
76 1886-04-01 -1.22557247
77 1886-05-01 2.66594267
78 1886-06-01 -0.21662886
79 1886-07-01 1.09909940
80 1886-08-01 0.63720244
81 1886-09-01 -0.11845125
82 1886-10-01 0.49225059
83 1886-11-01 -3.16969180
84 1886-12-01 2.18220520
85 1887-01-01 0.51427501
86 1887-02-01 -0.69656581
87 1887-03-01 3.96693182
88 1887-04-01 0.92614591
89 1887-05-01 1.66550291
90 1887-06-01 1.88668025
91 1887-07-01 -1.48990893
92 1887-08-01 -0.98355341
93 1887-09-01 0.93172997
94 1887-10-01 -1.12551820
95 1887-11-01 1.07798636
96 1887-12-01 -2.15758419
97 1888-01-01 -1.69266903
98 1888-02-01 2.55955243
99 1888-03-01 -1.83599913
100 1888-04-01 3.63450384
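As you can see, the labels produced by stat_peaks and stat_valleys are all identical and don't even fall within the range of the abbreviated data, instead of showing the correct dates corresponding to the x-axis.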
[Figure: Monthly Air Surface Temp for Denver from 1880 on]
The stat_peaks and stat_valleys labels work with dates in POSIXct format:
library(ggplot2)
library(ggpmisc)

# Convert the Date column to POSIXct so the peak/valley labels format correctly
df_denver$Date <- as.POSIXct(df_denver$Date, format = "%Y-%m-%d")

# y uses the column name from the question's data (Temp..C.)
ggplot(df_denver, aes(x=Date, y=Temp..C.)) +
  geom_line() +
  scale_x_datetime(date_labels="%b %Y", date_breaks = "1 year", expand=c(0,0)) +
  stat_peaks(span=24, ignore_threshold = 0.80, color="red") +
  stat_peaks(geom="text", span=24, ignore_threshold = 0.80, x.label.fmt = "%b %Y", color="red", angle=90, hjust=-0.1) +
  stat_valleys(span=24, ignore_threshold = 0.55, color="blue") +
  stat_valleys(geom="text", span=24, ignore_threshold = 0.55, x.label.fmt = "%b %Y", color="blue", angle=90, hjust=1.1) +
  labs(x="Date", y="Temp (C)", title="Monthly Air Surface Temp for Denver from 1880 on") +
  expand_limits(y = 6)  # extra headroom so the rotated peak labels stay inside the panel
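Note: scale_x_date was changed to scale_x_datetime. In addition, date_breaks was changed to 1 year to show x-axis labels for the sample data, and expand_limits was added to make sure the peak labels are readable. group=1 is not needed.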
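One plausible reason for the identical "Dec 1969" labels, offered as an assumption rather than a confirmed description of ggpmisc internals: a Date is stored as days since 1970-01-01, while a POSIXct is stored as seconds since the same epoch. If the labeling step reads the x values as seconds, every date in the series lands within a few hours of the epoch. The base R sketch below reproduces that effect; the variable names are illustrative only, and "%Y-%m" is used instead of "%b %Y" so the result does not depend on the locale.

```r
# Base R only; no plotting packages are needed for this check.
d <- as.Date("1880-01-01")

# A Date is stored as the number of days since 1970-01-01.
days_since_epoch <- as.numeric(d)   # -32872

# Assumption: the label code treats that number as SECONDS since the epoch.
# -32872 seconds is only about 9 hours before 1970-01-01 00:00 UTC.
misread <- as.POSIXct(days_since_epoch, origin = "1970-01-01", tz = "UTC")
misread_label <- format(misread, "%Y-%m", tz = "UTC")

# Converting the Date to POSIXct first stores it in seconds, so the same
# formatting step recovers the real month.
p <- as.POSIXct(d, tz = "UTC")
correct_label <- format(p, "%Y-%m", tz = "UTC")

print(c(misread = misread_label, correct = correct_label))
```

Under that assumption the first label comes out as 1969-12 and the second as 1880-01, which matches both the symptom in the question and the effect of switching to POSIXct.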