Working With Years in a Monthly POSIX Data Set
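I'm working with monthly climate data from several weather stations in the greater Albuquerque area. I'm using this subset of the airport data as an example, and will eventually apply the same process to all locations. Nearly 500 months of data are available, but I've included the first 30 rows here.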
> head(ABQ, 30)
STATION_NAME DATE CLDD
9698 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1945-05-01 449
9699 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1945-06-01 1335
9700 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1945-07-01 2330
9701 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1945-08-01 2269
9702 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1945-09-01 1247
9703 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1945-10-01 13
9709 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1946-04-01 62
9710 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1946-05-01 251
9711 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1946-06-01 2097
9712 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1946-07-01 2303
9713 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1946-08-01 1889
9714 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1946-09-01 1111
9715 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1946-10-01 23
9721 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1947-04-01 1
9722 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1947-05-01 611
9723 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1947-06-01 1273
9724 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1947-07-01 2636
9725 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1947-08-01 1892
9726 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1947-09-01 1265
9727 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1947-10-01 171
9733 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1948-04-01 91
9734 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1948-05-01 642
9735 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1948-06-01 1506
9736 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1948-07-01 2529
9737 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1948-08-01 2186
9738 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1948-09-01 1130
9739 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1948-10-01 13
9745 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1949-04-01 88
9746 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1949-05-01 304
9747 ALBUQUERQUE INTERNATIONAL AIRPORT NM US 1949-06-01 1477
I want to compute the yearly sum of ABQ$CLDD and pass that value to ggplot(), something like this:
CLDD_yr <- apply.yearly(ABQ$DATE, sum(CLDD))
p <- ggplot(CLDD_yr, aes(YEAR, CLDD_yr)),
+ stat_smooth(method = "lm", formula = y~x + I(x^2), size = 1)
I know I'm making a mistake in how I refer to the data, but I can't seem to work it out.
The DATE column is a POSIX time, as shown here:
> class(ABQ$DATE)
[1] "POSIXlt" "POSIXt"
Edit:
Following coffienjunkies' comment:
Perhaps a new df is the best way to approach this, since I need to look at data from multiple locations in the same way.
> stations
unique(Bernalillo_data$STATION_NAME)
1 ALBUQUERQUE INTERNATIONAL AIRPORT NM US
2 PETROGLYPH NATIONAL MON NM US
3 SANDIA PARK NM US
4 ALBUQUERQUE VLY NM US
5 ALBUQUERQUE FOOTHILLS NE NM US
6 SANDIA RANGER STATION NM US
7 SANDIA CREST NM US
8 LA MADERA SKI AREA NM US
9 NETHERWOOD PARK NM US
10 EXPERIMENT FARM NM US
11 KIRTLAND AFB NM US
Perhaps the new DF should look like this:
header <- station_name Year CLDD_sum
I think this will make the analysis simpler in the long run.
Try this:
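require(data.table)
require(ggplot2)
# data.table does not support POSIXlt columns, so convert to POSIXct first.
ABQ$DATE <- as.POSIXct(ABQ$DATE)
setDT(ABQ)
# Add the calendar year and the yearly CLDD total to every row.
ABQ[, YEAR := year(DATE)]
ABQ[, CLDD_yr := sum(CLDD), by = YEAR]
# Required because data.table and ggplot don't play nice.
setDF(ABQ)
p <- ggplot(ABQ, aes(YEAR, CLDD_yr)) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1)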
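Note that you need to have data.table installed. Also note that this writes the yearly total onto every row, so you may get several overlapping points in ggplot. If you don't want that, you can try: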
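require(data.table)
require(ggplot2)
ABQ$DATE <- as.POSIXct(ABQ$DATE)  # data.table does not support POSIXlt columns
setDT(ABQ)
# One row per year; inside .() use `=`, not `:=`.
for_plot <- ABQ[, .(CLDD_yr = sum(CLDD)), by = list(year = year(DATE))]
# Required because data.table and ggplot don't play nice.
setDF(for_plot)
p <- ggplot(for_plot, aes(year, CLDD_yr)) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1)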
Hope this helps.
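The grouping above depends on pulling a calendar year out of the POSIXlt DATE column. As a minimal base R sketch (the two dates are taken from the sample rows; the variable name `d` is only for illustration), note that the `$year` field of a POSIXlt object counts years since 1900, so it needs an offset or a formatted conversion:

```r
# Two dates from the sample data, stored as POSIXlt like ABQ$DATE
d <- as.POSIXlt(c("1945-05-01", "1946-04-01"), tz = "UTC")

yrs_raw <- d$year                          # years since 1900: 45 46
yrs_num <- d$year + 1900                   # calendar years as numbers: 1945 1946
yrs_chr <- format(d, "%Y")                 # calendar years as text: "1945" "1946"
yrs_int <- as.integer(format(d, "%Y"))     # text converted back to integers
```

A numeric year is what a quadratic fit such as `y ~ x + I(x^2)` needs on the x axis; the character version is only suitable for grouping or labelling.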
I think there are several approaches you could take, but at some point you have to do some aggregation. Here are two suggestions:
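library(dplyr)
library(ggplot2)
# POSIXlt's $year counts years since 1900, so format the calendar year instead.
# The result is character; wrap it in as.integer() before using it as a numeric x axis.
df$year <- format(df$DATE, "%Y")
df$DATE <- as.POSIXct(df$DATE) # dplyr doesn't play well with POSIXlt
df_yr <- df %>% group_by(year) %>% summarise(cldd_yr = sum(CLDD))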
This yields:
Source: local data frame [5 x 2]
year cldd_yr
(chr) (int)
1 1945 7643
2 1946 7736
3 1947 7849
4 1948 8097
5 1949 1869
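You can combine this with ggplot. For multiple stations, just add the station as a grouping variable. For example, df_yr <- df %>% group_by(year, station) %>% summarise(cldd_yr = sum(CLDD)) will give you a summary for every year and station, provided station is the name of your identifier column.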
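The same station-by-year summary can also be sketched without dplyr, using base R's `aggregate()` instead. This is only an illustration under stated assumptions: the `toy` data frame, its two station names, and its CLDD values are made up, and the output columns are renamed to the `station_name Year CLDD_sum` layout proposed in the question.

```r
# Made-up stand-in for the multi-station data (not real observations)
toy <- data.frame(
  STATION_NAME = rep(c("STATION A", "STATION B"), each = 4),
  DATE = as.POSIXct(
    rep(c("1945-06-01", "1945-07-01", "1946-06-01", "1946-07-01"), 2),
    tz = "UTC"
  ),
  CLDD = c(10, 20, 30, 40, 1, 2, 3, 4)
)

# Numeric calendar year for each monthly record
toy$year <- as.integer(format(toy$DATE, "%Y"))

# Sum CLDD within every station/year combination
yearly <- aggregate(CLDD ~ STATION_NAME + year, data = toy, FUN = sum)
names(yearly) <- c("station_name", "Year", "CLDD_sum")
print(yearly)
```

Each station/year pair ends up as a single row, which avoids the repeated values that come from adding the sum as a column to the monthly data.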
If you really don't want to use a new data frame but are fine with adding a column, try
df <- group_by(df, year) %>% mutate(yr.sum = sum(CLDD))
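yr.sum then holds the yearly total. Note that this value is repeated on every row, so you have to make sure ggplot uses it correctly. I would recommend the first approach, though, as it is probably more efficient and more transparent.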
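As a rough sketch of how the summarised table might be handed to ggplot, the block below rebuilds the five yearly sums printed above by hand, with `year` stored as an integer so the quadratic term can be fitted. The `geom_point()` layer is an addition for illustration and was not part of the question's plot. Keep in mind that 1945 and 1949 are partial years in the 30-row sample, so the fitted curve here says nothing about the full record.

```r
library(ggplot2)

# Yearly sums copied from the summary output above; year kept numeric
df_yr <- data.frame(
  year = 1945:1949,
  cldd_yr = c(7643, 7736, 7849, 8097, 1869)
)

# One point per year plus a quadratic least-squares fit
p <- ggplot(df_yr, aes(year, cldd_yr)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2))
```

For several stations, mapping the station column to `colour` inside `aes()` would give one fitted curve per station.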