Having issues using the pad function to fill in dates with time gaps
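I'm having trouble using the pad function (padr) to fill gaps in a time series. I have some code that downloads hourly data from a server, one day at a time, over a given period. After each day's data is downloaded, the aim is to use pad to clean the data and add the missing times and dates, so that the days can be combined correctly without errors.
The function downloads the data as a list, which looks like this: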
time temperature
2019-11-11 00:00:00 3
2019-11-11 01:00:00 4
2019-11-11 03:00:00 5
I would like the program to fill in the gap automatically, like this:
time temperature
2019-11-11 00:00:00 3
2019-11-11 01:00:00 4
2019-11-11 02:00:00 NA
2019-11-11 03:00:00 5
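I use pad in the code below to fill the gaps, but if the data starts at 02:00:00, the padded series also starts at that time step. When I use start_val and end_val, the date and time do not seem to be recognized. Any help would be appreciated. I have tried many workarounds without luck. Keep in mind that the date varies and there is no way of knowing in advance which hour is missing.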
if (nrow(daily$hourly) < 24) {
  daily$hourly <- daily$hourly %>%
    pad(daily$hourly$time,
        start_val = as.POSIXct('00:00:00'),
        end_val = as.POSIXct('23:00:00')) %>%
    fill_by_value(value)
}
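**Update**
I think the main problem is that R does not recognize 00:00:00 as the start of the time series, so it does not fill in 01:00:00 as a gap. Both solutions work if the gap is somewhere else. Any ideas? Please see the structure below.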
structure(list(time = structure(c(1521936000, 1521939600, 1521943200,
1521946800, 1521950400, 1521954000, 1521957600, 1521961200, 1521964800,
1521968400, 1521972000, 1521975600, 1521979200, 1521982800, 1521986400,
1521990000, 1521993600, 1521997200, 1522000800, 1522004400, 1522008000,
1522011600, 1522015200), class = c("POSIXct", "POSIXt"), tzone = ""),
summary = c("Overcast", "Overcast", "Overcast", "Overcast",
"Overcast", "Overcast", "Overcast", "Foggy", "Mostly Cloudy",
"Mostly Cloudy", "Overcast", "Mostly Cloudy", "Mostly Cloudy",
"Mostly Cloudy", "Mostly Cloudy", "Mostly Cloudy", "Partly Cloudy",
"Partly Cloudy", "Partly Cloudy", "Partly Cloudy", "Partly Cloudy",
"Clear", "Clear"), icon = c("cloudy", "cloudy", "cloudy",
"cloudy", "cloudy", "cloudy", "cloudy", "fog", "partly-cloudy-day",
"partly-cloudy-day", "cloudy", "partly-cloudy-day", "partly-cloudy-day",
"partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day",
"partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day",
"partly-cloudy-night", "partly-cloudy-night", "clear-night",
"clear-night"), precipIntensity = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), precipProbability = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), temperature = c(7.28, 7.3, 7.21, 7.08, 7.03, 7.02, 7.15,
7.19, 7.38, 7.83, 8.43, 9.35, 9.89, 10.54, 10.81, 11.07,
11.55, 11.31, 10.52, 9.67, 8.67, 7.94, 6.93), apparentTemperature = c(7.28,
7.3, 7.21, 7.08, 7.03, 7.02, 7.15, 7.19, 7.38, 7.33, 8.43,
9.35, 9.64, 10.54, 10.81, 11.07, 11.55, 11.31, 10.52, 9.67,
8.67, 7.94, 6.93), dewPoint = c(4.99, 5.07, 5.03, 4.99, 4.86,
5.04, 5.41, 5.6, 5.55, 5.62, 5.57, 5.79, 5.84, 5.7, 5.4,
5.08, 4.4, 4.2, 4.37, 4.32, 4.02, 4.06, 3.73), humidity = c(0.85,
0.86, 0.86, 0.87, 0.86, 0.87, 0.89, 0.9, 0.88, 0.86, 0.82,
0.78, 0.76, 0.72, 0.69, 0.67, 0.61, 0.62, 0.66, 0.69, 0.73,
0.76, 0.8), pressure = c(1005.4, 1005.7, 1006, 1006.4, 1006.7,
1007.2, 1007.7, 1008.6, 1009.4, 1010.3, 1010.9, 1011.6, 1011.7,
1012.1, 1012.2, 1012.3, 1012.4, 1012.6, 1013.3, 1013.8, 1014.5,
1014.8, 1015.3), windSpeed = c(0.35, 0.48, 0.55, 0.33, 0.36,
0.6, 0.85, 1.05, 1.29, 1.38, 0.89, 1.33, 1.39, 1.44, 1.63,
1.57, 1.46, 1.27, 0.57, 0.23, 0.03, 0.27, 0.2), windGust = c(0.48,
0.81, 0.95, 0.42, 0.44, 0.96, 1.14, 1.28, 2.03, 1.99, 1.72,
2.51, 2.48, 2.66, 2.48, 2.46, 2.42, 1.67, 0.65, 0.27, 0.03,
0.27, 0.2), windBearing = c(28L, 6L, 12L, 1L, 12L, 3L, 12L,
23L, 40L, 41L, 26L, 22L, 15L, 21L, 9L, 11L, 10L, 18L, 16L,
17L, NA, 273L, 284L), cloudCover = c(0.98, 0.98, 0.98, 0.93,
0.89, 0.93, 0.97, 0.94, 0.82, 0.83, 0.99, 0.75, 0.75, 0.75,
0.75, 0.73, 0.51, 0.49, 0.46, 0.46, 0.44, 0.1, 0), uvIndex = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 2L,
1L, 0L, 0L, 0L, 0L, 0L, 0L), visibility = c(6.74, 6.064,
6.532, 6.035, 6.054, 6.006, 4.033, 3.047, 4.369, 5.512, 6.856,
8.129, 9.269, 9.488, 10.003, 10.003, 10.003, 10.003, 10.003,
10.003, 10.003, 10.003, 9.521)), row.names = c(NA, -23L), class = "data.frame")
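One thing worth checking in the structure above is whether an hour is really missing in absolute time. The sketch below rebuilds the 23 epoch values of the time column and looks at their spacing. The choice of Europe/London as the local time zone is purely an assumption for illustration (the question does not state a zone); in a zone that moves its clocks forward on 2018-03-25, the local day has only 23 hours and 01:00 does not exist, so there would be nothing to pad.

```r
# The 23 epoch values of the time column above, rebuilt as a sequence
epoch <- seq(1521936000, by = 3600, length.out = 23)
times <- as.POSIXct(epoch, origin = "1970-01-01", tz = "UTC")

# Spacing between consecutive observations, in seconds
gaps <- unique(diff(epoch))
print(gaps)

# Assumption: Europe/London stands in for the (unstated) local time zone
local_labels <- format(times[1:3], format = "%Y-%m-%d %H:%M", tz = "Europe/London")
print(local_labels)
```

If the only spacing is 3600 seconds while the local labels skip an hour, the apparent gap comes from the clock change rather than from missing data.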
You can use complete from tidyr and create an hourly sequence between the min and max of time:
tidyr::complete(df, time = seq(min(time), max(time), by = "1 hour"))
# time temperature
# <dttm> <int>
#1 2019-11-11 00:00:00 3
#2 2019-11-11 01:00:00 4
#3 2019-11-11 02:00:00 NA
#4 2019-11-11 03:00:00 5
Data
df <- structure(list(time = structure(c(1573401600, 1573405200, 1573412400
), class = c("POSIXct", "POSIXt"), tzone = ""), temperature = 3:5),
row.names = c(NA, -3L), class = "data.frame")
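A sequence between min and max cannot restore hours missing at the very start or end of the day, which is the case described in the question. A possible variation, sketched here under the assumption that each download covers one calendar day and that times are in UTC, is to build the 24-hour sequence from midnight of that day instead. The names day_start, full_day and completed are made up for this example.

```r
library(tidyr)

# Example data whose first observation is at 02:00 (00:00 and 01:00 are missing)
df <- data.frame(
  time = as.POSIXct(c("2019-11-11 02:00:00", "2019-11-11 03:00:00"), tz = "UTC"),
  temperature = c(4L, 5L)
)

# Midnight of the day in the data; trunc() drops the time-of-day part
day_start <- as.POSIXct(trunc(min(df$time), units = "days"))

# 24 hourly steps starting at midnight (assumes a day without a clock change)
full_day <- seq(day_start, by = "1 hour", length.out = 24)

# Rows for hours absent from df are added with NA in the other columns
completed <- complete(df, time = full_day)
print(nrow(completed))
```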
padr::pad takes a data frame as its first argument, so it does not work on the vector you are supplying now. All you need to do is:
x <- data.frame(
time = as.POSIXct(c('2019-11-11 00:00:00','2019-11-11 01:00:00','2019-11-11 03:00:00')),
temperature = 3:5
)
padr::pad(x)
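For the case where the data starts later than 00:00:00, start_val and end_val need to be complete date-times rather than bare clock times such as '00:00:00'. The following is a minimal sketch of that idea, assuming one calendar day per data frame and UTC timestamps; day_start, day_end and padded are illustrative names, not part of padr.

```r
library(padr)

# Example data that starts at 02:00, so the first two hours are missing
x <- data.frame(
  time = as.POSIXct(c("2019-11-11 02:00:00", "2019-11-11 03:00:00",
                      "2019-11-11 05:00:00"), tz = "UTC"),
  temperature = 3:5
)

# Full date-times for the first and last hour of the day in the data
day_start <- as.POSIXct(trunc(min(x$time), units = "days"))
day_end <- day_start + 23 * 3600  # assumes a day without a clock change

# Pad at hourly resolution from midnight through 23:00
padded <- pad(x, interval = "hour", start_val = day_start, end_val = day_end)
print(nrow(padded))
```

Because the boundaries are derived from the data itself, the same lines can be reused for each downloaded day without knowing the date or the missing hour in advance.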