基于时间戳的 R 子集数据帧(每小时 1 个值)

R subset dataframe based on timestamp (1 value per hour)

我是 R 的新手,目前正在努力解决以下问题。

我的数据框看起来像这样(注意有更多的列和行):

transform_data_test

我的目标是对数据框进行子集化,这样我每小时只能得到 1 行。通常我每天应该得到 24 行数据。

我尝试了以下代码:

sorting_test <- NULL
sorting_test <- transform_data_test %>% 
  mutate(Date = ymd_hms(date), dt = as_date(date), hr = hour(date)) %>% 
  group_by(dt, hr) %>%
  filter(Date == min(Date)) %>% 
  ungroup() 

但是,我的结果缺少几天和几小时的完整数据。 result

我检查了多个松弛线程,但找不到合适的解决方案。我错过了什么?

编辑:根据要求输入我的数据框

structure(list(event.id = c(8384310912, 8384310913, 8384310914, 
8384310915, 8384310916, 8384310917), visible = c("true", "true", 
"true", "true", "true", "true"), date = c("2019-01-02 14:00:23.000", 
"2019-01-02 19:00:41.000", "2019-01-03 12:40:23.000", "2019-01-03 13:20:07.000", 
"2019-01-03 13:40:07.000", "2019-01-03 14:00:06.000"), long = c(-5.8788868, 
-5.9539697, -5.8788406, -5.8780982, -5.8787327, -5.8787414), 
    lat = c(37.2227809, 37.1668986, 37.2230289, 37.2221521, 37.2229884, 
    37.223007), bar.barometric.pressure = c(NA, NA, NA, NA, NA, 
    NA), data.decoding.software = c(8L, 8L, 8L, 8L, 8L, 8L), 
    eobs.activity = c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), eobs.activity.samples = c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
    ), eobs.battery.voltage = c(3706L, 3701L, 3725L, 3735L, 3730L, 
    3730L), eobs.fix.battery.voltage = c(3671L, 3666L, 3696L, 
    3710L, 3706L, 3706L), eobs.horizontal.accuracy.estimate = c(5.12, 
    5.38, 12.8, 7.94, 7.42, 7.17), eobs.key.bin.checksum = c(1795615137, 
    3161364829, 1830803994, 2772018959, 2497038035, 3520440277
    ), eobs.speed.accuracy.estimate = c(0.28, 0.3, 0.27, 0.25, 
    0.35, 0.42), eobs.start.timestamp = c("2019-01-02 13:59:57.000", 
    "2019-01-02 19:00:05.000", "2019-01-03 12:39:57.000", "2019-01-03 13:20:00.000", 
    "2019-01-03 13:40:00.000", "2019-01-03 14:00:00.000"), eobs.status = c("A", 
    "A", "A", "A", "A", "A"), eobs.temperature = c(18L, 11L, 
    13L, 15L, 14L, 15L), eobs.type.of.fix = c(3L, 3L, 3L, 3L, 
    3L, 3L), eobs.used.time.to.get.fix = c(25L, 35L, 25L, 6L, 
    6L, 5L), gps.dop = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), gps.satellite.count = c(NA, NA, NA, 
    NA, NA, NA), ground.speed = c(0.05, 0.03, 0.07, 10.51, 0.16, 
    0.18), heading = c(0, 0, 0, 346.03, 208.4, 205.13), height.above.ellipsoid = c(108.3, 
    43.8, 106.9, 114.1, 106, 114.7), import.marked.outlier = c("false", 
    "false", "false", "false", "false", "false"), mag.magnetic.field.raw.x = c(NA, 
    NA, NA, NA, NA, NA), mag.magnetic.field.raw.y = c(NA, NA, 
    NA, NA, NA, NA), mag.magnetic.field.raw.z = c(NA, NA, NA, 
    NA, NA, NA), manually.marked.outlier = c(NA, NA, NA, NA, 
    NA, NA), orientation.quaternion.raw.w = c(NA, NA, NA, NA, 
    NA, NA), orientation.quaternion.raw.x = c(NA, NA, NA, NA, 
    NA, NA), orientation.quaternion.raw.y = c(NA, NA, NA, NA, 
    NA, NA), orientation.quaternion.raw.z = c(NA, NA, NA, NA, 
    NA, NA), sensor.type = c("gps", "gps", "gps", "gps", "gps", 
    "gps"), individual.taxon.canonical.name = c("Ciconia ciconia", 
    "Ciconia ciconia", "Ciconia ciconia", "Ciconia ciconia", 
    "Ciconia ciconia", "Ciconia ciconia"), tag.local.identifier = c(3907L, 
    3907L, 3907L, 3907L, 3907L, 3907L), id = c("Mirabell + / DER AN910 (eobs 3907)", 
    "Mirabell + / DER AN910 (eobs 3907)", "Mirabell + / DER AN910 (eobs 3907)", 
    "Mirabell + / DER AN910 (eobs 3907)", "Mirabell + / DER AN910 (eobs 3907)", 
    "Mirabell + / DER AN910 (eobs 3907)"), study.name = c("LifeTrack White Stork SW Germany", 
    "LifeTrack White Stork SW Germany", "LifeTrack White Stork SW Germany", 
    "LifeTrack White Stork SW Germany", "LifeTrack White Stork SW Germany", 
    "LifeTrack White Stork SW Germany"), year = c("2019", "2019", 
    "2019", "2019", "2019", "2019"), burst = c("Mirabell   DER AN910 (eobs 3907) 2019", 
    "Mirabell   DER AN910 (eobs 3907) 2019", "Mirabell   DER AN910 (eobs 3907) 2019", 
    "Mirabell   DER AN910 (eobs 3907) 2019", "Mirabell   DER AN910 (eobs 3907) 2019", 
    "Mirabell   DER AN910 (eobs 3907) 2019")), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
    burst = "Mirabell   DER AN910 (eobs 3907) 2019", .rows = structure(list(
        1:6), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

也许 distinct 有帮助?

library(lubridate)
library(dplyr)
transform_data_test %>%
  mutate(Date = ymd_hms(date), dt = as_date(date), hr = hour(date)) %>%
  group_by(burst) %>%
  distinct(dt, hr, .keep_all = T)