Merging xts in R - Converting Characters to NA

I have 3 xts objects:

logged <- xts::xts(x = loggedInUsers$loggedInUsers, order.by = Sys.time())
loadValue <- xts::xts(x = loadAvg, order.by = Sys.time())
hostname <- xts::xts(x = loadHost, order.by = Sys.time())

dput(hostname)
dput(loadValue)
dput(logged)

dput gives the following results:

 structure("deliverforgoodportal", .Dim = c(1L, 1L), index = structure(1551088127.27724, tzone = "", tclass = c("POSIXct",
    "POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
    "POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")

structure(0, .Dim = c(1L, 1L), .Dimnames = list(NULL, "load"), index = structure(1551088127.27676, tzone = "", tclass = c("POSIXct",
"POSIXt")), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "", tzone = "", class = c("xts", "zoo"))

structure(1, .Dim = c(1L, 1L), index = structure(1551088127.27637, tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")

When I merge these three and print the result, the hostname is converted to NA:

tmp <- merge.xts(hostname, logged, loadValue, all = TRUE)
print(tmp)

The output is (hostname is NA):

                    hostname logged  load
2019-02-25 09:48:47       NA      1    NA
2019-02-25 09:48:47       NA     NA    0
2019-02-25 09:48:47       NA     NA    NA

Why is this NA?

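You should realize that an xts object is a time series and a matrix. A matrix can hold only one type of value, either character or numeric, but not both. Your merge tries to combine a character matrix (hostname) with numeric ones (logged and loadValue). As a result, the hostname values are coerced to NA.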
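The coercion can be reproduced in base R without xts. This is a minimal sketch with illustrative variable names; it mirrors the effect described above and is not a trace of what merge.xts does internally:

```r
# A matrix has a single storage type: mixing a string with a number
# promotes every cell to character.
mixed <- cbind(hostname = "deliverforgoodportal", load = 0)
storage_type <- typeof(mixed)

# Forcing a non-numeric string into a numeric slot yields NA
# (R also emits a coercion warning, suppressed here).
coerced <- suppressWarnings(as.numeric("deliverforgoodportal"))

print(storage_type)
print(coerced)
```

So a numeric result matrix has no way to keep the hostname string, which is why the column ends up as NA.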

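If you want to join this data, you have to use a data.frame (or data.table). Also note that your time values are not equal; they differ at the sub-second level. So if you want to join at a coarser resolution, such as per second or per minute, first round the timestamps with floor_date from the lubridate package. See below for two examples, with and without lubridate. I used the timetk package to convert the xts objects to tibbles, but depending on your source data this may not be necessary.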
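To see the timestamp mismatch directly, the three index values from the dput output in the question can be compared in base R. This is only a sketch: the `tz = "UTC"` setting and the variable names are assumptions made for illustration, and flooring is done here with plain arithmetic rather than lubridate:

```r
# Index values copied from the dput output (seconds since the epoch).
raw_index <- c(hostname = 1551088127.27724,
               load     = 1551088127.27676,
               logged   = 1551088127.27637)

# tz = "UTC" is an assumption; the original objects use the local time zone.
stamps <- as.POSIXct(raw_index, origin = "1970-01-01", tz = "UTC")

# Three distinct instants, so a join on the raw index never matches.
n_distinct_raw <- length(unique(stamps))

# Dropping the fractional seconds makes all three keys identical.
floored <- as.POSIXct(floor(as.numeric(stamps)),
                      origin = "1970-01-01", tz = "UTC")
n_distinct_floored <- length(unique(floored))

print(n_distinct_raw)
print(n_distinct_floored)
```

The printed timestamps look the same only because the default print format hides fractional seconds.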

With full_join, no lubridate:

library(timetk)
library(dplyr)
hostname <- tk_tbl(hostname)
loadValue <- tk_tbl(loadValue)
logged <- tk_tbl(logged)

hostname %>% 
  full_join(loadValue) %>% 
  full_join(logged, 
            by = "index", 
            suffix = c("_hostname", "_logged"))

Joining, by = "index"
# A tibble: 3 x 4
  index               value_hostname        load value_logged
  <dttm>              <chr>                <dbl>        <dbl>
1 2019-02-25 10:48:47 deliverforgoodportal    NA           NA
2 2019-02-25 10:48:47 NA                       0           NA
3 2019-02-25 10:48:47 NA                      NA            1

With lubridate and left_join:

hostname %>% 
  mutate(index = lubridate::floor_date(index, unit = "seconds")) %>% 
  left_join(loadValue %>% mutate(index = lubridate::floor_date(index, unit = "seconds"))) %>% 
  left_join(logged %>% mutate(index = lubridate::floor_date(index, unit = "seconds")), 
            by = "index", 
            suffix = c("_hostname", "_logged"))    

Joining, by = "index"
# A tibble: 1 x 4
  index               value_hostname        load value_logged
  <dttm>              <chr>                <dbl>        <dbl>
1 2019-02-25 10:48:47 deliverforgoodportal     0            1
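If you would rather avoid the extra packages, the same idea can be sketched in base R with plain data frames and merge(). The helper `floor_sec`, the data frame names, and `tz = "UTC"` are hypothetical choices for this illustration, and the values are the ones from the question:

```r
# Hypothetical helper: drop fractional seconds from a POSIXct vector.
floor_sec <- function(x) {
  as.POSIXct(floor(as.numeric(x)), origin = "1970-01-01", tz = "UTC")
}

to_time <- function(secs) as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# One data frame per measurement; a data frame can mix column types.
hostname_df <- data.frame(index = floor_sec(to_time(1551088127.27724)),
                          hostname = "deliverforgoodportal",
                          stringsAsFactors = FALSE)
load_df     <- data.frame(index = floor_sec(to_time(1551088127.27676)),
                          load = 0)
logged_df   <- data.frame(index = floor_sec(to_time(1551088127.27637)),
                          logged = 1)

# Full outer join of all three on the floored timestamp.
combined <- Reduce(function(a, b) merge(a, b, by = "index", all = TRUE),
                   list(hostname_df, load_df, logged_df))

print(combined)
```

Because the keys are floored before merging, the three measurements should land in a single row, with the hostname kept as a character column.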