How to fill the gaps with values present in each column of a data frame in R?

My data looks like this:

dput(head(COR_trial,10))
structure(list(rDate = structure(c(1439995500, 1439995800, 1439996100, 
1439996400, 1439996700, 1439997000, 1439997300, 1439997600, 1439997900, 
1439998200), class = c("POSIXct", "POSIXt"), tzone = ""), CCRN630 = c(NA, 
NA, NA, NA, NA, 0.2878412, NA, NA, NA, NA), CCRN800 = c(NA, NA, 
NA, NA, 0.3213675, NA, NA, NA, NA, NA), CCRN532 = c(NA, NA, NA, 
0.3327465, NA, NA, NA, NA, NA, NA), CCRN570 = c(NA, NA, NA, NA, 
NA, 0.4172932, NA, NA, NA, NA)), row.names = c(NA, 10L), class = "data.frame")

This is the head of the data frame:

                rDate   CCRN630   CCRN800   CCRN532   CCRN570
1 2015-08-19 14:45:00        NA        NA        NA        NA
2 2015-08-19 14:50:00        NA        NA        NA        NA
3 2015-08-19 14:55:00        NA        NA        NA        NA
4 2015-08-19 15:00:00        NA        NA 0.3327465        NA
5 2015-08-19 15:05:00        NA 0.3213675        NA        NA
6 2015-08-19 15:10:00 0.2878412        NA        NA 0.4172932

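The data run from 2015-08-19 14:45:00 to 2018-10-11 13:00:00 at 5-minute intervals. I have 14 values in each of the columns REFN630, REFN800, REFN532 and REFN570 (note: the values occur at different dates and times, but sometimes they coincide).

Here is an example for one column, REFN630, with the dates on which its values occur.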

   rDate               CCRN630
   <dttm>                <dbl>
 1 2015-08-19 15:10:00   0.288
 2 2015-10-23 10:40:00   0.129
 3 2016-02-03 12:40:00   0.373
 4 2016-03-24 13:25:00   0.392
 5 2016-06-21 11:50:00   0.144
 6 2016-07-15 11:35:00   0.195
 7 2016-08-18 11:35:00   0.204
 8 2016-12-20 13:00:00   0.22
 9 2017-01-18 13:25:00   0.210
10 2017-02-17 13:05:00   0.237
11 2017-03-29 12:10:00   0.2
12 2017-05-03 10:30:00   0.174
13 2017-06-08 12:20:00   0.157
14 2017-07-11 11:55:00   0.164

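As you can see, there are many gaps between the dates that need filling. Is there a way to fill each gap with the same value until the next defined value?

I found a way to interpolate the values using the following code: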

CCRN630<-fillGap(COR_trial$CCRN630, method=c("fixed"),rule=2)

Now I want to do something simpler than interpolation, but I don't know how.

The desired output looks like this:

                rDate   CCRN630   CCRN800   CCRN532   CCRN570
1       2015-08-19 14:45:00 0.2878412 0.3213675 0.3327465 0.4172932
2       2015-08-19 14:50:00 0.2878412 0.3213675 0.3327465 0.4172932
3       2015-08-19 14:55:00 0.2878412 0.3213675 0.3327465 0.4172932
4       2015-08-19 15:00:00 0.2878412 0.3213675 0.3327465 0.4172932
5       2015-08-19 15:05:00 0.2878412 0.3213675 0.3327465 0.4172932
6       2015-08-19 15:10:00 0.2878412 0.3213675 0.3327465 0.4172932

18670   2015-08-19 14:45:00 0.2878412 0.3213675 0.3327465 0.4172932
18671   2015-08-19 14:50:00 0.2878412 0.3213675 0.3327465 0.4172932
18672   2015-10-23 10:40:00 0.1287671 0.1181319 0.2111437 0.2463768
18673   2015-08-19 15:00:00 0.1287671 0.1181319 0.2111437 0.2463768
18674   2015-08-19 15:05:00 0.1287671 0.1181319 0.2111437 0.2463768
18675   2015-08-19 15:10:00 0.1287671 0.1181319 0.2111437 0.2463768

Any help would be much appreciated.

If I understand your question correctly, you can use dplyr and tidyr:

library(dplyr)
library(tidyr)

COR_trial %>%
  complete(rDate = seq(min(rDate), max(rDate), by=300)) %>%
  fill(starts_with("CCRN"))
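
  • complete creates the rows for the missing dates and times.
  • seq(min(rDate), max(rDate), by=300) creates a sequence of dates and times running from the earliest date/time in the dataset to the latest. The step is always 5 minutes, expressed in seconds, hence by = 300.
  • fill takes each known value and carries it down the rows until the next known value. To fill the rows upwards instead, change fill(starts_with("CCRN")) to fill(starts_with("CCRN"), .direction="up").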
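
To see what each of the two steps contributes, here is a minimal sketch on a small made-up data frame. The object names toy, completed and filled and the values in toy are assumptions for illustration only; they are not taken from COR_trial.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three timestamps, with the 14:55:00 row missing
toy <- data.frame(
  rDate = as.POSIXct(c("2015-08-19 14:45:00", "2015-08-19 14:50:00",
                       "2015-08-19 15:00:00"), tz = "UTC"),
  CCRN630 = c(NA, 0.288, 0.129)
)

# Step 1: complete() adds a row for every 5-minute step (300 seconds)
# between the first and last timestamp; new rows get NA in CCRN630
completed <- toy %>%
  complete(rDate = seq(min(rDate), max(rDate), by = 300))

# Step 2: fill() carries the last known value down into the NA rows
filled <- completed %>%
  fill(starts_with("CCRN"))

filled
```

In this sketch the first row stays NA, because the default direction is "down" and there is no earlier value to carry forward.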

This returns

# A tibble: 18,648 x 5
   rDate               CCRN630 CCRN800 CCRN532 CCRN570
   <dttm>                <dbl>   <dbl>   <dbl>   <dbl>
 1 2015-08-19 16:45:00  NA      NA      NA      NA    
 2 2015-08-19 16:50:00  NA      NA      NA      NA    
 3 2015-08-19 16:55:00  NA      NA      NA      NA    
 4 2015-08-19 17:00:00  NA      NA       0.333  NA    
 5 2015-08-19 17:05:00  NA       0.321   0.333  NA    
 6 2015-08-19 17:10:00   0.288   0.321   0.333   0.417
 7 2015-08-19 17:15:00   0.288   0.321   0.333   0.417
 8 2015-08-19 17:20:00   0.288   0.321   0.333   0.417
 9 2015-08-19 17:25:00   0.288   0.321   0.333   0.417
10 2015-08-19 17:30:00   0.288   0.321   0.333   0.417
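
The output above still has NA in the leading rows, whereas the desired output in the question shows those rows filled with the first known value. If that is what is wanted, one option is the "downup" value of fill's .direction argument, which fills downwards first and then fills any remaining leading NA upwards. The sketch below uses a made-up data frame shaped like the head of COR_trial; toy2 and filled_both are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data mirroring the head of COR_trial:
# six 5-minute steps, one known value per column
toy2 <- data.frame(
  rDate = as.POSIXct("2015-08-19 14:45:00", tz = "UTC") + 300 * 0:5,
  CCRN630 = c(NA, NA, NA, NA, NA, 0.2878412),
  CCRN532 = c(NA, NA, NA, 0.3327465, NA, NA)
)

# Fill downwards first, then upwards, so leading NAs are also replaced
filled_both <- toy2 %>%
  fill(starts_with("CCRN"), .direction = "downup")

filled_both
```

On the full data the same argument would be added to the fill call after complete. Whether filling the leading rows backwards is appropriate depends on the data, since it assigns a value to times before it was measured.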

Data

structure(list(rDate = structure(c(1445589600, 1439995500, 1439995800, 
1439996100, 1439996400, 1439996700, 1439997000, 1439997300, 1439997600, 
1439997900, 1439998200), tzone = "", class = c("POSIXct", "POSIXt"
)), CCRN630 = c(0.129, NA, NA, NA, NA, NA, 0.2878412, NA, NA, 
NA, NA), CCRN800 = c(NA, NA, NA, NA, NA, 0.3213675, NA, NA, NA, 
NA, NA), CCRN532 = c(NA, NA, NA, NA, 0.3327465, NA, NA, NA, NA, 
NA, NA), CCRN570 = c(NA, NA, NA, NA, NA, NA, 0.4172932, NA, NA, 
NA, NA)), row.names = c(NA, -11L), class = c("tbl_df", "tbl", 
"data.frame"), problems = structure(list(row = 11L, col = NA_character_, 
    expected = "4 columns", actual = "5 columns", file = "literal data"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")))