How to fill the gaps with values present in each column in a data frame in R?
My data looks like this:
dput(head(COR_trial,10))
structure(list(rDate = structure(c(1439995500, 1439995800, 1439996100,
1439996400, 1439996700, 1439997000, 1439997300, 1439997600, 1439997900,
1439998200), class = c("POSIXct", "POSIXt"), tzone = ""), CCRN630 = c(NA,
NA, NA, NA, NA, 0.2878412, NA, NA, NA, NA), CCRN800 = c(NA, NA,
NA, NA, 0.3213675, NA, NA, NA, NA, NA), CCRN532 = c(NA, NA, NA,
0.3327465, NA, NA, NA, NA, NA, NA), CCRN570 = c(NA, NA, NA, NA,
NA, 0.4172932, NA, NA, NA, NA)), row.names = c(NA, 10L), class = "data.frame")
This is the head of the data frame:
rDate CCRN630 CCRN800 CCRN532 CCRN570
1 2015-08-19 14:45:00 NA NA NA NA
2 2015-08-19 14:50:00 NA NA NA NA
3 2015-08-19 14:55:00 NA NA NA NA
4 2015-08-19 15:00:00 NA NA 0.3327465 NA
5 2015-08-19 15:05:00 NA 0.3213675 NA NA
6 2015-08-19 15:10:00 0.2878412 NA NA 0.4172932
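The data run from 2015-08-19 14:45:00 to 2018-10-11 13:00:00 at 5-minute intervals. I have 14 values in each of the measurement columns (REFN630, REFN800, REFN570 and the fourth one shown above). Note that the values occur at different dates and times, although sometimes they coincide.
As an example, here is one column, REFN630, with the dates on which its values occur.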
rDate CCRN630
<dttm> <dbl>
1 2015-08-19 15:10:00 0.288
2 2015-10-23 10:40:00 0.129
3 2016-02-03 12:40:00 0.373
4 2016-03-24 13:25:00 0.392
5 2016-06-21 11:50:00 0.144
6 2016-07-15 11:35:00 0.195
7 2016-08-18 11:35:00 0.204
8 2016-12-20 13:00:00 0.22
9 2017-01-18 13:25:00 0.210
10 2017-02-17 13:05:00 0.237
11 2017-03-29 12:10:00 0.2
12 2017-05-03 10:30:00 0.174
13 2017-06-08 12:20:00 0.157
14 2017-07-11 11:55:00 0.164
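As you can see, there are many gaps between the dates that need to be filled. Is there a way to fill each gap with the same value until the next defined value?
I found a way to interpolate the values using the following code: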
CCRN630<-fillGap(COR_trial$CCRN630, method=c("fixed"),rule=2)
Now I want to do something simpler than interpolation, but I don't know how.
The desired output looks like this:
rDate CCRN630 CCRN800 CCRN532 CCRN570
1 2015-08-19 14:45:00 0.2878412 0.3213675 0.3327465 0.4172932
2 2015-08-19 14:50:00 0.2878412 0.3213675 0.3327465 0.4172932
3 2015-08-19 14:55:00 0.2878412 0.3213675 0.3327465 0.4172932
4 2015-08-19 15:00:00 0.2878412 0.3213675 0.3327465 0.4172932
5 2015-08-19 15:05:00 0.2878412 0.3213675 0.3327465 0.4172932
6 2015-08-19 15:10:00 0.2878412 0.3213675 0.3327465 0.4172932
18670 2015-08-19 14:45:00 0.2878412 0.3213675 0.3327465 0.4172932
18671 2015-08-19 14:50:00 0.2878412 0.3213675 0.3327465 0.4172932
18672 2015-10-23 10:40:00 0.1287671 0.1181319 0.2111437 0.2463768
18673 2015-08-19 15:00:00 0.1287671 0.1181319 0.2111437 0.2463768
18674 2015-08-19 15:05:00 0.1287671 0.1181319 0.2111437 0.2463768
18675 2015-08-19 15:10:00 0.1287671 0.1181319 0.2111437 0.2463768
Any help would be greatly appreciated.
If I understand your question correctly, you can use dplyr and tidyr:
library(dplyr)
library(tidyr)
COR_trial %>%
complete(rDate = seq(min(rDate), max(rDate), by=300)) %>%
fill(starts_with("CCRN"))
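complete creates the missing dates and times.
seq(min(rDate), max(rDate), by=300) creates a sequence of dates and times, starting from the earliest date/time in the dataset. The step is always 5 minutes, expressed in seconds, hence by = 300.
fill takes the known values and fills the rows below them until the next known value. If you want to fill the rows upward instead, change fill(starts_with("CCRN")) to fill(starts_with("CCRN"), .direction="up").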
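To make the two steps easier to follow, here is a minimal sketch on a tiny made-up data frame (the name toy and its values are assumptions for illustration, not the asker's data). One 5-minute timestamp is deliberately missing, so complete has a row to create and fill has a gap to carry a value into.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 5-minute readings with the 14:55 row absent
start <- as.POSIXct("2015-08-19 14:45:00", tz = "UTC")
toy <- data.frame(
  rDate   = start + c(0, 300, 900),  # 14:45, 14:50, 15:00
  CCRN630 = c(NA, 0.29, NA)
)

filled <- toy %>%
  # add a row for every 5-minute step between the first and last timestamp
  complete(rDate = seq(min(rDate), max(rDate), by = 300)) %>%
  # carry the last known value downward into the NA rows
  fill(starts_with("CCRN"))

print(filled)
```

The result should have four rows: the first stays NA because nothing precedes it, and the remaining three hold 0.29.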
This returns:
# A tibble: 18,648 x 5
rDate CCRN630 CCRN800 CCRN532 CCRN570
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-08-19 16:45:00 NA NA NA NA
2 2015-08-19 16:50:00 NA NA NA NA
3 2015-08-19 16:55:00 NA NA NA NA
4 2015-08-19 17:00:00 NA NA 0.333 NA
5 2015-08-19 17:05:00 NA 0.321 0.333 NA
6 2015-08-19 17:10:00 0.288 0.321 0.333 0.417
7 2015-08-19 17:15:00 0.288 0.321 0.333 0.417
8 2015-08-19 17:20:00 0.288 0.321 0.333 0.417
9 2015-08-19 17:25:00 0.288 0.321 0.333 0.417
10 2015-08-19 17:30:00 0.288 0.321 0.333 0.417
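Note that the first rows above are still NA, whereas the desired output in the question appears to have them filled with the first known value. One possible way to get that, assuming a tidyr version that supports it (1.0.0 or later), is .direction = "downup", which fills downward first and then fills any remaining leading NAs upward. A small sketch on made-up values:

```r
library(tidyr)

# Hypothetical toy column with leading NAs and a gap between two known values
toy <- data.frame(
  id      = 1:5,
  CCRN630 = c(NA, NA, 0.288, NA, 0.129)
)

# Default direction: leading NAs stay NA
down <- fill(toy, CCRN630)

# Fill down, then fill what is left upward from the first known value
downup <- fill(toy, CCRN630, .direction = "downup")

print(down$CCRN630)
print(downup$CCRN630)
```

In the full pipeline this would mean replacing fill(starts_with("CCRN")) with fill(starts_with("CCRN"), .direction = "downup").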
Data
structure(list(rDate = structure(c(1445589600, 1439995500, 1439995800,
1439996100, 1439996400, 1439996700, 1439997000, 1439997300, 1439997600,
1439997900, 1439998200), tzone = "", class = c("POSIXct", "POSIXt"
)), CCRN630 = c(0.129, NA, NA, NA, NA, NA, 0.2878412, NA, NA,
NA, NA), CCRN800 = c(NA, NA, NA, NA, NA, 0.3213675, NA, NA, NA,
NA, NA), CCRN532 = c(NA, NA, NA, NA, 0.3327465, NA, NA, NA, NA,
NA, NA), CCRN570 = c(NA, NA, NA, NA, NA, NA, 0.4172932, NA, NA,
NA, NA)), row.names = c(NA, -11L), class = c("tbl_df", "tbl",
"data.frame"), problems = structure(list(row = 11L, col = NA_character_,
expected = "4 columns", actual = "5 columns", file = "literal data"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame")))