Missing Values and Rows

I apologize if this is a duplicate question; I couldn't find anything similar.

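I have some data that I'm cleaning, and I need to fill in the missing values. The data looks like this, with the dput output below. The decimals are dropped in the printed view but are included in the dput.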

> print(tbl_df(df), n=26)
# A tibble: 26 x 6
   Year  Trial  Group1  Group2 Group3  Group4
   <chr> <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
 1 Year1     2 346588. 156266  34806.     NA 
 2 Year1     3 342573      NA  34652. 292001.
 3 Year1     5 286285. 129257. 29645. 252786.
 4 Year1     7 234410.     NA  24536.     NA 
 5 Year1     9 184733.  82944.    NA  170653 
 6 Year1    10     NA   81419. 19461  167273.
 7 Year1    11 169620.  74688. 18065  155442 
 8 Year1    14 107652   48381. 11941. 100076 
 9 Year1    15  88440   39807  10123.  83137 
10 Year1    17     NA   31608   7926   64551.
11 Year1    18  63622   29236   7444.  58848.
12 Year1    22  14143.   6366.  1683.  10889.
13 Year2    22 279904  102271  28221. 138804.
14 Year2    25 200386   78628. 21942      NA 
15 Year2    26 157182.     NA  18099.  91963.
16 Year2    28 121122.  54538  14532.  76422 
17 Year2    30  25899.  16773    489.     NA 
18 Year2    32 112091.  51219. 11298.  71655.
19 Year2    33 108756   49311. 10589.  70167 
20 Year2    34     NA   49127.    NA   69195.
21 Year2    36 104827   42651.  8568.  63580.
22 Year2    38  44849   14114   2302.  11652 
23 Year2    40 104407.  42545   6240   63318.
24 Year2    41  99059.  38423   6766.  58017 
25 Year2    42     NA   40432.    NA   57932.
26 Year2    44  49119.   8796.  4769.  11233.



dput(df)
structure(list(Year = c("Year1", "Year1", "Year1", "Year1", "Year1", 
"Year1", "Year1", "Year1", "Year1", "Year1", "Year1", "Year1", 
"Year2", "Year2", "Year2", "Year2", "Year2", "Year2", "Year2", 
"Year2", "Year2", "Year2", "Year2", "Year2", "Year2", "Year2"
), Trial = c(2, 3, 5, 7, 9, 10, 11, 14, 15, 17, 18, 22, 22, 25, 
26, 28, 30, 32, 33, 34, 36, 38, 40, 41, 42, 44), Group1 = c(346587.6667, 
342573, 286285.3333, 234409.6667, 184733.3333, NA, 169620.3333, 
107652, 88440, NA, 63622, 14143.33333, 279904, 200386, 157182.3333, 
121122.3333, 25899.33333, 112090.6667, 108756, NA, 104827, 44849, 
104407.3333, 99058.66667, NA, 49119.33333), Group2 = c(156266, 
NA, 129257.3333, NA, 82943.66667, 81419.33333, 74688.33333, 48381.33333, 
39807, 31608, 29236, 6365.666667, 102271, 78628.33333, NA, 54538, 
16773, 51218.66667, 49311.33333, 49127.33333, 42650.66667, 14114, 
42545, 38423, 40432.33333, 8795.666667), Group3 = c(34805.66667, 
34651.66667, 29644.66667, 24535.66667, NA, 19461, 18065, 11941.33333, 
10123.33333, 7926, 7444.333333, 1683.333333, 28221.33333, 21942, 
18099.33333, 14532.33333, 489.3333333, 11297.66667, 10588.66667, 
NA, 8567.666667, 2302.333333, 6240, 6765.666667, NA, 4769.333333
), Group4 = c(NA, 292000.6667, 252785.6667, NA, 170653, 167273.3333, 
155442, 100076, 83137, 64551.33333, 58847.66667, 10888.66667, 
138803.6667, NA, 91963.33333, 76422, NA, 71655.33333, 70167, 
69195.33333, 63579.66667, 11652, 63317.66667, 58017, 57932.33333, 
11232.66667)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -26L), spec = structure(list(cols = list(
    Year = structure(list(), class = c("collector_character", 
    "collector")), Trial = structure(list(), class = c("collector_double", 
    "collector")), Group1 = structure(list(), class = c("collector_double", 
    "collector")), Group2 = structure(list(), class = c("collector_double", 
    "collector")), Group3 = structure(list(), class = c("collector_double", 
    "collector")), Group4 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

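Basically, I need to fill each NA value with the value from the previous trial (the trials are listed in descending order, so that is the next row down). For example, the NA in row 6, column 3 should be filled with the value from row 7, column 3.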

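But that's not all. I also need to create a row for each day that is missing a trial, and then fill those rows with the last trial. This is the part I'm getting hung up on. Is there a way to accomplish both at the same time?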

For example, I need to change tail(df) from A to B.

A.

  Year  Trial  Group1 Group2 Group3 Group4
  <chr> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
1 Year2    40 104407. 42545   6240  63318.
2 Year2    41  99059. 38423   6766. 58017 
3 Year2    42     NA  40432.    NA  57932.
4 Year2    44  49119.  8796.  4769. 11233.

B.

  Year  Trial  Group1 Group2 Group3 Group4
  <chr> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
1 Year2    40 104407. 42545   6240  63318.
2 Year2    41  99059. 38423   6766. 58017 
3 Year2    42  49119. 40432.  4769. 57932.
4 Year2    43  49119. 40432.  4769. 57932.
5 Year2    44  49119.  8796.  4769. 11233.

You can use complete() together with fill(), setting .direction = 'up':

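library(dplyr)
library(tidyr)

df %>%
  group_by(Year) %>%
  # add a row for every Trial between the first and last one in each Year
  complete(Trial = min(Trial):max(Trial)) %>%
  # fill each NA with the next non-missing value below it
  fill(starts_with('Group'), .direction = 'up') %>%
  ungroup()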

# A tibble: 44 x 6
#   Year  Trial  Group1  Group2 Group3  Group4
#   <chr> <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
# 1 Year1     2 346588. 156266  34806. 292001.
# 2 Year1     3 342573  129257. 34652. 292001.
# 3 Year1     4 286285. 129257. 29645. 252786.
# 4 Year1     5 286285. 129257. 29645. 252786.
# 5 Year1     6 234410.  82944. 24536. 170653 
# 6 Year1     7 234410.  82944. 24536. 170653 
# 7 Year1     8 184733.  82944. 19461  170653 
# 8 Year1     9 184733.  82944. 19461  170653 
# 9 Year1    10 169620.  81419. 19461  167273.
#10 Year1    11 169620.  74688. 18065  155442 
# … with 34 more rows
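
To make the two steps easier to follow, here is a minimal sketch on a made-up data frame. The names toy, completed, filled, and the Value column are hypothetical and not part of the question's data. It uses tidyr's full_seq() in place of min(Trial):max(Trial), which should give the same sequence of trials for whole-number trial values.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Trial 3 is missing entirely, Trial 2 has an NA
toy <- tibble(
  Year  = c("Y1", "Y1", "Y1"),
  Trial = c(1, 2, 4),
  Value = c(10, NA, 40)
)

# Step 1: add a row for every missing Trial within each Year;
# the new rows get NA in all other columns
completed <- toy %>%
  group_by(Year) %>%
  complete(Trial = full_seq(Trial, period = 1))

# Step 2: fill each NA with the next non-missing value below it,
# still within each Year
filled <- completed %>%
  fill(Value, .direction = "up") %>%
  ungroup()

filled
```

Here Trials 2 and 3 both pick up the value 40 from Trial 4. Note that with .direction = "up" an NA in the last row of a group has nothing below it and stays NA; if that can happen in your data, .direction = "updown" may be worth considering.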