How to use fill inside the tidyverse complete function to fill all dataframe columns?

When I run the code below on the original data dataframe with the function state_inflow, I get the following test dataframe output:

    > test
       Previous_State 1 2 3
    1:             X0 2 0 0
    2:             X1 0 0 0
    3:             X2 0 0 1

    library(data.table)
    library(dplyr)
    library(tidyverse)
    
    data <- 
      data.frame(
        ID = c(1,1,1,2,2,2,3,3,3),
        Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
        Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
        Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
        State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X3")
      )
    
    state_inflow <- function(mydat, target_state, period_col_name, fct) {
      dcast(
        setDT(mydat)[, Previous_State := factor(shift(State, fill = target_state)), by = ID][
          , period_factor := lapply(.SD, factor), .SDcols = period_col_name],
        Previous_State ~ period_factor, fct, 
        value.var = "Values", subset = .(State == target_state), drop = FALSE
      ) 
    }
    
    test <- state_inflow(data, "X0", "Period_1", length) 

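I am adding a row to the dataframe to include State categories that never touch the target_state. For example, ID 3 in the data dataframe never reaches the target state X0 in any period, so its state X3 is left out of the original test output shown above. I fill every column of that new row with 0, and I currently do it like this: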

    test %>%
      complete(Previous_State = unique(data$State)) %>%
      replace(is.na(.), 0)

This gives me the correct output:

    # A tibble: 4 x 4
      Previous_State   `1`   `2`   `3`
      <chr>          <int> <int> <int>
    1 X0                 2     0     0
    2 X1                 0     0     0
    3 X2                 0     0     1
    4 X3                 0     0     0

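Notice how row 4, "X3", was added with all 0s? That is the correct output.

I am trying to learn how to use complete(..., fill = ...). How would I do what I did above, but with fill = ... inside the complete(...) call instead?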

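The fill argument of complete takes a named list that sets the fill value for each individual column. By default, every column is filled with NA. You can change this by giving the desired fill value for each column separately: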

    test %>%
      complete(Previous_State = unique(data$State),
        fill = list(`1` = 0, `2` = 0, `3` = 0))
    
    # A tibble: 4 x 4
    #  Previous_State   `1`   `2`   `3`
    #  <chr>          <dbl> <dbl> <dbl>
    #1 X0                 2     0     0
    #2 X1                 0     0     0
    #3 X2                 0     0     1
    #4 X3                 0     0     0
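Typing every column name into fill gets tedious when the number of period columns changes. Below is a minimal sketch that builds the named fill list from names(test) with setNames(), assuming every column except Previous_State should be filled with 0. The data frame is a stand-in typed in from the test output above, so the example runs without data.table. The names all_states, value_cols and fill_zero are illustrative only.

```r
library(tidyr)

# Stand-in for the `test` object printed above (values copied from that
# output); in the real workflow it is returned by state_inflow().
test <- data.frame(
  Previous_State = c("X0", "X1", "X2"),
  `1` = c(2L, 0L, 0L),
  `2` = c(0L, 0L, 0L),
  `3` = c(0L, 0L, 1L),
  check.names = FALSE  # keep the column names "1", "2", "3" as-is
)

# Equivalent to unique(data$State) in the question
all_states <- c("X0", "X1", "X2", "X3")

# Assumption: every column other than the key column gets a fill value of 0.
# setNames() turns the list of zeros into list(`1` = 0, `2` = 0, `3` = 0).
value_cols <- setdiff(names(test), "Previous_State")
fill_zero <- setNames(as.list(rep(0, length(value_cols))), value_cols)

# Add the missing Previous_State rows and fill their value columns with 0
result <- test %>%
  complete(Previous_State = all_states, fill = fill_zero)

print(result)
```

Because fill_zero is derived from the column names, the same call works whether state_inflow() returns three period columns or many more.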

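Since your question is about the tidyverse: a tidy dataframe is normalized, so you usually have only one column per attribute. That makes it easier to get the same result with complete and fill, because only a single value column has to be filled: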

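    test %>%
      # reshape the period columns 1, 2, 3 into name/value pairs
      pivot_longer(matches("^[0-9]+$")) %>%
      # add the missing Previous_State x name combinations; only `value` needs a fill
      complete(Previous_State = unique(data$State), name,
        fill = list(value = 0)) %>%
      # spread name/value back into one column per period
      pivot_wider()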