How to use fill inside the tidyverse complete function to fill all dataframe columns?
When I run the code below, I get the following test output from the original data frame (named data) and the function state_inflow:
> test
Previous_State 1 2 3
1: X0 2 0 0
2: X1 0 0 0
3: X2 0 0 1
library(data.table)
library(tidyverse)  # attaches dplyr and tidyr, among others

data <-
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X3")
  )

# Aggregate (with fct) the rows that are in target_state,
# broken down by the state in the previous period and by period.
state_inflow <- function(mydat, target_state, period_col_name, fct) {
  dcast(
    # setDT() converts mydat by reference, so the caller's object is modified too
    setDT(mydat)[, Previous_State := factor(shift(State, fill = target_state)), by = ID][
      , period_factor := lapply(.SD, factor), .SDcols = period_col_name],
    Previous_State ~ period_factor, fct,
    value.var = "Values", subset = .(State == target_state), drop = FALSE
  )
}

test <- state_inflow(data, "X0", "Period_1", length)
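I am adding a row to the data frame to include those "State" combinations that never touch the target_state category (see ID 3 in the data data frame; across periods it never touches the target state X0 and is therefore left out of the original test output shown above), and I fill all the columns of that new row with 0. At the moment I do it like this: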
test %>%
complete(Previous_State = unique(data$State)) %>%
replace(is.na(.), 0)
This gives me the correct output:
# A tibble: 4 x 4
Previous_State `1` `2` `3`
<chr> <int> <int> <int>
1 X0 2 0 0
2 X1 0 0 0
3 X2 0 0 1
4 X3 0 0 0
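See how row 4, "X3", was added with all 0s? That is the correct output.
I am trying to learn how to use complete(..., fill = ...). How would I do what I did above, but using fill = ... inside the complete(...) call instead?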
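The fill argument of complete expects a named list that sets the fill value for each individual column. By default, every column is filled with NA. You can change this by setting the desired fill value for each column separately: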
test %>%
complete(Previous_State = unique(data$State),
fill = list(`1` = 0, `2` = 0, `3` = 0))
# A tibble: 4 x 4
# Previous_State `1` `2` `3`
# <chr> <dbl> <dbl> <dbl>
#1 X0 2 0 0
#2 X1 0 0 0
#3 X2 0 0 1
#4 X3 0 0 0
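If the number of period columns is not known in advance, the named list can be built from the column names instead of being typed out. The following is only a sketch under some assumptions: test_tbl is a stand-in tibble copied from the printed output above, and value_cols, fill_values and filled are hypothetical names introduced for illustration. It uses 0L so the fill value matches the integer count columns; a plain 0 works as well.

```r
library(tidyr)
library(dplyr)

# Stand-in for the `test` table from the question (values copied from its printed output).
test_tbl <- tibble(
  Previous_State = c("X0", "X1", "X2"),
  `1` = c(2L, 0L, 0L),
  `2` = c(0L, 0L, 0L),
  `3` = c(0L, 0L, 1L)
)

# Hypothetical helper names: every column except the key gets the same fill value.
value_cols <- setdiff(names(test_tbl), "Previous_State")
fill_values <- setNames(rep(list(0L), length(value_cols)), value_cols)

# Same call as in the answer, but with the fill list generated from the column names.
filled <- test_tbl %>%
  complete(Previous_State = c("X0", "X1", "X2", "X3"), fill = fill_values)

print(filled)
```

Here fill_values is equivalent to list(`1` = 0L, `2` = 0L, `3` = 0L), so the call behaves like the hand-written version while adapting to however many period columns the table has.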
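Since your question is about the tidyverse: a tidy data frame is normalized, so there is usually only one column per attribute. That makes it much easier to achieve the same result, because only the single value column needs a fill value: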
test %>%
pivot_longer(matches("^[0-9]+$")) %>%
complete(Previous_State = unique(data$State), name,
fill = list(value = 0)) %>%
pivot_wider()
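To make the intermediate long format visible, the same idea can be written with explicit column names. This is a sketch, not the answer's exact code: test_tbl is a stand-in tibble copied from the printed output in the question, and all_states, long, wide, period and n are names assumed here for illustration.

```r
library(tidyr)
library(dplyr)

# Stand-in for the `test` table from the question (values copied from its printed output).
test_tbl <- tibble(
  Previous_State = c("X0", "X1", "X2"),
  `1` = c(2L, 0L, 0L),
  `2` = c(0L, 0L, 0L),
  `3` = c(0L, 0L, 1L)
)

# Hypothetical stand-in for unique(data$State).
all_states <- c("X0", "X1", "X2", "X3")

# Long format: one row per (Previous_State, period) pair, with the count in `n`.
long <- test_tbl %>%
  pivot_longer(-Previous_State, names_to = "period", values_to = "n")

# Add the missing state/period combinations, fill the single value column,
# then spread the periods back into columns.
wide <- long %>%
  complete(Previous_State = all_states, period, fill = list(n = 0L)) %>%
  pivot_wider(names_from = period, values_from = n)

print(wide)
```

Because the counts live in one column while the data is long, the fill list needs a single entry no matter how many periods there are.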