Lag back to a value with conditions in R
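I need a conditional way to lag back to the last row whose level is one number lower than the current row's. Whenever type = "yes", I want to go back one level lower to the last "no" and grab its quantity. For example, rows 2 and 3 here are type "yes" and level 5. In that case, I'd like to go back to the last level 4 "no" row, grab the quantity, and assign it to a new column. When type is "no", no lagging is needed.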
Data:
row_id level type quantity
1 4 no 100
2 5 yes 110
3 5 yes 115
4 2 no 500
5 2 no 375
6 3 yes 250
7 3 yes 260
8 3 yes 420
Desired output:
row_id level type quantity lagged_quantity
1 4 no 100 NA
2 5 yes 110 100
3 5 yes 115 100
4 2 no 500 NA
5 2 no 375 NA
6 3 yes 250 375
7 3 yes 260 375
8 3 yes 420 375
Data (dput):
structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4,
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no",
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250,
260, 420)), row.names = c(NA, -8L), class = c("tbl_df", "tbl",
"data.frame"))
Desired output (dput):
structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4,
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no",
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250,
260, 420), lagged_quantity = c("NA", "100", "100", "NA", "NA",
"375", "375", "375")), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
@Mossa
A direct solution is:
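library(dplyr)
library(tidyr)  # for fill()

df1 %>%
  # level_id increments each time level drops; it is not used in the steps below
  mutate(level_id = 1 + cumsum(c(1, diff(level)) < 0)) %>%
  # keep quantity only on the "no" rows
  mutate(lagged_quantity = if_else(type == "yes", NA_real_, quantity)) %>%
  # carry the last "no" quantity down into the "yes" rows
  fill(lagged_quantity) %>%
  # blank out the "no" rows themselves
  mutate(lagged_quantity = if_else(type == "no", NA_real_, lagged_quantity))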
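First we keep only the values you want, then fill the missing entries with the last known value, and finally blank out the "no" rows, which do not need lagging.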
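Note that the fill-based steps carry down the most recent "no" quantity without comparing levels, which happens to match the sample data. If the "one level lower" condition has to be enforced explicitly, a minimal base R sketch, assuming the same four columns as in the question, might look like the following. The helper name `lag_to_lower_level` is hypothetical and not part of the original answer.

```r
# Sample data from the question
df1 <- data.frame(
  row_id = 1:8,
  level = c(4, 5, 5, 2, 2, 3, 3, 3),
  type = c("no", "yes", "yes", "no", "no", "yes", "yes", "yes"),
  quantity = c(100, 110, 115, 500, 375, 250, 260, 420)
)

# Hypothetical helper: for each "yes" row, look back for the most recent
# "no" row whose level is exactly one lower and return its quantity.
lag_to_lower_level <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    if (df$type[i] != "yes") return(NA_real_)   # "no" rows need no lag
    prev <- seq_len(i - 1)                      # indices of earlier rows
    hits <- prev[df$type[prev] == "no" & df$level[prev] == df$level[i] - 1]
    if (length(hits) == 0) NA_real_ else df$quantity[max(hits)]
  }, numeric(1))
}

df1$lagged_quantity <- lag_to_lower_level(df1)
df1
```

This loops over rows, so it is meant for clarity rather than speed. On the sample data it gives the same column as the desired output, but unlike the fill approach it returns NA when no earlier "no" row sits exactly one level below.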
An option with data.table
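library(data.table)
# Summarise each (level, type) block, shift within groups started by a "no" block,
# then join the shifted value back onto df1 by reference
setDT(df1)[
  df1[, .(lagged_qty = last(quantity)), .(level, type)][
    , lagged_qty := shift(lagged_qty), .(grp = cumsum(type == 'no'))],
  lagged_qty := lagged_qty, on = .(level, type)]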
-output
> df1
row_id level type quantity lagged_qty
<int> <int> <char> <int> <int>
1: 1 4 no 100 NA
2: 2 5 yes 110 100
3: 3 5 yes 115 100
4: 4 2 no 500 NA
5: 5 2 no 375 NA
6: 6 3 yes 250 375
7: 7 3 yes 260 375
8: 8 3 yes 420 375
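To make the nested data.table call easier to follow, here is a sketch that splits it into three separate steps. It is a rewrite for illustration, assuming the sample data from the question; the intermediate name `blocks` is hypothetical, and `i.lagged_qty` is used to make explicit that the value comes from the joined table.

```r
library(data.table)

dt <- data.table(
  row_id = 1:8,
  level = c(4, 5, 5, 2, 2, 3, 3, 3),
  type = c("no", "yes", "yes", "no", "no", "yes", "yes", "yes"),
  quantity = c(100, 110, 115, 500, 375, 250, 260, 420)
)

# Step 1: one row per (level, type) combination, holding its last quantity
blocks <- dt[, .(lagged_qty = last(quantity)), by = .(level, type)]

# Step 2: every "no" block opens a new group; shifting within the group gives
# the "no" block NA and hands its value to the "yes" block that follows
blocks[, lagged_qty := shift(lagged_qty), by = .(grp = cumsum(type == "no"))]

# Step 3: update join, copying the shifted value onto every matching row
dt[blocks, lagged_qty := i.lagged_qty, on = .(level, type)]
dt
```

One thing to keep in mind: `by = .(level, type)` groups by value, not by consecutive runs, so if the same level and type combination reappeared later in the data, those rows would be merged into one block.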