Lag back to a Value with Conditions R

Question

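I need a conditional way to lag back to the last row whose value is one number, or "level", lower than the current row. Whenever type = "yes", I want to go back one level lower to the last "no" and grab its quantity. For example, rows 2 and 3 here are type "yes" and level 5. In that case, I want to go back to the last level 4 "no" row, grab its quantity, and assign it to a new column. When type is "no", no lagging is needed.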

Data:

row_id  level  type   quantity
1       4      no     100
2       5      yes    110
3       5      yes    115
4       2      no     500  
5       2      no     375
6       3      yes    250
7       3      yes    260
8       3      yes    420


Desired output:

row_id  level  type  quantity lagged_quantity
1       4      no    100      NA
2       5      yes   110      100
3       5      yes   115      100
4       2      no    500      NA
5       2      no    375      NA
6       3      yes   250      375
7       3      yes   260      375
8       3      yes   420      375

Data:

structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4, 
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no", 
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250, 
260, 420)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", 
"data.frame"))


Desired output:

structure(list(row_id = c(1, 2, 3, 4, 5, 6, 7, 8), level = c(4, 
5, 5, 2, 2, 3, 3, 3), type = c("no", "yes", "yes", "no", "no", 
"yes", "yes", "yes"), quantity = c(100, 110, 115, 500, 375, 250, 
260, 420), lagged_quantity = c("NA", "100", "100", "NA", "NA", 
"375", "375", "375")), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

@Mossa

Answer 1

A straightforward solution is:

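library(dplyr)
library(tidyr)

df1 %>% 
  mutate(
    # counter that increments whenever level drops (not used by the steps below)
    level_id = 1 + cumsum(c(1, diff(level)) < 0)
  ) %>%
  # keep quantity only on the "no" rows
  mutate(lagged_quantity = if_else(type == "yes", NA_real_, quantity)) %>% 
  # carry the last "no" quantity down into the "yes" rows
  fill(lagged_quantity) %>% 
  # "no" rows need no lag, so set them back to NA
  mutate(lagged_quantity = if_else(type == "no", NA_real_, lagged_quantity))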

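First we keep only the values you want (the quantities on the "no" rows), then fill the missing entries with the last known value, and finally blank out the "no" rows, which do not need a lag.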
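
The steps above fill from the most recent "no" row without checking its level. If the "one level lower" condition has to be enforced explicitly, a minimal base R sketch could look like the following. The helper name lag_to_lower_level is hypothetical, and the sketch assumes the rows are already in the intended order and that "one level lower" means exactly level - 1.

```r
# Sample data from the question
df1 <- data.frame(
  row_id   = 1:8,
  level    = c(4, 5, 5, 2, 2, 3, 3, 3),
  type     = c("no", "yes", "yes", "no", "no", "yes", "yes", "yes"),
  quantity = c(100, 110, 115, 500, 375, 250, 260, 420)
)

# Hypothetical helper: for every "yes" row, look back for the most recent
# "no" row whose level is exactly one lower and return its quantity.
lag_to_lower_level <- function(level, type, quantity) {
  vapply(seq_along(level), function(i) {
    if (type[i] != "yes") return(NA_real_)      # "no" rows are never lagged
    prev <- seq_len(i - 1)                      # indices of all earlier rows
    idx  <- which(type[prev] == "no" & level[prev] == level[i] - 1)
    if (length(idx) == 0) NA_real_ else as.numeric(quantity[max(idx)])
  }, numeric(1))
}

res <- df1
res$lagged_quantity <- lag_to_lower_level(df1$level, df1$type, df1$quantity)
print(res)
```

This scans backwards for every row, so it is quadratic in the number of rows; for large data the fill-based pipeline is likely faster, at the cost of not checking the level.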

Answer 2

An option with data.table

library(data.table)
setDT(df1)[df1[, .(lagged_qty = last(quantity)), .(level, type)][,
   lagged_qty := shift(lagged_qty), .(grp = cumsum(type == 'no'))], 
    lagged_qty := lagged_qty, on = .(level, type)]

-output

> df1
   row_id level   type quantity lagged_qty
    <int> <int> <char>    <int>      <int>
1:      1     4     no      100         NA
2:      2     5    yes      110        100
3:      3     5    yes      115        100
4:      4     2     no      500         NA
5:      5     2     no      375         NA
6:      6     3    yes      250        375
7:      7     3    yes      260        375
8:      8     3    yes      420        375
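
The one-liner is dense, so here is a sketch that unpacks it into three steps. The intermediate names dt and summ are introduced only for illustration, and the update join uses the explicit i. prefix to make clear the value comes from the summary table. Note the assumption that each (level, type) pair appears in only one place in the data; grouping by value would merge repeated pairs.

```r
library(data.table)

# Sample data from the question, with integer columns as in the printed output
dt <- data.table(
  row_id   = 1:8,
  level    = c(4L, 5L, 5L, 2L, 2L, 3L, 3L, 3L),
  type     = c("no", "yes", "yes", "no", "no", "yes", "yes", "yes"),
  quantity = c(100L, 110L, 115L, 500L, 375L, 250L, 260L, 420L)
)

# Step 1: one row per (level, type) combination, keeping its last quantity
summ <- dt[, .(lagged_qty = last(quantity)), by = .(level, type)]

# Step 2: start a new block at every "no" group, then shift within the block
# so each "yes" group receives the quantity of the "no" group just before it
summ[, lagged_qty := shift(lagged_qty), by = .(grp = cumsum(type == "no"))]

# Step 3: update join back onto the original rows
dt[summ, lagged_qty := i.lagged_qty, on = .(level, type)]

print(dt)
```

Like the original one-liner, this does not compare level values; it relies on each "yes" group directly following the "no" group it should read from.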


r

lag

data-cleaning