Interpolate row-wise values of 0 between two columns with values >0 in R

I am trying to interpolate 0 values that lie between two non-zero values in the columns 2018 to 2021 of a data.table in R. This is what the sample data df1 looks like:

   ID string1 2018 2019 2020 2021 string2
1: a1      x2    3    3    0    4      si
2: a2      g3    5    5    4    0      q2
3: a3      n2   11    0    0    3      oq
4: a4      m3    3    0    9    8      mx
5: a5      2w    9    1    6    5      ix
6: a6     ps2    2    4    7    4      p2
7: a7     kg2    6    0    9    6      2q

For easier reproducibility:

library(data.table)

df1 = data.table(
  ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7"),
  "string1" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
  "2018" = c(3,5,11,3,9,2,6),
  "2019" = c(3,5,0,0,1,4,0),
  "2020" = c(0,4,0,9,6,7,9),
  "2021" = c(4,0,3,8,5,4,6),
  "string2" = c("si", "q2", "oq", "mx", "ix", "p2", "2q"))

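In df1 there are cases where a zero sits between two numbers >0 (e.g. row 1/column 2020, row 4/column 2019, or row 7/column 2019). I am trying to identify these cases and interpolate them from the adjacent columns (e.g. row 1/column 2020: (3 + 4) / 2 = 3.5).

Is there a way to do this? So far I have only found ways to replace all zero values, but none that apply only when the zero lies between two numbers >0.

I am trying to get this output: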

   ID string1 2018 2019 2020 2021 string2
1: a1      x2    3  3.0  3.5    4      si
2: a2      g3    5  5.0  4.0    0      q2
3: a3      n2   11  0.0  0.0    3      oq
4: a4      m3    3  6.0  9.0    8      mx
5: a5      2w    9  1.0  6.0    5      ix
6: a6     ps2    2  4.0  7.0    4      p2
7: a7     kg2    6  7.5  9.0    6      2q

Thanks a lot!

A function that interpolates a zero lying between two positive elements:

f <- function(vec){
  
  prev_val <- shift(vec, 1, fill = 0)
  next_val <- shift(vec, -1, fill = 0)
  
  fifelse(prev_val > 0 & next_val > 0 & vec == 0, (prev_val + next_val) / 2, vec)
}
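
As a quick check, here is a minimal sketch that runs f on single rows taken from the example data. The names row1, row2 and row3 are only illustrative. It assumes data.table is installed, since shift and fifelse come from that package.

```r
library(data.table)

# Same function as above, repeated so the snippet runs on its own
f <- function(vec) {
  prev_val <- shift(vec, 1, fill = 0)    # left neighbour, 0 at the left edge
  next_val <- shift(vec, -1, fill = 0)   # right neighbour, 0 at the right edge
  fifelse(prev_val > 0 & next_val > 0 & vec == 0, (prev_val + next_val) / 2, vec)
}

row1 <- f(c(3, 3, 0, 4))    # single zero between 3 and 4 is replaced by 3.5
row2 <- f(c(5, 5, 4, 0))    # trailing zero has no right neighbour, stays 0
row3 <- f(c(11, 0, 0, 3))   # run of two zeros: each has a zero neighbour, both stay 0
```

Only an isolated zero with positive values on both sides is changed, so edge zeros and runs of zeros are left alone, which matches the requested output.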

Apply the function to every row of the year columns:

year_cols <- names(df1)[grep("^[0-9]+$", names(df1))]
df1[, (year_cols) := transpose(lapply(transpose(.SD), f)), .SDcols = year_cols]

transpose is used because you want to make the changes row-wise. The second call converts the result back to column format.
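
To illustrate the double transpose, here is a small sketch on a toy table. The names dt, flipped and back are made up for this example, and the multiplication by 10 just stands in for f.

```r
library(data.table)

dt <- data.table(a = c(1, 2), b = c(3, 4), c = c(5, 6))

# First transpose: each original row becomes one column (V1, V2, ...)
flipped <- transpose(dt)

# lapply now walks over the original rows, one vector per row
scaled <- lapply(flipped, function(v) v * 10)

# Second transpose: back to one vector per original column
back <- transpose(scaled)
```

Here flipped$V1 holds the first row of dt (1, 3, 5), and back is a list of three vectors, one per original column, which is the shape that := expects on the right-hand side.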

Using data.table functions (and the original df1), this code (a bit cumbersome) should work:

for (i in c(2019,2020)){
  x = which(colnames(df1) == i)
  df1[,x] <- ifelse(c(df1[,.SD,.SDcols = x] == 0 & df1[,.SD,.SDcols = c(x-1)] > 0 & df1[,.SD,.SDcols = c(x+1)] > 0), 
                    rowMeans(df1[,.SD,.SDcols = c(x-1,x+1)]), unlist(df1[,.SD,.SDcols = x]))
}

> df1
   ID string1 2018 2019 2020 2021 string2
1: a1      x2    3  3.0  3.5    4      si
2: a2      g3    5  5.0  4.0    0      q2
3: a3      n2   11  0.0  0.0    3      oq
4: a4      m3    3  6.0  9.0    8      mx
5: a5      2w    9  1.0  6.0    5      ix
6: a6     ps2    2  4.0  7.0    4      p2
7: a7     kg2    6  7.5  9.0    6      2q
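
A possible variant of the same loop, sketched with data.table's set() so that only the matching rows are updated by reference. This is an alternative written for illustration, not the code that produced the output above; dt and hit are hypothetical names and the table is a three-row subset of the example.

```r
library(data.table)

dt <- data.table("2018" = c(3, 11, 6), "2019" = c(3, 0, 0),
                 "2020" = c(0, 0, 9),  "2021" = c(4, 3, 6))

for (col in c("2019", "2020")) {
  j <- which(names(dt) == col)
  # rows where this column is 0 and both neighbouring columns are positive
  hit <- which(dt[[j]] == 0 & dt[[j - 1L]] > 0 & dt[[j + 1L]] > 0)
  if (length(hit)) {
    set(dt, i = hit, j = j,
        value = (dt[[j - 1L]][hit] + dt[[j + 1L]][hit]) / 2)
  }
}
```

Like the loop above, it processes the columns left to right and assumes the year columns are adjacent in the table.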

Here is a base R solution (using data.frame instead of data.table to create the data):

for (i in c("X2019","X2020")){
  x = which(colnames(df1) == i)
  df1[,x] <- ifelse(df1[,x] == 0 & df1[,x-1] > 0 & df1[,x+1] > 0, rowMeans(df1[,c(x-1,x+1)]), df1[,x])
}
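
The "X2019" and "X2020" names come from data.frame(), which by default rewrites column names that are not syntactically valid. A short sketch of that behaviour (mangled and kept are illustrative names):

```r
# Default check.names = TRUE prefixes numeric-looking names with "X"
mangled <- data.frame("2018" = c(3, 5), "2019" = c(3, 5))

# check.names = FALSE keeps the names exactly as given
kept <- data.frame("2018" = c(3, 5), "2019" = c(3, 5), check.names = FALSE)

names(mangled)
names(kept)
```

If the data frame is built with check.names = FALSE, the loop would need c("2019", "2020") instead.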

Maybe this is overkill, but here is a solution that reshapes twice:

melt(df1, measure.vars = patterns("^[0-9]+$")
     )[,value := fifelse(value == 0 &
                           shift(value, type = "lag", fill = 0) > 0 &
                           shift(value, type = "lead", fill = 0) > 0,
                         (shift(value, type = "lag") + shift(value, type = "lead")) / 2,
                         value), by = ID
       ][, dcast(.SD, ...~variable) ]

#    ID string1 string2 2018 2019 2020 2021
# 1: a1      x2      si    3  3.0  3.5    4
# 2: a2      g3      q2    5  5.0  4.0    0
# 3: a3      n2      oq   11  0.0  0.0    3
# 4: a4      m3      mx    3  6.0  9.0    8
# 5: a5      2w      ix    9  1.0  6.0    5
# 6: a6     ps2      p2    2  4.0  7.0    4
# 7: a7     kg2      2q    6  7.5  9.0    6
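
Note that the reshaped result puts string2 before the year columns. If the original column order matters, one option is setcolorder(). The sketch below repeats the melt/dcast steps on a two-row subset with the id columns spelled out explicitly; dt, long and wide are illustrative names, and this is a variation for demonstration rather than the exact call above.

```r
library(data.table)

dt <- data.table(ID = c("a1", "a7"), string1 = c("x2", "kg2"),
                 "2018" = c(3, 6), "2019" = c(3, 0),
                 "2020" = c(0, 9), "2021" = c(4, 6),
                 string2 = c("si", "2q"))

# Wide to long: one row per ID and year
long <- melt(dt, id.vars = c("ID", "string1", "string2"))

# Within each ID the rows run 2018..2021, so lag/lead are the neighbouring years
long[, value := fifelse(value == 0 &
                          shift(value, type = "lag", fill = 0) > 0 &
                          shift(value, type = "lead", fill = 0) > 0,
                        (shift(value, type = "lag") + shift(value, type = "lead")) / 2,
                        value), by = ID]

# Long back to wide, then restore the original column order
wide <- dcast(long, ID + string1 + string2 ~ variable, value.var = "value")
setcolorder(wide, names(dt))
```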

Edit: To fill in all the NAs, we can use zoo::na.approx or zoo::na.spline:

cols <- grep("^[0-9]+$", names(df1), value = TRUE)

df1[, (cols) := transpose(lapply(transpose(.SD), function(i) zoo::na.approx(
  ifelse(i == 0, NA, i), na.rm = FALSE))),
  .SDcols = cols ]
# Using na.approx, notice 2nd row for 2021 is NA.
#    ID string1 2018     2019     2020 2021 string2
# 1: a1      x2    3 3.000000 3.500000    4      si
# 2: a2      g3    5 5.000000 4.000000   NA      q2
# 3: a3      n2   11 8.333333 5.666667    3      oq
# 4: a4      m3    3 6.000000 9.000000    8      mx
# 5: a5      2w    9 1.000000 6.000000    5      ix
# 6: a6     ps2    2 4.000000 7.000000    4      p2
# 7: a7     kg2    6 7.500000 9.000000    6      2q

# Using na.spline
#    ID string1 2018     2019     2020 2021 string2
# 1: a1      x2    3 3.000000 3.333333    4      si
# 2: a2      g3    5 5.000000 4.000000    2      q2
# 3: a3      n2   11 8.333333 5.666667    3      oq
# 4: a4      m3    3 7.333333 9.000000    8      mx
# 5: a5      2w    9 1.000000 6.000000    5      ix
# 6: a6     ps2    2 4.000000 7.000000    4      p2
# 7: a7     kg2    6 9.000000 9.000000    6      2q
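
To see where numbers such as 8.333333 and the trailing NA come from, here is a rough base R sketch of linear interpolation over zeros using stats::approx. It is a stand-in for illustration only, not how zoo implements na.approx, and interp_zeros is a hypothetical helper name.

```r
# Treat zeros as missing and interpolate linearly between the known values.
# Positions outside the range of known values come back as NA.
interp_zeros <- function(v) {
  known <- which(v != 0)
  if (length(known) < 2) {
    # approx() needs at least two known points; just mark zeros as missing
    return(replace(v, v == 0, NA_real_))
  }
  approx(x = known, y = v[known], xout = seq_along(v))$y
}

row3 <- interp_zeros(c(11, 0, 0, 3))  # both zeros filled on the line from 11 to 3
row2 <- interp_zeros(c(5, 5, 4, 0))   # trailing zero has no right anchor
```

For row 3 the step between 11 and 3 is 8/3 per year, which gives 8.333333 and 5.666667. For row 2 the last value becomes NA because there is nothing to the right to interpolate towards.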