Fill both numerical and character columns, conditional on the previous and next value being equal
Please note that my question is different from:
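In the data below, I would like to fill in both the numerical and the character columns of a zipcode whose values are NA, on the condition that the areacode and type of the zipcode before the NA are the same as the areacode and type of the zipcode after the NA.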
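In other words: "Because zipcode 1002 has clay and zipcode 1004 has clay, we assume that zipcode 1003 has clay."
I wanted to use na.fill, but it can only fill numerical values.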
dat <- structure(list(zipcode = c(1001, 1002, 1003, 1004), areacode = c(4,
4, NA, 4), type = structure(c(3L, 3L, NA, 3L), .Label = c("",
"sand", "clay", "na2"), class = "factor"), region = c(3, 3,
NA, 3)), class = c("data.table", "data.frame"), row.names = c(NA,
-4L))
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 NA <NA> NA
4: 1004 4 clay 3
dat2 <- structure(list(zipcode = c(1001, 1002, 1003, 1004), areacode = c(4,
4, NA, 1), type = structure(c(3L, 3L, NA, 2L), .Label = c("",
"sand", "clay", "na2"), class = "factor"), region = c(3, 3, NA,
3)), class = c("data.table", "data.frame"), row.names = c(NA,
-4L))
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 NA <NA> NA
4: 1004 1 sand 3
What is the best way to do this?
Desired output for dat:
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 4 clay 3
4: 1004 4 clay 3
Desired output for dat2:
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 NA <NA> NA
4: 1004 1 sand 3
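The desired rule can also be expressed without na.fill: for every NA row, look up the nearest complete row on each side and copy the earlier one only when both agree on areacode and type. The following is a minimal base-R sketch under stated assumptions: the data are plain data.frames (a data.table can be converted with as.data.frame() first), a row counts as missing when any key column is NA, and the zipcode column is never overwritten. fill_if_neighbours_agree() is a hypothetical helper written for this example, not an existing function. Because it looks up the nearest complete rows rather than the adjacent ones, it also covers runs of several NA rows.

```r
# Example data rebuilt as plain data.frames (same values as dat and dat2)
lv <- c("", "sand", "clay", "na2")
dat <- data.frame(zipcode = c(1001, 1002, 1003, 1004),
                  areacode = c(4, 4, NA, 4),
                  type = factor(c("clay", "clay", NA, "clay"), levels = lv),
                  region = c(3, 3, NA, 3))
dat2 <- data.frame(zipcode = c(1001, 1002, 1003, 1004),
                   areacode = c(4, 4, NA, 1),
                   type = factor(c("clay", "clay", NA, "sand"), levels = lv),
                   region = c(3, 3, NA, 3))

# Hypothetical helper: fill rows whose key columns are NA with the values of
# the nearest complete row above, but only when the nearest complete rows
# above and below agree on every key column.
fill_if_neighbours_agree <- function(df, keys = c("areacode", "type"),
                                     id = "zipcode") {
  n <- nrow(df)
  idx <- seq_len(n)
  # A row is "complete" when none of its key columns is NA
  complete <- stats::complete.cases(df[keys])
  # Nearest complete row at or above each row (0 when there is none)
  above <- cummax(ifelse(complete, idx, 0L))
  # Nearest complete row at or below each row (n + 1 when there is none)
  below <- rev(cummin(rev(ifelse(complete, idx, n + 1L))))
  cols <- setdiff(names(df), id)  # the id column is never overwritten
  for (i in which(!complete & above > 0L & below <= n)) {
    # Compare keys as character so numbers and factors are treated alike
    key_above <- lapply(df[above[i], keys, drop = FALSE], as.character)
    key_below <- lapply(df[below[i], keys, drop = FALSE], as.character)
    if (identical(key_above, key_below)) {
      # Row-wise indexing keeps factor columns as factors
      df[i, cols] <- df[above[i], cols, drop = FALSE]
    }
  }
  df
}

fill_if_neighbours_agree(dat)   # row 3 becomes 4 / clay / 3
fill_if_neighbours_agree(dat2)  # row 3 stays NA because row 4 is 1 / sand
```

Copying whole rows by index, instead of filling column by column, is what makes the approach independent of column type.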
EDIT:
The following is not sufficient, because it fills in clay for row 3 even though the fourth row says sand:
library(tidyr)
dat2 %>%
  fill(areacode, type, region)
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 4 clay 3
4: 1004 1 sand 3
library(data.table)
dat2[, lapply(.SD, zoo::na.locf)]
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 4 clay 3
4: 1004 1 sand 3
Using dplyr:
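library(dplyr)
dat2 |>
  # ifelse() would reduce a factor to its integer codes, so use character
  mutate(type = as.character(type)) |>
  # Fill columns 2:4 (areacode, type, region) only where the value is NA and
  # the previous and next rows agree on both areacode and type
  mutate(across(2:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & lag(type) == lead(type),
                         lag(.),
                         .)))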
zipcode areacode type region
1 1001 4 clay 3
2 1002 4 clay 3
3 1003 NA <NA> NA
4 1004 1 sand 3
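The as.character() step matters because ifelse() builds its result from the test vector, so a factor passed as yes or no loses its class and levels. A minimal base-R illustration, where type[1] is only a simple stand-in for lag(.):

```r
type <- factor(c("clay", NA, "clay"), levels = c("", "sand", "clay", "na2"))

# With a factor, only the integer codes survive
from_factor <- ifelse(is.na(type), type[1], type)
print(from_factor)     # integer codes, not "clay"

# Converting to character first keeps the labels
type_chr <- as.character(type)
from_character <- ifelse(is.na(type_chr), type_chr[1], type_chr)
print(from_character)  # "clay" "clay" "clay"
```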
dat |>
mutate(type = as.character(type)) |>
mutate(across(2:4,
~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & lag(type) == lead(type),
lag(.),
.)))
zipcode areacode type region
1 1001 4 clay 3
2 1002 4 clay 3
3 1003 4 clay 3
4 1004 4 clay 3
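Because this answer looks only one row back and one row forward through lag() and lead(), a run of two or more consecutive NA rows is not filled, even when the rows around the run agree. A small base-R sketch with a hypothetical column shows this; lag1() and lead1() are stand-ins written for this example, not dplyr functions.

```r
# Base-R stand-ins for dplyr::lag() and dplyr::lead() with the default n = 1
lag1  <- function(x) c(NA, x[-length(x)])
lead1 <- function(x) c(x[-1], NA)

# Hypothetical column with a gap of two NA rows between two matching values
areacode <- c(4, NA, NA, 4)
filled <- ifelse(is.na(areacode) & lag1(areacode) == lead1(areacode),
                 lag1(areacode),
                 areacode)
print(filled)  # both NA rows stay NA: each one has an NA neighbour
```

Handling such runs requires comparing the nearest non-NA rows on each side instead of the adjacent rows.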