Paste string when NA in another column
Here is a sample of my data:
my_df <- data.frame("R" = c("123", NA, NA, "456", "789", "123", NA),
"D" = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
stringsAsFactors = FALSE)
Whenever "R" is NA, I want to paste that row's "D" value onto the "D" value of the closest preceding row where "R" is not NA.
Here is the expected result:
my_result <- data.frame("R" = c("123", "456", "789", "123"),
"D" = c("abcdefghi", "jkl", "mno", "azeaze"),
stringsAsFactors = FALSE)
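The grouping rule described above can be made concrete with a short base R sketch: each non-NA value in "R" starts a new group, so a running count of non-NA values gives every row a group id. The names `starts` and `grp` are chosen here for illustration only, and the sketch assumes the first row of "R" is not NA (otherwise the leading rows would land in a group 0).

```r
my_df <- data.frame(R = c("123", NA, NA, "456", "789", "123", NA),
                    D = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
                    stringsAsFactors = FALSE)

# TRUE where a new group starts (R is not NA)
starts <- !is.na(my_df$R)

# Running count of group starts: NA rows inherit the id of the row above
grp <- cumsum(starts)
print(grp)
#> [1] 1 1 1 2 3 4 4
```

Rows 1 to 3 share id 1 and rows 6 to 7 share id 4, which is exactly the set of "D" values that should be pasted together.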
tidyverse
my_df <- data.frame("R" = c("123", NA, NA, "456", "789", "123", NA),
"D" = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
stringsAsFactors = FALSE)
library(tidyverse)
my_df %>%
mutate(grp = cumsum(!is.na(R))) %>%
fill(R) %>%
group_by(R, grp) %>%
summarise(D = paste0(D, collapse = ""), .groups = "drop") %>%
arrange(grp) %>%
select(-grp)
#> # A tibble: 4 x 2
#> R D
#> <chr> <chr>
#> 1 123 abcdefghi
#> 2 456 jkl
#> 3 789 mno
#> 4 123 azeaze
Created on 2021-12-07 by the reprex package (v2.0.1)
data.table
library(data.table)
library(magrittr)
setDT(my_df)[, grp := cumsum(!is.na(R))] %>%
.[, R := zoo::na.locf(R)] %>%
.[, list(D = paste0(D, collapse = "")), by = list(R, grp)] %>%
.[, grp := NULL] %>%
.[]
#> R D
#> 1: 123 abcdefghi
#> 2: 456 jkl
#> 3: 789 mno
#> 4: 123 azeaze
After splitting my_df$D by cumsum(!is.na(my_df$R)), you can use paste in sapply.
i <- !is.na(my_df$R)
data.frame(my_df["R"][i, , drop = FALSE],
           D = sapply(split(my_df$D, cumsum(i)), paste, collapse = ""))
# R D
#1 123 abcdefghi
#4 456 jkl
#5 789 mno
#6 123 azeaze
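To see what the split step produces, the same idea can be unrolled into intermediate values. This is only a sketch of the approach above, not a replacement for it: the names `pieces`, `D_joined`, and `res` are made up for illustration, `vapply` is swapped in for `sapply` so that each group is guaranteed to return a single string, and the first row of "R" is assumed to be non-NA so that the number of groups matches the number of kept "R" values.

```r
my_df <- data.frame(R = c("123", NA, NA, "456", "789", "123", NA),
                    D = c("abc", "def", "ghi", "jkl", "mno", "aze", "aze"),
                    stringsAsFactors = FALSE)

i <- !is.na(my_df$R)   # rows that start a group
grp <- cumsum(i)       # group id for every row

# List with one character vector of D values per group
pieces <- split(my_df$D, grp)

# Collapse each group to one string; character(1) enforces the return type
D_joined <- vapply(pieces, paste, character(1), collapse = "")

# Pair each group's starting R value with its collapsed D string
res <- data.frame(R = my_df$R[i], D = unname(D_joined),
                  stringsAsFactors = FALSE)
print(res)
```

Unlike the version above, this one resets the row names to 1 through 4, because the "R" column is taken as a plain vector instead of a subset of the original data frame.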