Update multiple rows based on conditions
I want to update three columns at once, based on the values in one column.
My data looks like this:
df <- data.frame(input = c("Antidesma cuspidatum Mull.Arg.", "Antidesma cuspidatum Müll.Arg.",
"Alchornea parviflora (Benth.) Mull.Arg.", "Alchornea parviflora (Benth.) Müll.Arg."),
n1 = c("Antidesma cuspidatum", NA, "Alchornea parviflora", NA),
n2 = c("Antidesma", NA, "Alchornea", NA),
n3 = c("Phyllanthaceae", NA, "Euphorbiaceae", NA))
                                    input                   n1        n2             n3
1          Antidesma cuspidatum Mull.Arg. Antidesma cuspidatum Antidesma Phyllanthaceae
2          Antidesma cuspidatum Müll.Arg.                 <NA>      <NA>           <NA>
3 Alchornea parviflora (Benth.) Mull.Arg. Alchornea parviflora Alchornea  Euphorbiaceae
4 Alchornea parviflora (Benth.) Müll.Arg.                 <NA>      <NA>           <NA>
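If the first two words of the input column are identical, the corresponding rows should be identical as well. In this example, that means the NA values of n1, n2, and n3 in rows 2 and 4 should be filled with the values from rows 1 and 3.
Here is my desired output: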
                                    input                   n1        n2             n3
1          Antidesma cuspidatum Mull.Arg. Antidesma cuspidatum Antidesma Phyllanthaceae
2          Antidesma cuspidatum Müll.Arg. Antidesma cuspidatum Antidesma Phyllanthaceae
3 Alchornea parviflora (Benth.) Mull.Arg. Alchornea parviflora Alchornea  Euphorbiaceae
4 Alchornea parviflora (Benth.) Müll.Arg. Alchornea parviflora Alchornea  Euphorbiaceae
Any suggestions for this case?
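You could use the dplyr package.
First, I create a column gr that contains only the first two words of input. Then I change (or mutate) the columns n1, n2, and n3 by filling in the first non-NA value of each group.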
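library(dplyr)
df %>%
  # gr = the first two words of input, used as the grouping key
  group_by(gr = gsub("(^\\w+ \\w+) .*", "\\1", input)) %>%
  # within each group, overwrite n1..n3 with the group's first non-NA value
  mutate(across(c(n1, n2, n3), ~ .x[!is.na(.x)][1])) %>%
  ungroup()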
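The grouping key that group_by() builds can be inspected on its own, which makes it easier to check that the regular expression really keeps only the first two words. This is a minimal base R sketch; the variable name key is only for illustration and does not appear in the answer. Note that backslashes in R string literals have to be doubled.

```r
# Same input values as in the question
input <- c("Antidesma cuspidatum Mull.Arg.",
           "Antidesma cuspidatum Müll.Arg.",
           "Alchornea parviflora (Benth.) Mull.Arg.",
           "Alchornea parviflora (Benth.) Müll.Arg.")

# Keep the first two words, drop everything after the second one.
# "key" is a hypothetical name used only in this sketch.
key <- gsub("(^\\w+ \\w+) .*", "\\1", input)
print(key)

# Rows 1-2 share one key and rows 3-4 share another
print(table(key))
```

Rows that share a key end up in the same group, so each NA can be filled from the non-NA value inside that group.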
Base R solution:
# Find the columns named "n" followed by digits:
# na_col_names => character vector
na_col_names <- grep(
  "^n\\d+$",
  names(df),
  value = TRUE
)
# Carry the last non-NA value forward: df => data.frame
# (relies on row order and assumes the first value of each column is not NA)
df[, na_col_names] <- lapply(
  df[, na_col_names, drop = FALSE],
  function(x) {
    x[!is.na(x)][cumsum(!is.na(x))]
  }
)
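The carry-forward approach relies on row order: each NA row takes its value from the nearest filled row above it. If rows for the same species might not be adjacent, a grouped variant could be sketched as follows, assuming the first two words of input identify the group, as in the question. The names key, n_cols, and first_non_na are hypothetical and introduced only for this sketch.

```r
df <- data.frame(
  input = c("Antidesma cuspidatum Mull.Arg.",
            "Antidesma cuspidatum Müll.Arg.",
            "Alchornea parviflora (Benth.) Mull.Arg.",
            "Alchornea parviflora (Benth.) Müll.Arg."),
  n1 = c("Antidesma cuspidatum", NA, "Alchornea parviflora", NA),
  n2 = c("Antidesma", NA, "Alchornea", NA),
  n3 = c("Phyllanthaceae", NA, "Euphorbiaceae", NA),
  stringsAsFactors = FALSE
)

# Group key: the first two words of input
key <- sub("^(\\w+ \\w+) .*", "\\1", df$input)

# Columns to fill: "n" followed by digits
n_cols <- grep("^n\\d+$", names(df), value = TRUE)

# Hypothetical helper: repeat the first non-NA value over the whole group
first_non_na <- function(x) rep(x[!is.na(x)][1], length(x))

# ave() applies the helper within each key group and keeps the row order
df[n_cols] <- lapply(df[n_cols], function(col) ave(col, key, FUN = first_non_na))
print(df)
```

Because the fill is done per key rather than per position, this variant does not depend on the filled row coming first or on the rows being sorted.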
Tidyverse:
library(tidyverse)
df %>%
  # fill() works on data frame columns, not on bare vectors,
  # so select the n-columns directly and carry values downward
  fill(matches("^n\\d+$"), .direction = "down")