Split a column into two existing columns if a condition is matched in R
I have a df like this:
df <- data.frame(id=c("j1", "j2", "j3/j9", "j5", "j2/j8", "j3/j4"), dad=c("j10", "j11", "", "j13", "", ""), mom=c("k2", "k4", "", "k6", "", ""))
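I am trying to split only those cells in the "id" column that contain a slash "/". I want to put the split strings into the existing columns "dad" and "mom". The desired output is this: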
df2 <- data.frame(id=c("j1", "j2", "j3/j9", "j5", "j2/j8", "j3/j4"), dad=c("j10", "j11", "j3", "j13", "j2", "j3"), mom=c("k2", "k4", "j9", "k6", "j8", "j4"))
I am trying this code:
df3 <- tidyr::separate(data = df, col = "id", into = c("dad", "mom"), sep = "/")
But this splits the entire "id" column into two new columns. Any idea how to solve this?
Here is one approach that uses coalesce after separating: convert the blanks ("") to NA (na_if), separate 'id' into 'dad2' and 'mom2' columns, then loop across the 'dad' and 'mom' columns and coalesce each with the corresponding 'dad2' or 'mom2' column.
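library(dplyr)
library(tidyr)
library(stringr)
df %>%
  # turn "" into NA column by column; calling na_if() on the whole
  # data frame is rejected by recent dplyr versions (1.1.0 and later)
  mutate(across(dad:mom, ~ na_if(.x, ""))) %>%
  separate(id, into = c("dad2", "mom2"), sep = "/", fill = "right",
           remove = FALSE) %>%
  # fill each NA in dad/mom from the matching dad2/mom2 column
  mutate(across(dad:mom, ~ coalesce(.x, get(str_c(cur_column(),
                                                  2)))), .keep = "unused")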
-output
id dad mom
1 j1 j10 k2
2 j2 j11 k4
3 j3/j9 j3 j9
4 j5 j13 k6
5 j2/j8 j2 j8
6 j3/j4 j3 j4
Or use across2 from dplyover, which is slightly more convenient here
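library(dplyover)
df %>%
  # column-wise na_if(), since recent dplyr versions reject a data frame here
  mutate(across(dad:mom, ~ na_if(.x, ""))) %>%
  separate(id, into = c("dad2", "mom2"), sep = "/", fill = "right",
           remove = FALSE) %>%
  mutate(across2(dad:mom, dad2:mom2, coalesce, .names = "{xcol}")) %>%
  select(names(df))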
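To see what the coalesce step contributes on its own, here is a minimal sketch on two short vectors. The names `existing` and `from_id` are hypothetical stand-ins for the 'dad' column (after blanks became NA) and the first piece of 'id'. They are not part of the question's data frame, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for 'dad' once "" has been converted to NA
existing <- c("j10", NA, "j13")
# Hypothetical stand-in for the first piece of 'id' after splitting on "/"
from_id <- c("j1", "j3", "j5")

# coalesce() takes, position by position, the first value that is not NA,
# so values already present in 'existing' are left alone
filled <- coalesce(existing, from_id)
filled
```

Only the second element is taken from `from_id`, which is why rows without a slash keep their original 'dad' and 'mom' values in the pipelines above.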
You can use:
library(dplyr)
library(stringr)
df %>%
mutate(dad = if_else(str_detect(id, "/"), str_extract(id, ".*(?=/)"), dad),
mom = if_else(str_detect(id, "/"), str_extract(id, "(?<=/).*"), mom))
This returns
id dad mom
1 j1 j10 k2
2 j2 j11 k4
3 j3/j9 j3 j9
4 j5 j13 k6
5 j2/j8 j2 j8
6 j3/j4 j3 j4
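As a small illustration of the two regular expressions used in this answer, the sketch below applies them to a hypothetical two-element vector `ids`. It assumes stringr is installed; the variable names are only for illustration.

```r
library(stringr)

ids <- c("j1", "j3/j9")  # hypothetical sample: one id without and one with "/"

# Lookahead (?=/): match everything that is followed by a slash
before <- str_extract(ids, ".*(?=/)")
# Lookbehind (?<=/): match everything that is preceded by a slash
after <- str_extract(ids, "(?<=/).*")

before
after
```

For the id without a slash both calls give NA, which does no harm in the answer's code because if_else keeps the existing 'dad' or 'mom' value wherever str_detect is FALSE.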
You can use grep to get the rows that contain / and then use strsplit and insert the result into df.
i <- grep("/", df$id)
df[i, c("dad", "mom")] <- do.call(rbind, strsplit(df$id[i], "/"))
#df[i, -1] <- do.call(rbind, strsplit(df$id[i], "/")) #Alternative
df
## id dad mom
#1 j1 j10 k2
#2 j2 j11 k4
#3 j3/j9 j3 j9
#4 j5 j13 k6
#5 j2/j8 j2 j8
#6 j3/j4 j3 j4
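The assignment above works because do.call(rbind, ...) stacks the split pieces into a two-column matrix whose columns line up with 'dad' and 'mom'. The following base R sketch shows that intermediate step on the three ids from the question that contain a slash. It assumes every such id has exactly one slash, and the length check is an added safeguard rather than part of the original answer.

```r
ids <- c("j3/j9", "j2/j8", "j3/j4")  # the ids from the question that contain "/"

# strsplit() returns a list with one character vector per id
parts <- strsplit(ids, "/", fixed = TRUE)

# Added safeguard (assumption: exactly one slash per id); with pieces of
# unequal length rbind() would recycle values instead of failing
stopifnot(all(lengths(parts) == 2))

# Stack the pieces row-wise: column 1 goes to 'dad', column 2 to 'mom'
m <- do.call(rbind, parts)
m
```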
Or use sub.
i <- grep("/", df$id)
df$dad[i] <- sub("/.*", "", df$id[i])
df$mom[i] <- sub(".*/", "", df$id[i])
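One way to check a solution like this is to compare the result with the desired data frame from the question. The sketch below is self-contained base R. It sets stringsAsFactors = FALSE explicitly so that it also behaves on R versions before 4.0, and `same` is a hypothetical variable name used only here.

```r
df <- data.frame(id = c("j1", "j2", "j3/j9", "j5", "j2/j8", "j3/j4"),
                 dad = c("j10", "j11", "", "j13", "", ""),
                 mom = c("k2", "k4", "", "k6", "", ""),
                 stringsAsFactors = FALSE)
df2 <- data.frame(id = c("j1", "j2", "j3/j9", "j5", "j2/j8", "j3/j4"),
                  dad = c("j10", "j11", "j3", "j13", "j2", "j3"),
                  mom = c("k2", "k4", "j9", "k6", "j8", "j4"),
                  stringsAsFactors = FALSE)

# Rows whose id contains a slash
i <- grep("/", df$id, fixed = TRUE)
# Drop the slash and everything after it to get 'dad'
df$dad[i] <- sub("/.*", "", df$id[i])
# Drop everything up to and including the slash to get 'mom'
df$mom[i] <- sub(".*/", "", df$id[i])

# Compare the modified df with the desired output df2
same <- isTRUE(all.equal(df, df2))
same
```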