Merge and replace character columns
I have a data frame with 2 columns that I would like to merge:
Region PA
1 Mbeya Ruaha National Park
2 Mbeya Ruaha National Park
3 Mbeya Ruaha National Park
4 Mbeya Ruaha National Park
5 Mbeya Ruaha National Park
6 Mbeya Ruaha National Park
7 Mbeya NA
8 Mbeya NA
9 Mbeya NA
10 Mbeya NA
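They could be merged either by taking the PA value and overwriting the Region value in that row, or by replacing every NA in PA with the Region value from the same row.
I have tried: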
Carcass.cleaned$New<-rowSums(Carcass.cleaned[, c("PA", "Region")], na.rm=T)
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be numeric
with(Carcass.cleaned,ifelse(is.na(PA),Region,PA))
(returns list of numbers)
and coalesce(Carcass.cleaned$PA, Carcass.cleaned$Region)
unite(Carcass.cleaned, new, PA:Region, sep='')
(both merge the columns names instead of replacing)
You can use a simple ifelse statement:
df$Region <- ifelse(is.na(df$PA), df$Region, df$PA)
Basically, wherever PA is NA, Region is left unchanged, and wherever PA has a value, that value overwrites the one in Region. Afterwards you can delete PA if you want.
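The question reports that the same ifelse call "returns list of numbers". One possible cause, assuming the columns in Carcass.cleaned are factors rather than character vectors (the question does not say), is that ifelse drops the factor labels and returns the underlying integer codes. A minimal sketch with made-up data of the same shape; the names `codes` and `merged` are only for illustration:

```r
# Hypothetical data: same shape as the question, but with factor columns
df <- data.frame(
  Region = factor(rep("Mbeya", 4)),
  PA     = factor(c("Ruaha National Park", "Ruaha National Park", NA, NA))
)

# On factors, ifelse() returns the integer codes, not the labels
codes <- ifelse(is.na(df$PA), df$Region, df$PA)

# Converting both columns to character first keeps the text
df$Region <- as.character(df$Region)
df$PA     <- as.character(df$PA)
merged <- ifelse(is.na(df$PA), df$Region, df$PA)
```

If `str(Carcass.cleaned)` shows both columns as chr already, this is not the cause and the conversion is unnecessary.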
Try using mutate() from dplyr:
library(dplyr)
#Code
# Replace each NA in PA with the Region value from the same row
df <- df %>% group_by(Region) %>%
  mutate(PA = ifelse(is.na(PA), Region, PA))
Output:
# A tibble: 10 x 2
# Groups: Region [1]
Region PA
<chr> <chr>
1 Mbeya Ruaha National Park
2 Mbeya Ruaha National Park
3 Mbeya Ruaha National Park
4 Mbeya Ruaha National Park
5 Mbeya Ruaha National Park
6 Mbeya Ruaha National Park
7 Mbeya Mbeya
8 Mbeya Mbeya
9 Mbeya Mbeya
10 Mbeya Mbeya
Data used:
#Data
df <- structure(list(Region = c("Mbeya", "Mbeya", "Mbeya", "Mbeya",
"Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya"), PA = c("Ruaha National Park",
"Ruaha National Park", "Ruaha National Park", "Ruaha National Park",
"Ruaha National Park", "Ruaha National Park", NA, NA, NA, NA)), row.names = c(NA,
-10L), class = "data.frame")
We can use coalesce from dplyr:
library(dplyr)
df %>%
mutate(PA = coalesce(PA, Region))
# Region PA
#1 Mbeya Ruaha National Park
#2 Mbeya Ruaha National Park
#3 Mbeya Ruaha National Park
#4 Mbeya Ruaha National Park
#5 Mbeya Ruaha National Park
#6 Mbeya Ruaha National Park
#7 Mbeya Mbeya
#8 Mbeya Mbeya
#9 Mbeya Mbeya
#10 Mbeya Mbeya
Or use fcoalesce from data.table:
library(data.table)
setDT(df)[, PA := fcoalesce(PA, Region)]
Data:
df <- structure(list(Region = c("Mbeya", "Mbeya", "Mbeya", "Mbeya",
"Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya"), PA = c("Ruaha National Park",
"Ruaha National Park", "Ruaha National Park", "Ruaha National Park",
"Ruaha National Park", "Ruaha National Park", NA, NA, NA, NA)), row.names = c(NA,
-10L), class = "data.frame")
You can replace the NA values in PA with the corresponding Region values.
df$PA[is.na(df$PA)] <- df$Region[is.na(df$PA)]
df
# Region PA
#1 Mbeya Ruaha National Park
#2 Mbeya Ruaha National Park
#3 Mbeya Ruaha National Park
#4 Mbeya Ruaha National Park
#5 Mbeya Ruaha National Park
#6 Mbeya Ruaha National Park
#7 Mbeya Mbeya
#8 Mbeya Mbeya
#9 Mbeya Mbeya
#10 Mbeya Mbeya
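The one-liner above evaluates is.na(df$PA) twice. A slightly more readable variant, sketched here with the same example data, stores the NA mask once and reuses it on both sides of the assignment. The name `missing_pa` is hypothetical and only used for illustration:

```r
# Same example data as in the answers above, built as character columns
df <- data.frame(
  Region = rep("Mbeya", 10),
  PA     = c(rep("Ruaha National Park", 6), rep(NA, 4)),
  stringsAsFactors = FALSE
)

# Logical mask marking the rows where PA is missing
missing_pa <- is.na(df$PA)

# Fill only those rows with the Region value from the same row
df$PA[missing_pa] <- df$Region[missing_pa]
```

Keeping the mask in a variable also makes it easy to check how many rows were filled, for example with `sum(missing_pa)`.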