Merge and replace character columns

I have a data frame with 2 columns that I want to merge:

     Region             PA
1     Mbeya    Ruaha National Park
2     Mbeya    Ruaha National Park
3     Mbeya    Ruaha National Park
4     Mbeya    Ruaha National Park
5     Mbeya    Ruaha National Park
6     Mbeya    Ruaha National Park
7     Mbeya    NA
8     Mbeya    NA
9     Mbeya    NA
10    Mbeya    NA

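These could be merged either by taking the PA value and overwriting the Region value in that row, or by replacing every NA in PA with the Region value from the same row.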

I have tried:

Carcass.cleaned$New <- rowSums(Carcass.cleaned[, c("PA", "Region")], na.rm = TRUE)
# Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
#   'x' must be numeric

with(Carcass.cleaned, ifelse(is.na(PA), Region, PA))
# (returns a list of numbers)

coalesce(Carcass.cleaned$PA, Carcass.cleaned$Region)
unite(Carcass.cleaned, new, PA:Region, sep = '')
# (both merge the column names instead of replacing)

You can use a simple ifelse() statement:

df$Region <- ifelse(is.na(df$PA), df$Region, df$PA)

Basically, wherever PA is NA you keep Region as it is, and wherever PA has a value you overwrite the value in Region. Afterwards, you can delete PA if you want.
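One caveat, offered as an assumption rather than a diagnosis: if the columns were read in as factors, that would explain the "list of numbers" mentioned in the question, because ifelse() then returns the underlying integer codes instead of the labels. A minimal sketch with made-up data, converting to character first and then dropping PA:

```r
# Hypothetical example data; stringsAsFactors = TRUE mimics factor columns
df <- data.frame(
  Region = c("Mbeya", "Mbeya", "Mbeya"),
  PA     = c("Ruaha National Park", NA, NA),
  stringsAsFactors = TRUE
)

# On factors, ifelse() returns integer codes rather than the labels
codes <- ifelse(is.na(df$PA), df$Region, df$PA)

# Convert both columns to character, then combine as in the answer above
df$Region <- as.character(df$Region)
df$PA     <- as.character(df$PA)
df$Region <- ifelse(is.na(df$PA), df$Region, df$PA)

# Optionally drop PA once it is no longer needed
df$PA <- NULL
```

If the columns are already character (as in the data used in the other answers), the conversion step is harmless and can be skipped.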

Try using mutate() from dplyr:

library(tidyr)
library(dplyr)
#Code
df <- df %>% group_by(Region) %>% 
  mutate(PA=ifelse(is.na(PA),Region,PA))

Output:

# A tibble: 10 x 2
# Groups:   Region [1]
   Region PA                 
   <chr>  <chr>              
 1 Mbeya  Ruaha National Park
 2 Mbeya  Ruaha National Park
 3 Mbeya  Ruaha National Park
 4 Mbeya  Ruaha National Park
 5 Mbeya  Ruaha National Park
 6 Mbeya  Ruaha National Park
 7 Mbeya  Mbeya              
 8 Mbeya  Mbeya              
 9 Mbeya  Mbeya              
10 Mbeya  Mbeya       

Some data used:

#Data
df <- structure(list(Region = c("Mbeya", "Mbeya", "Mbeya", "Mbeya", 
"Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya"), PA = c("Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", "Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", NA, NA, NA, NA)), row.names = c(NA, 
-10L), class = "data.frame")

We can use coalesce from dplyr:

library(dplyr)
df %>%
   mutate(PA = coalesce(PA, Region))
#   Region                  PA
#1   Mbeya Ruaha National Park
#2   Mbeya Ruaha National Park
#3   Mbeya Ruaha National Park
#4   Mbeya Ruaha National Park
#5   Mbeya Ruaha National Park
#6   Mbeya Ruaha National Park
#7   Mbeya               Mbeya
#8   Mbeya               Mbeya
#9   Mbeya               Mbeya
#10  Mbeya               Mbeya

Or use fcoalesce from data.table:

library(data.table)
setDT(df)[, PA := fcoalesce(PA, Region)]

Data:

df <- structure(list(Region = c("Mbeya", "Mbeya", "Mbeya", "Mbeya", 
"Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya"), PA = c("Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", "Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", NA, NA, NA, NA)), row.names = c(NA, 
-10L), class = "data.frame")

You can replace the NA values in PA with the corresponding Region values:

df$PA[is.na(df$PA)] <- df$Region[is.na(df$PA)]
df
#   Region                  PA
#1   Mbeya Ruaha National Park
#2   Mbeya Ruaha National Park
#3   Mbeya Ruaha National Park
#4   Mbeya Ruaha National Park
#5   Mbeya Ruaha National Park
#6   Mbeya Ruaha National Park
#7   Mbeya               Mbeya
#8   Mbeya               Mbeya
#9   Mbeya               Mbeya
#10  Mbeya               Mbeya
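
If this replacement is needed for several column pairs, the indexing above could be wrapped in a small helper. This is only a sketch: fill_na_from is a hypothetical name, not a function from base R or any package, and the example data is a shortened version of the data above.

```r
# Hypothetical helper: replace NAs in `target` with the values found
# at the same positions in `fallback`
fill_na_from <- function(target, fallback) {
  stopifnot(length(target) == length(fallback))
  # Work on character vectors so factor columns do not yield integer codes
  target   <- as.character(target)
  fallback <- as.character(fallback)
  is_missing <- is.na(target)
  target[is_missing] <- fallback[is_missing]
  target
}

# Shortened example data (assumed, for illustration only)
df <- data.frame(
  Region = rep("Mbeya", 4),
  PA     = c("Ruaha National Park", "Ruaha National Park", NA, NA)
)
df$PA <- fill_na_from(df$PA, df$Region)
```

The helper always returns a character vector, so it behaves the same whether the inputs are character or factor columns.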