R - Avoid concatenation when replacing string by number

This looks like a very simple problem, but so far I have not found a solution.

Consider the following data frame:

dat <- data.frame(id=LETTERS[1:5],
                  land.use=c(3,4,9,34,39))

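I need to replace the numbers in the land.use column with strings. The problem is that I have different strings for the numbers 3, 4 and 34.

However, R insists on replacing 34 with the concatenation of the strings for 3 and 4.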

For example:

dat$land.use <- gsub("3","Bare soil", dat$land.use)
dat$land.use <- gsub("4","Primary Forest", dat$land.use)
dat$land.use <- gsub("9","Secondary Forest", dat$land.use)
dat$land.use <- gsub("34","Wheat", dat$land.use)
dat$land.use <- gsub("39","Soybean", dat$land.use)

> dat
  id                  land.use
1  A                 Bare soil # This is OK
2  B            Primary Forest # This is OK
3  C          Secondary Forest # This is OK
4  D   Bare soilPrimary Forest # This should be Wheat
5  E Bare soilSecondary Forest # This should be Soybean

What am I doing wrong?

In this case I would use match to replace the numbers with strings:

c("Bare soil","Primary Forest","Secondary Forest","Wheat",
  "Soybean")[match(dat$land.use, c(3,4,9,34,39))]
#[1] "Bare soil"        "Primary Forest"   "Secondary Forest" "Wheat"           
#[5] "Soybean"         
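
A closely related option, not part of the answer above, is to index a named character vector by the code. This is only a sketch: the name land_labels and the new column land.use.label are made up for illustration, and the codes and labels are the ones from the question.

```r
# Data frame from the question
dat <- data.frame(id = LETTERS[1:5],
                  land.use = c(3, 4, 9, 34, 39))

# Lookup vector: names are the codes (as strings), values are the labels.
# `land_labels` is a hypothetical name chosen for this sketch.
land_labels <- c("3" = "Bare soil", "4" = "Primary Forest",
                 "9" = "Secondary Forest", "34" = "Wheat", "39" = "Soybean")

# Indexing by name looks up "34" as a whole key, never as a pattern,
# so no partial matching can occur. unname() drops the names again.
dat$land.use.label <- unname(land_labels[as.character(dat$land.use)])
dat$land.use.label
```

As with match, a code that is missing from the lookup vector yields NA instead of a silently wrong label.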

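To make your own approach work, you have to anchor each pattern with ^ and $ so that it only matches the whole string: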

dat$land.use <- sub("^3$","Bare soil", dat$land.use)
dat$land.use <- sub("^4$","Primary Forest", dat$land.use)
dat$land.use <- sub("^9$","Secondary Forest", dat$land.use)
dat$land.use <- sub("^34$","Wheat", dat$land.use)
dat$land.use <- sub("^39$","Soybean", dat$land.use)
dat
#  id         land.use
#1  A        Bare soil
#2  B   Primary Forest
#3  C Secondary Forest
#4  D            Wheat
#5  E          Soybean
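
The reason the anchors matter can be seen on a small vector. The sketch below uses made-up variable names (x, unanchored, anchored, out) and assumes the same codes and labels as in the question. The second half shows one way to avoid writing five separate sub() calls.

```r
x <- c("3", "4", "34")

# Unanchored: the pattern "3" also matches the "3" inside "34",
# which is where the concatenated labels come from.
unanchored <- gsub("3", "Bare soil", x)

# Anchored: ^ and $ force the pattern to cover the whole string,
# so only the element that is exactly "3" is replaced.
anchored <- sub("^3$", "Bare soil", x)

# The same idea in a loop over all codes
codes  <- c(3, 4, 9, 34, 39)
labels <- c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat", "Soybean")

out <- as.character(codes)
for (i in seq_along(codes)) {
  out <- sub(paste0("^", codes[i], "$"), labels[i], out)
}
out
```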

When you want exact matching, don't use partial-matching functions (gsub, grep, etc.). You can create a lookup table and perform a join instead.

lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39), 
                           value = c("Bare soil", "Primary Forest", 
                           "Secondary Forest", "Wheat", "Soybean"))

merge(dat, lookup_table, all.x = TRUE, by = 'land.use')

#  land.use id            value
#1        3  A        Bare soil
#2        4  B   Primary Forest
#3        9  C Secondary Forest
#4       34  D            Wheat
#5       39  E          Soybean
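
Two details of merge are worth knowing here. By default it sorts the result on the key column, and with all.x = TRUE a code that is missing from the lookup table is kept with NA as its value. A small sketch, assuming a hypothetical data frame dat2 that contains an unknown code 99:

```r
lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39),
                           value = c("Bare soil", "Primary Forest",
                                     "Secondary Forest", "Wheat", "Soybean"))

# dat2 is made up for this sketch; 99 has no entry in lookup_table
dat2 <- data.frame(id = LETTERS[1:3],
                   land.use = c(34, 3, 99))

res <- merge(dat2, lookup_table, all.x = TRUE, by = "land.use")
res

# Rows with an unknown code are easy to find afterwards
res[is.na(res$value), ]
```

If the original row order matters, the match approach may be the simpler choice, since it never reorders the data.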

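Depending on what you do next, you may also want a factor() variable. You can create it directly as shown here, or use one of the other methods and apply as.factor() afterwards.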

dat$land.use.factor <- factor(dat$land.use, 
                              levels = c(3, 4, 9, 34, 39),
                              labels = c("Bare soil", "Primary Forest", 
                                         "Secondary Forest", "Wheat", "Soybean"))

# > dat
#    id land.use  land.use.factor
# 1   A        3        Bare soil
# 2   B        4   Primary Forest
# 3   C        9 Secondary Forest
# 4   D       34            Wheat
# 5   E       39          Soybean
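
One thing to keep in mind with the factor version: converting it back with as.integer() gives the position of each level, not the original land-use code. A short sketch, rebuilding the same factor as above so that it runs on its own:

```r
dat <- data.frame(id = LETTERS[1:5],
                  land.use = c(3, 4, 9, 34, 39))

dat$land.use.factor <- factor(dat$land.use,
                              levels = c(3, 4, 9, 34, 39),
                              labels = c("Bare soil", "Primary Forest",
                                         "Secondary Forest", "Wheat", "Soybean"))

# Labels as plain strings
as.character(dat$land.use.factor)

# Level positions (1 to 5), NOT the original codes 3, 4, 9, 34, 39
as.integer(dat$land.use.factor)
```

Keeping the original land.use column next to the factor, as the code above does, avoids having to recover the codes later.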

We can also use left_join from dplyr:

library(dplyr)
# dat is the data frame from the question; keydat is defined under "Data"
left_join(dat, keydat, by = 'land.use')

Data

keydat <- data.frame(land.use = c(3, 4, 9, 34, 39), 
                           value = c("Bare soil", "Primary Forest", 
                           "Secondary Forest", "Wheat", "Soybean"))