R - Avoid concatenation when replacing numbers with strings
This looks like a very simple problem, but so far I haven't found a solution.
Consider the following data frame:
dat <- data.frame(id = LETTERS[1:5],
                  land.use = c(3, 4, 9, 34, 39))
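I need to replace the numbers in the land.use column with strings. The problem is that the numbers 3, 4 and 34 each map to a different string.
However, R insists on replacing 34 with the concatenation of the strings for 3 and 4.
For example: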
dat$land.use <- gsub("3","Bare soil", dat$land.use)
dat$land.use <- gsub("4","Primary Forest", dat$land.use)
dat$land.use <- gsub("9","Secondary Forest", dat$land.use)
dat$land.use <- gsub("34","Wheat", dat$land.use)
dat$land.use <- gsub("39","Soybean", dat$land.use)
> dat
id land.use
1 A Bare soil # This is OK
2 B Primary Forest # This is OK
3 C Secondary Forest # This is OK
4 D Bare soilPrimary Forest # This should be Wheat
5 E Bare soilSecondary Forest # This should be Soybean
What am I doing wrong?
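Tracing a single value through the first calls shows where the concatenation comes from: gsub() replaces every occurrence of the pattern anywhere inside the string, so "34" is rewritten by the "3" rule before the "34" rule ever runs. A minimal sketch (the variable names x and step1 to step3 are only for illustration):

```r
# Trace one value, 34, through the gsub() calls from the question.
x <- "34"
step1 <- gsub("3", "Bare soil", x)            # "3" matches inside "34"
step2 <- gsub("4", "Primary Forest", step1)   # "4" matches the leftover digit
step3 <- gsub("34", "Wheat", step2)           # too late: no "34" is left to match
print(c(step1, step2, step3))
```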
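In this case I would use match to replace the numbers with strings. It compares whole values, so 34 is looked up as 34 rather than as a 3 followed by a 4.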
c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat",
  "Soybean")[match(dat$land.use, c(3, 4, 9, 34, 39))]
#[1] "Bare soil" "Primary Forest" "Secondary Forest" "Wheat"
#[5] "Soybean"
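A small sketch of what match() actually returns, which is what makes the indexing trick work. The names codes and labels are introduced here for readability, and 99 is a hypothetical unknown code used to show that unmatched values come back as NA instead of being left silently unchanged:

```r
codes  <- c(3, 4, 9, 34, 39)
labels <- c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat", "Soybean")

# match() returns, for each element of its first argument, the position of its
# first exact match in the second argument, or NA if there is none.
idx <- match(c(34, 3, 99), codes)
print(idx)          # 4 1 NA
print(labels[idx])  # "Wheat" "Bare soil" NA
```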
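To keep your approach, you have to anchor each pattern with ^ (start of string) and $ (end of string), so that it only matches when it is the entire value. With anchors there can be at most one match per value, so sub is enough.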
dat$land.use <- sub("^3$","Bare soil", dat$land.use)
dat$land.use <- sub("^4$","Primary Forest", dat$land.use)
dat$land.use <- sub("^9$","Secondary Forest", dat$land.use)
dat$land.use <- sub("^34$","Wheat", dat$land.use)
dat$land.use <- sub("^39$","Soybean", dat$land.use)
dat
# id land.use
#1 A Bare soil
#2 B Primary Forest
#3 C Secondary Forest
#4 D Wheat
#5 E Soybean
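The effect of the anchors, and a version of the same anchored sub() calls driven by two vectors instead of one hand-written line per code, can be sketched as follows. This assumes the original numeric dat (it is recreated at the top), and codes and labels are illustrative names, not part of the answer above:

```r
dat <- data.frame(id = LETTERS[1:5], land.use = c(3, 4, 9, 34, 39))
codes  <- c(3, 4, 9, 34, 39)
labels <- c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat", "Soybean")

# Unanchored vs anchored pattern on the same two values
print(grepl("3", c("3", "34")))    # TRUE TRUE
print(grepl("^3$", c("3", "34")))  # TRUE FALSE

# Apply "^<code>$" -> label for each code in turn
out <- as.character(dat$land.use)
for (i in seq_along(codes)) {
  out <- sub(paste0("^", codes[i], "$"), labels[i], out)
}
dat$land.use <- out
print(dat)
```

The loop only works safely because no label looks like a code; a lookup with match() or a join avoids that concern entirely.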
Don't use partial-matching functions (gsub, grep, etc.) when you want an exact match. Instead, you can create a lookup table and perform a join.
lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39),
                           value = c("Bare soil", "Primary Forest",
                                     "Secondary Forest", "Wheat", "Soybean"))
merge(dat, lookup_table, all.x = TRUE, by = 'land.use')
# land.use id value
#1 3 A Bare soil
#2 4 B Primary Forest
#3 9 C Secondary Forest
#4 34 D Wheat
#5 39 E Soybean
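Note that merge() sorts the result by the key column by default and moves that column to the front. If the original row order matters, one option is to carry a helper index through the join. A sketch, where dat2 (with a hypothetical unmatched code 99 and shuffled rows) and row_id are illustrative names:

```r
dat2 <- data.frame(id = LETTERS[1:6], land.use = c(39, 3, 4, 9, 34, 99))
lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39),
                           value = c("Bare soil", "Primary Forest",
                                     "Secondary Forest", "Wheat", "Soybean"))

dat2$row_id <- seq_len(nrow(dat2))  # remember the original order
# all.x = TRUE keeps rows of dat2 with no match (their value becomes NA)
res <- merge(dat2, lookup_table, all.x = TRUE, by = "land.use")
res <- res[order(res$row_id), c("id", "land.use", "value")]
rownames(res) <- NULL
print(res)
```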
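Depending on what you do next, you may also want a factor() variable. You can create it directly as shown here, or use one of the other methods and convert the result with as.factor() afterwards.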
dat$land.use.factor <- factor(dat$land.use,
                              levels = c(3, 4, 9, 34, 39),
                              labels = c("Bare soil", "Primary Forest",
                                         "Secondary Forest", "Wheat", "Soybean"))
# > dat
# id land.use land.use.factor
# 1 A 3 Bare soil
# 2 B 4 Primary Forest
# 3 C 9 Secondary Forest
# 4 D 34 Wheat
# 5 E 39 Soybean
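Two details of factor() with levels and labels are worth knowing: values not listed in levels become NA, and as.integer() on the result gives positions in levels, not the original codes. A sketch, with 99 as a hypothetical unknown code and codes/labels as illustrative names:

```r
codes  <- c(3, 4, 9, 34, 39)
labels <- c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat", "Soybean")

f <- factor(c(34, 3, 99), levels = codes, labels = labels)
print(f)                     # 99 is not in `levels`, so it becomes NA
print(levels(f))             # all five labels, in the order given
print(as.integer(f))         # 4 1 NA: positions in `levels`
print(codes[as.integer(f)])  # 34 3 NA: back to the original codes
```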
We can use left_join from dplyr:
library(dplyr)
left_join(dat, keydat, by = 'land.use')
Data:
keydat <- data.frame(land.use = c(3, 4, 9, 34, 39),
                     value = c("Bare soil", "Primary Forest",
                               "Secondary Forest", "Wheat", "Soybean"))