Using a lookup table in R converts the class of all variables to factors
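I want to use a lookup table to find and replace matching values in a data frame, but when I apply the lookup table, it changes every variable in the data frame to a factor. Is there a way to apply this lookup table without changing the class of the variables?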
Here is my data:
df <- structure(list(year = c(2008, 2008, 2008, 2010, 2009, 2009, 2011,
2007, 2011, 2009, 2007, 2008, 2010, 2006, 2009, 2010, 2009, 2006,
2009, 2008), change_occurred = c("true", "false", "true", "false",
"false", "true", "false", "false", "false", "false", "false",
"false", "true", "false", "false", "true", "false", "false",
"false", "false"), agent_01 = c("harvest", "none", "development",
"none", "none", "agriculture", "none", "none", "none", "none",
"none", "none", "insect_disease_defo", "none", "none", "insect_disease_defo",
"none", "none", "none", "none"), agent_01_conc = c("harvest_60",
"none", "development", "none", "none", "agriculture", "none",
"none", "none", "none", "none", "none", "insect_disease_defo",
"none", "none", "insect_disease_defo", "none", "none", "none",
"none"), ha_affect = c(3.87, 0, 1.134, 0, 0, 1.44, 0, 0, 0, 0,
0, 0, 1.8, 0, 0, 2.43, 0, 0, 0, 0)), .Names = c("year", "change_occurred",
"agent_01", "agent_01_conc", "ha_affect"), row.names = c(NA,
20L), class = "data.frame")
Structure of df:
str(df)
'data.frame': 20 obs. of 5 variables:
$ year : num 2008 2008 2008 2010 2009 ...
$ change_occurred: chr "true" "false" "true" "false" ...
$ agent_01 : chr "harvest" "none" "development" "none" ...
$ agent_01_conc : chr "harvest_60" "none" "development" "none" ...
$ ha_affect : num 3.87 0 1.13 0 0 ...
Here is my lookup table:
lookup <- structure(c("harvest_0", "harvest_10", "harvest_20", "harvest_30",
"harvest_40", "harvest_50", "harvest_60", "harvest_70", "harvest_80",
"harvest_90", "harvest_00_20", "harvest_00_20", "harvest_00_20",
"harvest_30_60", "harvest_30_60", "harvest_30_60", "harvest_30_60",
"harvest_70_90", "harvest_70_90", "harvest_70_90"), .Dim = c(10L,
2L), .Dimnames = list(NULL, c("list", "val")))
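Now I use the lookup table to look for matches in its "list" column (lookup[, 1]); wherever a match is found, the value is replaced with the corresponding entry from the "val" column (lookup[, 2]).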
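To make the matching step concrete, here is a small sketch of what happens to a single vector. It reuses the question's lookup matrix; the vector x is a hypothetical sample, not taken from the question's data.

```r
# Same lookup matrix as in the question: column "list" holds the original
# codes, column "val" holds the replacement codes.
lookup <- structure(c("harvest_0", "harvest_10", "harvest_20", "harvest_30",
  "harvest_40", "harvest_50", "harvest_60", "harvest_70", "harvest_80",
  "harvest_90", "harvest_00_20", "harvest_00_20", "harvest_00_20",
  "harvest_30_60", "harvest_30_60", "harvest_30_60", "harvest_30_60",
  "harvest_70_90", "harvest_70_90", "harvest_70_90"), .Dim = c(10L,
  2L), .Dimnames = list(NULL, c("list", "val")))

# Hypothetical sample values to look up
x <- c("harvest_60", "none", "harvest_10")

# match() gives, for each element of x, its row in lookup[, "list"],
# or NA when the element does not appear there.
idx <- match(x, lookup[, "list"])
idx   # 7 NA 2

# Index the "val" column by those rows; unmatched elements become NA ...
tmp <- lookup[, "val"][idx]
# ... and ifelse() falls back to the original value wherever tmp is NA.
res <- ifelse(is.na(tmp), x, tmp)
res   # "harvest_30_60" "none" "harvest_00_20"
```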
g <- sapply(df, function(x) {
  tmp = lookup[, 2][match(x, lookup[, 1])]
  ifelse(is.na(tmp), x, tmp)
})
Then I coerce the result to a data frame...
g.df <- as.data.frame(g)
But now all of the variables are factors:
str(g.df)
'data.frame': 20 obs. of 5 variables:
$ year : Factor w/ 6 levels "2006","2007",..: 3 3 3 5 4 4 6 2 6 4 ...
$ change_occurred: Factor w/ 2 levels "false","true": 2 1 2 1 1 2 1 1 1 1 ...
$ agent_01 : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
$ agent_01_conc : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
$ ha_affect : Factor w/ 6 levels "0","1.134","1.44",..: 6 1 2 1 1 3 1 1 1 1 ...
Any ideas on how to prevent this from happening?
-Cherry Tree
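We need to use lapply instead of sapply, because sapply simplifies its result to a matrix, and a matrix can hold only one type. If any column is character, every column is converted to character, including the numeric ones. Calling as.data.frame on that character matrix then turns every column into a factor, because the default was stringsAsFactors = TRUE. (That was the default before R 4.0.0; since R 4.0.0 it is FALSE, so the columns would stay character, but the numeric columns would still have been turned into strings by the matrix step.) With lapply, each column keeps its own type. In R versions before 4.0.0, data.frame() still converts the character columns to factors, as in the output below, unless stringsAsFactors = FALSE is passed.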
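A minimal illustration of the coercion described above, using a small made-up data frame rather than the question's data: sapply() builds a single character matrix, while lapply() returns a list whose elements keep their own types.

```r
# Illustrative data frame mixing a numeric and a character column
d <- data.frame(year = c(2008, 2009),
                agent = c("harvest", "none"),
                stringsAsFactors = FALSE)

# sapply() simplifies the per-column results into one matrix.
# A matrix has a single type, so the numbers become strings.
m <- sapply(d, function(x) x)
class(m)     # "matrix" "array" in R >= 4.0.0 ("matrix" in older R)
typeof(m)    # "character"
m[, "year"]  # "2008" "2009"

# lapply() returns a plain list, so each column keeps its own type.
l <- lapply(d, function(x) x)
sapply(l, class)  # year: "numeric", agent: "character"
```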
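# lapply() returns a list, so each column keeps its own type
g <- lapply(df, function(x) {
  tmp = lookup[, 2][match(x, lookup[, 1])]  # replacement from column 2, or NA if x is not in column 1
  ifelse(is.na(tmp), x, tmp)                # keep the original value where there was no match
})
df2 <- data.frame(g)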
str(df2)
#'data.frame': 20 obs. of 5 variables:
# $ year : num 2008 2008 2008 2010 2009 ...
# $ change_occurred: Factor w/ 2 levels "false","true": 2 1 2 1 1 2 1 1 1 1 ...
# $ agent_01 : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
# $ agent_01_conc : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
# $ ha_affect : num 3.87 0 1.13 0 0 ...
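A minimal sketch of two ways to keep every column's original class, using only base R. The helper name replace_codes() and the two-row lookup matrix are hypothetical, trimmed down from the question's data for brevity. Assigning into df[] keeps the data-frame structure and each column's type; alternatively, passing stringsAsFactors = FALSE to data.frame() prevents the factor conversion in R versions before 4.0.0, where the default was TRUE.

```r
# Hypothetical helper wrapping the answer's lookup logic for one column
replace_codes <- function(x, lookup) {
  tmp <- lookup[, 2][match(x, lookup[, 1])]  # NA where x has no match
  ifelse(is.na(tmp), x, tmp)                 # keep unmatched values as-is
}

# Trimmed-down, hypothetical versions of the question's lookup and data
lookup <- cbind(list = c("harvest_10", "harvest_60"),
                val  = c("harvest_00_20", "harvest_30_60"))
df <- data.frame(year = c(2008, 2010),
                 agent_01_conc = c("harvest_60", "none"),
                 ha_affect = c(3.87, 0),
                 stringsAsFactors = FALSE)

# Option 1: assign into df2[] so the result stays a data.frame and each
# column keeps the type returned by replace_codes().
df2 <- df
df2[] <- lapply(df2, replace_codes, lookup = lookup)

# Option 2: build a new data.frame from the list, explicitly turning off
# the string-to-factor conversion.
df3 <- data.frame(lapply(df, replace_codes, lookup = lookup),
                  stringsAsFactors = FALSE)

str(df2)
```

One caveat: ifelse() returns a character vector for a numeric column as soon as any value in that column matches the lookup, so classes are preserved only for columns that have no matches or are already character.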
If we really want to use sapply, there is the option simplify = FALSE, so that the result is not coerced to a matrix.
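A sketch of the simplify = FALSE variant, again with a trimmed-down, hypothetical lookup and data frame. With simplify = FALSE, sapply() returns a named list just like lapply(), so no matrix is built.

```r
# Hypothetical, trimmed-down inputs
lookup <- cbind(list = c("harvest_10", "harvest_60"),
                val  = c("harvest_00_20", "harvest_30_60"))
df <- data.frame(year = c(2008, 2010),
                 agent_01_conc = c("harvest_60", "none"),
                 stringsAsFactors = FALSE)

# simplify = FALSE: sapply() returns a named list instead of a matrix,
# so no column is coerced to character.
g <- sapply(df, function(x) {
  tmp <- lookup[, 2][match(x, lookup[, 1])]
  ifelse(is.na(tmp), x, tmp)
}, simplify = FALSE)

is.list(g)   # TRUE
g.df <- data.frame(g, stringsAsFactors = FALSE)
sapply(g.df, class)   # year: "numeric", agent_01_conc: "character"
```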