A way of replacing 100s of unique values in R by using a look up table rather than case_when?
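I have a data frame from an Excel sheet containing character values that I want to change to a different set of character values in R. However, there are 184 distinct values, spread across several columns, that need to be changed to a different set of 184 values. The mapping between the old and new values is listed in a vertical lookup table.
I could mutate with case_when, but it would take a long time to write out all 184 values, and I will probably have to repeat this indefinitely for other data sets of similar size. I assume there is some way to do this by creating a lookup between two vectors of the same length?
Example data frame
library(tibble)
df <- tibble(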
Var1 = c("","Label 3", "Label 184", "Label 4", ""),
Var2 = c("","", "Label 1", "", "Label 2"),
Var3 = c("Label 2","Label 184", "Label 1", "", "Label 4")
)
Var1 Var2 Var3
<chr> <chr> <chr>
1 "" "" "Label 2"
2 "Label 3" "" "Label 184"
3 "Label 184" "Label 1" "Label 1"
4 "Label 4" "" ""
5 "" "Label 2" "Label 4"
Example lookup table
Lookup_table <- tibble(
x = c("Label 1","Label 2","Label 3","Label 4","Label 184"),
y = c("NewLabel 1","NewLabel 2","NewLabel 3","NewLabel 4","NewLabel 184")
)
x y
<chr> <chr>
1 Label 1 NewLabel 1
2 Label 2 NewLabel 2
3 Label 3 NewLabel 3
4 Label 4 NewLabel 4
5 Label 184 NewLabel 184
Expected result
Var1 Var2 Var3
<chr> <chr> <chr>
1 "" "" "NewLabel 2"
2 "NewLabel 3" "" "NewLabel 184"
3 "NewLabel 184" "NewLabel 1" "NewLabel 1"
4 "NewLabel 4" "" ""
5 "" "NewLabel 2" "NewLabel 4"
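You can reshape the data to long format, join it with Lookup_table, replace the resulting NA values with empty strings, and reshape the data back to wide format.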
library(dplyr)
library(tidyr)
df %>%
mutate(row = row_number()) %>%
pivot_longer(cols = -row) %>%
left_join(Lookup_table, by = c('value' = 'x')) %>%
mutate(y = replace_na(y, '')) %>%
select(-value) %>%
pivot_wider(names_from = name, values_from = y) %>%
select(-row)
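If reshaping feels heavy, the same lookup can probably be done column by column without pivoting. The following is only a sketch, assuming dplyr 1.0.0 or later (for across()); it combines match() with coalesce() and, like the pipeline above, turns every value that is missing from Lookup_table into an empty string. The name `result` is introduced here for illustration only.

```r
library(dplyr)

# Example data from the question (tibble() is re-exported by dplyr)
df <- tibble(
  Var1 = c("", "Label 3", "Label 184", "Label 4", ""),
  Var2 = c("", "", "Label 1", "", "Label 2"),
  Var3 = c("Label 2", "Label 184", "Label 1", "", "Label 4")
)

Lookup_table <- tibble(
  x = c("Label 1", "Label 2", "Label 3", "Label 4", "Label 184"),
  y = c("NewLabel 1", "NewLabel 2", "NewLabel 3", "NewLabel 4", "NewLabel 184")
)

# For every column: find each value's position in Lookup_table$x,
# take the corresponding y, and replace unmatched entries (NA) with "".
result <- df %>%
  mutate(across(
    everything(),
    ~ coalesce(Lookup_table$y[match(.x, Lookup_table$x)], "")
  ))

result
```

To restrict the replacement to some columns, everything() could be swapped for a selection such as c(Var1, Var3).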
In base R, you can use lapply with match:
df[] <- lapply(df, function(x) {
tmp <- Lookup_table$y[match(x, Lookup_table$x)]
replace(tmp, is.na(tmp), '')
})
df
# A tibble: 5 x 3
# Var1 Var2 Var3
# <chr> <chr> <chr>
#1 "" "" "NewLabel 2"
#2 "NewLabel 3" "" "NewLabel 184"
#3 "NewLabel 184" "NewLabel 1" "NewLabel 1"
#4 "NewLabel 4" "" ""
#5 "" "NewLabel 2" "NewLabel 4"
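Both approaches above blank out any value that does not appear in the lookup table. If unmatched values should be kept as they are instead, a named vector built from the two lookup columns may be enough. This is a minimal base R sketch: the helper `recode_keep`, the plain data.frame inputs, and the "Unmapped" value are hypothetical and only there to show the behaviour.

```r
# Example data; "Unmapped" is a made-up value with no entry in the lookup
df <- data.frame(
  Var1 = c("", "Label 3", "Label 184", "Label 4", ""),
  Var2 = c("", "", "Label 1", "", "Label 2"),
  Var3 = c("Label 2", "Unmapped", "Label 1", "", "Label 4"),
  stringsAsFactors = FALSE
)

Lookup_table <- data.frame(
  x = c("Label 1", "Label 2", "Label 3", "Label 4", "Label 184"),
  y = c("NewLabel 1", "NewLabel 2", "NewLabel 3", "NewLabel 4", "NewLabel 184"),
  stringsAsFactors = FALSE
)

# Named vector: names are the old values, elements are the new values
lookup <- setNames(Lookup_table$y, Lookup_table$x)

# Hypothetical helper: replace matched values, leave everything else untouched
recode_keep <- function(col, lookup) {
  idx <- match(col, names(lookup))   # NA where the value is not in the lookup
  found <- !is.na(idx)
  col[found] <- unname(lookup[idx[found]])
  col
}

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, recode_keep, lookup = lookup)
df
```

Here empty strings and "Unmapped" pass through unchanged, which can make it easier to spot values the lookup table does not yet cover.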