Look up table in R referencing row values and specific columns in a dataframe

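I have a multi-part lookup table problem in R. I have a data frame in which the number in each column stands for an item name. The item names can be found in a corresponding lookup table.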

Data:

  > food.dat
      Fruit Vegetable Meat Dairy
    1     1         2    2     3
    2     3         2    1     1
    3     3         2    2     2
    4     2         2    1     1
    5     1         1    1     2

Lookup table:

> food.lookup
    FoodItem Number  FoodName
1      Fruit      1    Banana
2      Fruit      2     Apple
3      Fruit      3     Mango
4  Vegetable      1    Carrot
5  Vegetable      2  Broccoli 
6       Meat      1   Chicken
7       Meat      2      Fish
8      Dairy      1    Cheese
9      Dairy      2    Yogurt
10     Dairy      3  IceCream

Note that the numbers are not unique across food items. For example, 1 refers to one FoodName in the Fruit column (Banana) and to a different FoodName in the Vegetable column (Carrot).

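I would like to recode the food.dat data frame so that it holds the FoodName values from the lookup table. If possible, I would also like a simple function that takes a FoodName and returns a data frame from food.dat containing only the rows that include the specified FoodName.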

Thanks for your time and thoughts :)

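We could split a named vector (FoodName values named by Number) from 'food.lookup' by 'FoodItem' into a list. Then loop across the columns of 'food.dat', extract the list element for the current column, and replace the values by matching on the names.
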
library(dplyr)
lst1 <- with(food.lookup, split(setNames(FoodName, Number), FoodItem))
food.dat %>% 
    mutate(across(all_of(names(lst1)), ~ lst1[[cur_column()]][as.character(.)]))

-output

   Fruit Vegetable    Meat    Dairy
1 Banana  Broccoli    Fish IceCream
2  Mango  Broccoli Chicken   Cheese
3  Mango  Broccoli    Fish   Yogurt
4  Apple  Broccoli Chicken   Cheese
5 Banana    Carrot Chicken   Yogurt
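
To see why the lookup by name works, it can help to inspect the intermediate list. The sketch below rebuilds only the lookup table from this post and indexes one list element by character names. It is an illustration added for clarity, not part of the original answer, and dairy_names is a name made up for the example.

```r
# Lookup table from the post, rebuilt so the snippet runs on its own
food.lookup <- data.frame(
  FoodItem = c("Fruit", "Fruit", "Fruit", "Vegetable", "Vegetable",
               "Meat", "Meat", "Dairy", "Dairy", "Dairy"),
  Number   = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L),
  FoodName = c("Banana", "Apple", "Mango", "Carrot", "Broccoli",
               "Chicken", "Fish", "Cheese", "Yogurt", "IceCream"),
  stringsAsFactors = FALSE
)

# One named character vector per FoodItem:
# the names are the Numbers, the values are the FoodNames
lst1 <- with(food.lookup, split(setNames(FoodName, Number), FoodItem))

# Indexing by character name (not by position) picks the matching FoodName
dairy_names <- lst1[["Dairy"]][as.character(c(3L, 1L, 2L))]
print(unname(dairy_names))
```

Because the codes are turned into names with as.character(), the lookup does not depend on the row order of food.lookup within each food type.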

data

food.dat <- structure(list(Fruit = c(1L, 3L, 3L, 2L, 1L), Vegetable = c(2L, 
2L, 2L, 2L, 1L), Meat = c(2L, 1L, 2L, 1L, 1L), Dairy = c(3L, 
1L, 2L, 1L, 2L)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5"))
food.lookup <- structure(list(FoodItem = c("Fruit", "Fruit", 
"Fruit", "Vegetable", 
"Vegetable", "Meat", "Meat", "Dairy", "Dairy", "Dairy"), Number = c(1L, 
2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L), FoodName = c("Banana", "Apple", 
"Mango", "Carrot", "Broccoli", "Chicken", "Fish", "Cheese", "Yogurt", 
"IceCream")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"))

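Similarly, you can take advantage of the "position" of the different names. To do this, split the lookup table into the respective food types (or type the names in by hand), in the order of their Number values. Then simply use indexing to set the result. This relies on the numbers of each food type running 1, 2, 3, ... without gaps.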

Below is an example for one column. You can easily extend it to all of them. I store the result in Dairy2 so you can compare and see how the indexing works.

dairy <- c("Cheese","Yogurt","IceCream")
food.dat <- data.frame(Dairy = c(3,1,2,1,2))

food.dat$Dairy2 = dairy[food.dat$Dairy]

food.dat
  Dairy   Dairy2
1     3 IceCream
2     1   Cheese
3     2   Yogurt
4     1   Cheese
5     2   Yogurt
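
Extending this to every column could look like the sketch below. It swaps plain positional indexing for match(), so it does not assume that the Number values of each food type run 1, 2, 3, ... in row order. The name food.named is made up for this example, and the data are rebuilt from the question so the snippet runs on its own.

```r
# Example data from the question, rebuilt so the snippet is self-contained
food.dat <- data.frame(
  Fruit     = c(1L, 3L, 3L, 2L, 1L),
  Vegetable = c(2L, 2L, 2L, 2L, 1L),
  Meat      = c(2L, 1L, 2L, 1L, 1L),
  Dairy     = c(3L, 1L, 2L, 1L, 2L)
)
food.lookup <- data.frame(
  FoodItem = c("Fruit", "Fruit", "Fruit", "Vegetable", "Vegetable",
               "Meat", "Meat", "Dairy", "Dairy", "Dairy"),
  Number   = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L),
  FoodName = c("Banana", "Apple", "Mango", "Carrot", "Broccoli",
               "Chicken", "Fish", "Cheese", "Yogurt", "IceCream"),
  stringsAsFactors = FALSE
)

# food.named is a hypothetical name for the recoded copy
food.named <- food.dat
for (col in names(food.named)) {
  # Rows of the lookup table that belong to this column's food type
  sub <- food.lookup[food.lookup$FoodItem == col, ]
  # match() finds, for each code, the row of `sub` holding that Number
  food.named[[col]] <- sub$FoodName[match(food.dat[[col]], sub$Number)]
}

print(food.named)
```

Codes that have no entry in the lookup table come back as NA rather than raising an error, which makes gaps in the lookup easy to spot.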

We can convert the data to long format, with one food per row, join the lookup table, and then convert back to wide format.

library(tidyr)
library(dplyr)

food.dat %>% 
  tibble::rowid_to_column() %>% 
  pivot_longer(-rowid, names_to = "FoodItem", 
               values_to = "Number") %>% 
  left_join(food.lookup, by = c("FoodItem", "Number")) %>% 
  pivot_wider(id_cols = rowid, names_from = FoodItem, 
              values_from = FoodName)


#> # A tibble: 5 x 5
#>   rowid Fruit  Vegetable Meat    Dairy   
#>   <int> <chr>  <chr>     <chr>   <chr>   
#> 1     1 Banana Broccoli  Fish    IceCream
#> 2     2 Mango  Broccoli  Chicken Cheese  
#> 3     3 Mango  Broccoli  Fish    Yogurt  
#> 4     4 Apple  Broccoli  Chicken Cheese  
#> 5     5 Banana Carrot    Chicken Yogurt


With data:

food.dat <- read.table(text =
'Fruit Vegetable Meat Dairy
1         2    2     3
3         2    1     1
3         2    2     2
2         2    1     1
1         1    1     2', header = TRUE)

food.lookup <- read.table(text =
'FoodItem Number  FoodName
    Fruit      1    Banana
    Fruit      2     Apple
    Fruit      3     Mango
Vegetable      1    Carrot
Vegetable      2  Broccoli 
     Meat      1   Chicken
     Meat      2      Fish
    Dairy      1    Cheese
    Dairy      2    Yogurt
    Dairy       3  IceCream', header = TRUE)
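
The question also asks for a function that takes a FoodName and returns the matching rows of food.dat. A minimal base R sketch of one way to do that follows. The function name rows_with_food and its argument names are assumptions made for this example, and the data are rebuilt from the question so the snippet runs on its own.

```r
# Example data from the question, rebuilt so the snippet is self-contained
food.dat <- data.frame(
  Fruit     = c(1L, 3L, 3L, 2L, 1L),
  Vegetable = c(2L, 2L, 2L, 2L, 1L),
  Meat      = c(2L, 1L, 2L, 1L, 1L),
  Dairy     = c(3L, 1L, 2L, 1L, 2L)
)
food.lookup <- data.frame(
  FoodItem = c("Fruit", "Fruit", "Fruit", "Vegetable", "Vegetable",
               "Meat", "Meat", "Dairy", "Dairy", "Dairy"),
  Number   = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L),
  FoodName = c("Banana", "Apple", "Mango", "Carrot", "Broccoli",
               "Chicken", "Fish", "Cheese", "Yogurt", "IceCream"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows of `dat` whose code in the relevant
# column corresponds to the FoodName given in `food`
rows_with_food <- function(food, dat = food.dat, lookup = food.lookup) {
  hits <- lookup[lookup$FoodName == food, ]
  keep <- rep(FALSE, nrow(dat))
  for (i in seq_len(nrow(hits))) {
    col  <- as.character(hits$FoodItem[i])
    # %in% treats NA codes as "no match" instead of propagating NA
    keep <- keep | dat[[col]] %in% hits$Number[i]
  }
  dat[keep, , drop = FALSE]
}

print(rows_with_food("Mango"))
```

With the example data, rows_with_food("Mango") keeps rows 2 and 3, and a name that is not in the lookup table gives a data frame with zero rows. The result still holds the numeric codes; any of the recoding approaches above can be applied to it afterwards.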