Adding columns to a data frame based on the values in an existing column in R

Question

I am working in RStudio and have a data frame similar to the following:

Favorite<-c("Apple","Lemon","Orange","Salat","Onion", "Apple","Strawberry","Celery","Blueberry","Sweetpotatoes","Strawberry",
                "Oragne","Celery","Sweetpotatoes","Onion","Blueberry","Strawberry","Salad")
PersonID<-c(67,82,67,21,02,12,90,23,65,32,44,67,56,77,30,198,20,99)
all_Data<-data.frame(PersonID,Favorite)

> head(all_Data)
  PersonID Favorite
1       67    Apple
2       82    Lemon
3       67   Orange
4       21    Salat
5        2    Onion
6       12    Apple

I would like to add 3 more columns, which should contain the following:

If a row in all_Data$Favorite is Apple or Blueberry, then all_Data$Country = Ireland, all_Data$Continent = Europe and all_Data$city = Belfast

If a row in all_Data$Favorite is Strawberry, then all_Data$Country = Holland, all_Data$Continent = Europe and all_Data$city = Emmen

If a row in all_Data$Favorite is Lemon or Orange, then all_Data$Country = France, all_Data$Continent = Europe and all_Data$city = Menton

If a row in all_Data$Favorite is Salad or Onion, then all_Data$Country = Sweden, all_Data$Continent = Europe and all_Data$city = Malmoe

If a row in all_Data$Favorite is Sweetpotatoes, then all_Data$Country = USA, all_Data$Continent = America and all_Data$city = Verona

If a row in all_Data$Favorite is Celery, then all_Data$Country = Germany, all_Data$Continent = Europe and all_Data$city = Berlin

library(tidyverse)

all_Data |> 
  mutate(ctry_cont = case_when(
    str_detect(Favorite, "Appl|Blueb")  ~ "Ireland|Europe",
    str_detect(Favorite, "Straw")       ~ "Brazillian|South's of America",
    str_detect(Favorite, "Lemon|Orang") ~ "France|Europe",
    str_detect(Favorite, "Salad|Onion") ~ "Sweden|Europe",
    str_detect(Favorite, "Sweetpot")    ~ "United of state|America",
    str_detect(Favorite, "Celery")      ~ "Germany|Europe",
    TRUE                                ~ "Other|Other"
  )) |> 
  separate(ctry_cont, c("country", "continent"))

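After running the code above, I get the following warning and data, in which only half of the values for UK and USA appear. I also added words with apostrophes, because my original data contains words with apostrophes, but they do not show up either: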

     PersonID      Favorite    country continent
1        67         Apple    Ireland    Europe
2        82         Lemon     France    Europe
3        67        Orange     France    Europe
4        21         Salat      Other     Other
5         2         Onion     Sweden    Europe
6        12         Apple    Ireland    Europe
7        90    Strawberry Brazillian     South
8        23        Celery    Germany    Europe
9        65     Blueberry    Ireland    Europe
10       32 Sweetpotatoes     United        of
11       44    Strawberry Brazillian     South
12       67        Oragne      Other     Other
13       56        Celery    Germany    Europe
14       77 Sweetpotatoes     United        of
15       30         Onion     Sweden    Europe
16      198     Blueberry    Ireland    Europe
17       20    Strawberry Brazillian     South
18       99         Salad     Sweden    Europe

Warning message:
Expected 2 pieces. Additional pieces discarded in 5 rows [7, 10, 11, 14, 17].

I also tried adding sep="" in the last step of the code. It gives an error.

separate(ctry_cont, c("country", "continent"), sep="")

Answer 1

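You could do it like this. By default, separate() splits on any run of non-alphanumeric characters, so the spaces and the apostrophe in your values were treated as separators along with the |. Passing an explicit separator that does not occur inside the values, such as ", ", avoids that: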

Favorite <- c(
  "Apple",
  "Lemon",
  "Orange",
  "Salad",
  "Onion",
  "Apple",
  "Strawberry",
  "Celery",
  "Blueberry",
  "Sweetpotatoes",
  "Strawberry",
  "Orange",
  "Celery",
  "Sweetpotatoes",
  "Onion",
  "Blueberry",
  "Strawberry",
  "Salad"
)

PersonID <-
  c(67, 82, 67, 21, 02, 12, 90, 23, 65, 32, 44, 67, 56, 77, 30, 198, 20, 99)

all_Data <- data.frame(PersonID, Favorite)

library(tidyverse)

all_Data |> 
  mutate(ctry_cont = case_when(
    str_detect(Favorite, "Appl|Blueb")  ~ "Ireland, Europe",
    str_detect(Favorite, "Straw")       ~ "Holland, Europe",
    str_detect(Favorite, "Lemon|Orang") ~ "France, Europe",
    str_detect(Favorite, "Salad|Onion") ~ "Sweden, Europe",
    str_detect(Favorite, "Sweetpot")    ~ "United States, North America",
    str_detect(Favorite, "Celery")      ~ "Germany, Europe",
    TRUE                                ~ "Other, Other"
  )) |> 
  separate(ctry_cont, c("country", "continent"), sep = ", ")
#>    PersonID      Favorite       country     continent
#> 1        67         Apple       Ireland        Europe
#> 2        82         Lemon        France        Europe
#> 3        67        Orange        France        Europe
#> 4        21         Salad        Sweden        Europe
#> 5         2         Onion        Sweden        Europe
#> 6        12         Apple       Ireland        Europe
#> 7        90    Strawberry       Holland        Europe
#> 8        23        Celery       Germany        Europe
#> 9        65     Blueberry       Ireland        Europe
#> 10       32 Sweetpotatoes United States North America
#> 11       44    Strawberry       Holland        Europe
#> 12       67        Orange        France        Europe
#> 13       56        Celery       Germany        Europe
#> 14       77 Sweetpotatoes United States North America
#> 15       30         Onion        Sweden        Europe
#> 16      198     Blueberry       Ireland        Europe
#> 17       20    Strawberry       Holland        Europe
#> 18       99         Salad        Sweden        Europe

Created on 2022-04-22 by the reprex package (v2.0.1)
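
The code above returns country and continent, while the question also asks for a city column. One way to get all three columns, sketched here as an alternative rather than as part of the original answer, is to keep the mapping in a small lookup table and join it on. The lookup data frame and the result name are illustrative assumptions. The match is on the exact Favorite value, so the misspelled "Salat" and "Oragne" fall through to "Other". No separator is involved, so values containing spaces or apostrophes are left untouched.

```r
library(dplyr)

# Same data as in the question, including the misspelled "Salat" and "Oragne"
Favorite <- c("Apple", "Lemon", "Orange", "Salat", "Onion", "Apple",
              "Strawberry", "Celery", "Blueberry", "Sweetpotatoes",
              "Strawberry", "Oragne", "Celery", "Sweetpotatoes", "Onion",
              "Blueberry", "Strawberry", "Salad")
PersonID <- c(67, 82, 67, 21, 2, 12, 90, 23, 65, 32, 44, 67, 56, 77, 30, 198, 20, 99)
all_Data <- data.frame(PersonID, Favorite)

# Hypothetical lookup table: one row per exact Favorite value,
# filled in from the rules listed in the question
lookup <- data.frame(
  Favorite  = c("Apple", "Blueberry", "Strawberry", "Lemon", "Orange",
                "Salad", "Onion", "Sweetpotatoes", "Celery"),
  Country   = c("Ireland", "Ireland", "Holland", "France", "France",
                "Sweden", "Sweden", "USA", "Germany"),
  Continent = c("Europe", "Europe", "Europe", "Europe", "Europe",
                "Europe", "Europe", "America", "Europe"),
  city      = c("Belfast", "Belfast", "Emmen", "Menton", "Menton",
                "Malmoe", "Malmoe", "Verona", "Berlin")
)

result <- all_Data |>
  # left_join keeps every row of all_Data, in its original order
  left_join(lookup, by = "Favorite") |>
  # rows with no match get NA from the join; label them "Other"
  mutate(across(c(Country, Continent, city), \(x) coalesce(x, "Other")))

result
```

Compared with case_when() plus separate(), adding a new fruit means adding one row to the lookup table. The trade-off is that there is no partial matching, so typos in Favorite need cleaning first if they should map to a real country.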

