Recode categorical variable as new variable in R

How can I add a new categorical column to this data, based on the values in the first column, in R? Like this:

> head(df)
          common_name
1       Sailfin molly
2 Hardhead silverside
3           Blue crab

If common_name is "Sailfin molly" or "Hardhead silverside", put "Fish"; otherwise, put "Crab".

> head(df)
          common_name   category
1       Sailfin molly   Fish
2 Hardhead silverside   Fish
3           Blue crab   Crab

I found this answer here (https://rstudio-pubs-static.s3.amazonaws.com/116317_e6922e81e72e4e3f83995485ce686c14.html#/9):

library(dplyr)  # provides mutate()
df <- mutate(df, category = ifelse(grepl("Sailfin molly", common_name), "Fish",
                            ifelse(grepl("Hardhead silverside", common_name), "Fish", "Crab")))

Provide a sample of your data with dput() rather than just pasting the printed output, because the printout hides important details:

df <- structure(list(common_name = c("Sailfin molly", "Hardhead silverside", 
"Blue crab")), class = "data.frame", row.names = c(NA, -3L))
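As a rough illustration of why dput() is preferred, the sketch below captures its output as text and evaluates it again to rebuild the data frame. The names code and df_copy are only for this example.

```r
# Rebuild the example data frame from the question
df <- data.frame(common_name = c("Sailfin molly", "Hardhead silverside", "Blue crab"))

# dput() writes R code that recreates the object; capture that code as text
code <- capture.output(dput(df))

# Evaluating the captured code gives back a copy of the data frame
df_copy <- eval(parse(text = code))

# Compare the copy with the original, including column types and attributes
identical(df, df_copy)
```

Anyone who pastes the dput() text into their own session gets the same object, including column types that the printed output does not show.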

Now we need a list of the common names:

Names <- unique(df$common_name)
Names
# [1] "Sailfin molly"       "Hardhead silverside" "Blue crab"     
Fish <- unique(df$common_name)[1:2]

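The first two names are fish. Your full data will have more names, but you will still have to create a variable that lists the fish. Then add the new column: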

df$category <- ifelse(df$common_name %in% Fish, "Fish", "Crab")
df
#           common_name category
# 1       Sailfin molly     Fish
# 2 Hardhead silverside     Fish
# 3           Blue crab     Crab
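A small sketch of how this behaves on data closer to a real table, where names repeat. The rows and the name "Sailfin molly hybrid" are made up for illustration. It also shows that %in% compares whole strings, whereas a grepl() pattern matches any name that contains it.

```r
# Hypothetical data in which the same names appear on several rows
df <- data.frame(common_name = c("Blue crab", "Sailfin molly", "Blue crab",
                                 "Hardhead silverside", "Sailfin molly"))
Fish <- c("Sailfin molly", "Hardhead silverside")

# %in% returns one TRUE/FALSE per row, so repeated names need no special handling
df$category <- ifelse(df$common_name %in% Fish, "Fish", "Crab")

# %in% requires an exact match on the whole string
exact <- "Sailfin molly hybrid" %in% Fish

# grepl() matches wherever the pattern occurs inside the string
partial <- grepl("Sailfin molly", "Sailfin molly hybrid")
```

If the full data contains longer names that include a fish name as a substring, the two approaches can therefore assign different categories.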

If you have more than two categories, it is easier to create a two-column data frame that pairs each common_name with its category, and then use merge():

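df2 <- df[, 1, drop=FALSE]  # just the names, kept as a data frame
# One row per unique name, paired with its category
lookup <- data.frame(common_name=Names, category=c("Fish", "Fish", "Crab"))
merge(df2, lookup)  # joins on common_name; the result is sorted by that column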
#           common_name category
# 1           Blue crab     Crab
# 2 Hardhead silverside     Fish
# 3       Sailfin molly     Fish
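The same idea with a third category might look like the sketch below. The "Grass shrimp" rows and the "Shrimp" category are hypothetical and not part of the original data. Because merge() sorts the result by the key column, the sketch also shows match() as one way to keep the original row order.

```r
# Hypothetical data: repeated names plus a third kind of animal
df2 <- data.frame(common_name = c("Blue crab", "Sailfin molly", "Grass shrimp",
                                  "Blue crab", "Hardhead silverside"))

# Lookup table with exactly one row per distinct name
lookup <- data.frame(
  common_name = c("Sailfin molly", "Hardhead silverside", "Blue crab", "Grass shrimp"),
  category    = c("Fish", "Fish", "Crab", "Shrimp")
)

# merge() joins on the shared column common_name and sorts the result by it
merged <- merge(df2, lookup)

# match() finds each name's row in the lookup table, so the row order of df2 is kept
df2$category <- lookup$category[match(df2$common_name, lookup$common_name)]
```

With match(), a name that is missing from the lookup table gets NA as its category, which makes gaps in the table easy to spot.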