Make a new variable with selected levels of another variable

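I'm having trouble creating a new variable from selected levels of another variable. The dataset is gss, and the variable class has 5 levels ("Lower Class", "Working Class", "Middle Class", "Upper Class", "No Class") plus NA.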

If I run

gss %>%
  select(class) %>%
  str()

it gives me

'data.frame':   57061 obs. of  1 variable:
$ class: Factor w/ 5 levels "Lower Class",..: 3 3 2 3 2 3 3 2 2 2 ...

Since I'm only interested in people who reported an economic class, I want to get rid of the "No Class" level and the NAs. I didn't know a better way, so I did

gss <- gss %>%
  mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class",
                         ifelse(class == "Working Class", "Working Class",
                         ifelse(class == "Middle Class", "Middle Class",
                         ifelse(class == "Upper Class", "Upper Class", NA)))))

Then, to check whether it worked, I ran:

with (gss, table(filteredclass))

which gave me the levels in a mixed-up order:

filteredclass
Lower Class  Middle Class   Upper Class Working Class 
     3147         24289          1741         24458

I want the new variable filteredclass to appear in the same order as the variable class. If I do the same thing with class, I get:

with (gss, table(class))
class
Lower Class Working Class  Middle Class   Upper Class 
     3147         24458         24289          1741 
 No Class 
        1 

Is there any way to fix this? Or is there a way to remove the No Class level without going through the mutate command I used above?

Thanks in advance for your help!

In the future, it would be much easier to help if you provided a reproducible example.

If you want to get rid of "No Class", you can use filter:

gss <- gss %>% 
  filter(class != "No Class") %>%
  droplevels()

To remove the NAs, just use

gss <- na.omit(gss)
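
One caveat worth illustrating: na.omit() removes every row that has an NA in any column, not only in class, and dplyr's filter() already drops rows where the condition evaluates to NA. The base R sketch below uses a made-up data frame (toy and its age column are hypothetical and not part of gss) to show the difference between dropping all incomplete rows and dropping only rows where class is missing:

```r
# Hypothetical toy data: NA appears in 'class' and, separately, in 'age'
toy <- data.frame(
  class = factor(c("Lower Class", "No Class", NA, "Upper Class")),
  age   = c(30, 41, 52, NA)
)

# na.omit() drops any row that contains an NA in any column
all_complete <- na.omit(toy)

# Dropping only rows where 'class' is NA keeps the row with a missing age
class_known <- toy[!is.na(toy$class), ]

nrow(all_complete)  # 2
nrow(class_known)   # 3
```

On a wide survey dataset, the second form is probably closer to what is wanted here, since other columns are likely to contain their own NAs.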

The simplest approach is probably to call factor() on class:

gss$filteredclass <- factor(gss$class, c("Lower Class", "Working Class",
                             "Middle Class", "Upper Class"))

This drops "No Class" from the levels and sets those values to NA.
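
A small sketch of the same idea on a toy vector (x and keep are hypothetical names standing in for gss$class and the levels to retain) shows that the new factor follows the order given in levels, and that anything not listed becomes NA:

```r
# Hypothetical toy vector standing in for gss$class
x <- factor(c("Middle Class", "No Class", "Lower Class", NA,
              "Upper Class", "Working Class"))

# Levels to keep, in the order they should be displayed
keep <- c("Lower Class", "Working Class", "Middle Class", "Upper Class")
filtered <- factor(x, levels = keep)

levels(filtered)      # the four levels, in the order given in 'keep'
sum(is.na(filtered))  # 2: the original NA plus the former "No Class"
table(filtered)       # counts appear in the order of 'keep'
```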

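You have to relevel the factor so it follows the same order as gss$class. To do that, add another line to your mutate() statement that creates a factor with the same levels as class and drops the unused level ("No Class").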

library(tidyverse)
# Generate the data you showed
gss <- data.frame(class = factor(sample(c("Lower Class",  "Working Class",  "Middle Class",    "Upper Class", NA, "No Class"), 
                                        45000, replace = TRUE))) %>%
  mutate(class = factor(class, levels = c("Lower Class",  "Working Class",  "Middle Class",    "Upper Class", "No Class", NA)))

# Sampled data
with(gss, table(class, useNA = "always"))

# Mutate gss the way you did it
gss <-  gss %>%
  mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class", 
                                ifelse(class == "Working Class", "Working Class",
                                       ifelse(class == "Middle Class", "Middle Class", 
                                              ifelse(class == "Upper Class", "Upper Class", NA)))),
         # Then make filteredclass into a factor with the same levels as class
         # Use droplevels() to remove unused classes (since we removed the No Class)
         filteredclass = droplevels(factor(filteredclass, levels = levels(class))))

with(gss, table(class))
with(gss, table(filteredclass))

The output looks like this:

> with(gss, table(class, useNA = "always"))
class
  Lower Class Working Class  Middle Class   Upper Class      No Class 
         7362          7469          7626          7450          7457 
         <NA> 
         7636 

> with(gss, table(class))
class
  Lower Class Working Class  Middle Class   Upper Class      No Class 
         7362          7469          7626          7450          7457 

> with(gss, table(filteredclass))
filteredclass
  Lower Class Working Class  Middle Class   Upper Class 
         7362          7469          7626          7450 
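
The reason the original attempt came out in alphabetical order is that ifelse() returns a plain character vector here, so the level order of class is lost and table() sorts the values itself. A minimal sketch on a toy factor (cls, chr and fct are hypothetical names) makes this visible:

```r
# Hypothetical toy factor with a deliberately non-alphabetical level order
cls <- factor(c("Working Class", "Lower Class", "Upper Class", "Middle Class"),
              levels = c("Lower Class", "Working Class",
                         "Middle Class", "Upper Class"))

# ifelse() returns a character vector here, so the factor levels are lost
chr <- ifelse(cls == "Upper Class", NA, as.character(cls))
class(chr)         # "character"
names(table(chr))  # sorted alphabetically, not in the order of levels(cls)

# Re-applying the original levels restores the intended order;
# droplevels() then removes the level that no longer occurs
fct <- droplevels(factor(chr, levels = levels(cls)))
levels(fct)        # "Lower Class" "Working Class" "Middle Class"
```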

A quicker way is to use droplevels() instead of the chain of ifelse() statements:

# Drop the No Class level (those values become NA); no rows are removed
with(gss %>% mutate(filteredclass = droplevels(class, exclude = c(NA, "No Class"))),
     table(filteredclass))


filteredclass
  Lower Class Working Class  Middle Class   Upper Class 
         7362          7469          7626          7450