Make a new variable with selected levels of another variable
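I'm having trouble creating a new variable from selected levels of another variable. The dataset is gss and the variable is class, which has 5 levels ("Lower Class", "Working Class", "Middle Class", "Upper Class", "No Class") plus NA.
If I run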
gss %>%
select(class) %>%
str()
it gives me
'data.frame': 57061 obs. of 1 variable:
$ class: Factor w/ 5 levels "Lower Class",..: 3 3 2 3 2 3 3 2 2 2 ...
Because I am only interested in people who reported an economic class, I want to get rid of the "No Class" level and the NAs. I didn't know of a better way, so I did:
gss <- gss %>%
  mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class",
                         ifelse(class == "Working Class", "Working Class",
                         ifelse(class == "Middle Class", "Middle Class",
                         ifelse(class == "Upper Class", "Upper Class", NA)))))
Then, to check whether it worked, I ran:
with (gss, table(filteredclass))
which gave me the levels in the following mixed-up order:
filteredclass
Lower Class Middle Class Upper Class Working Class
3147 24289 1741 24458
I would like the new variable filteredclass to be displayed in the same order as the variable 'class', because if I do the same thing with 'class', I get:
with (gss, table(class))
class
Lower Class Working Class Middle Class Upper Class
3147 24458 24289 1741
No Class
1
Is there any way to fix this? Or is there a way to drop the No Class level without going through the mutate command I used above?
Thanks in advance for your help!
In the future, it will be much easier to help if you provide a reproducible example.
If you want to get rid of "No Class", you can use filter:
gss <- gss %>%
filter(class != "No Class") %>%
droplevels()
To remove the NAs, just use
gss <- na.omit(gss)
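One caveat worth checking on your own data: na.omit() on a whole data frame drops rows that have an NA in any column, not only in class. Below is a small sketch on a made-up data frame. The `toy` data and its `age` column are hypothetical, and I use base subset() in place of filter() so it runs without dplyr; like filter(), subset() already drops rows where the condition evaluates to NA.

```r
# Hypothetical toy data: `age` is an invented second column
toy <- data.frame(
  class = factor(c("Lower Class", "No Class", NA, "Upper Class", "Lower Class")),
  age   = c(30, 41, 25, NA, 52)
)

# subset() keeps only rows where the condition is TRUE, so the row with
# class == NA is dropped together with the "No Class" row
kept <- droplevels(subset(toy, class != "No Class"))

# na.omit() on the whole data frame also drops the row where only age is NA
strict <- na.omit(toy)

levels(kept$class)
nrow(kept)
as.character(strict$class)
```

If only the NAs in class matter, something like `toy[!is.na(toy$class), ]` leaves rows with missing values in other columns alone.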
The simplest way is probably to call factor() on class:
gss$filteredclass <- factor(gss$class, c("Lower Class", "Working Class",
"Middle Class", "Upper Class"))
This leaves "No Class" out of the levels, so those values are set to NA.
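To see what factor() with explicit levels does on a small scale, here is a sketch with a made-up vector (the values in `x` are illustrative, not taken from gss). Any value that is not among the supplied levels becomes NA, and the levels keep the order you give them, which is the order table() uses.

```r
# Illustrative values only
x <- factor(c("Middle Class", "No Class", "Lower Class", NA, "Upper Class",
              "Working Class", "Middle Class"))

wanted <- c("Lower Class", "Working Class", "Middle Class", "Upper Class")

# Values not listed in `wanted` (here "No Class") are turned into NA
filtered <- factor(x, levels = wanted)

levels(filtered)
sum(is.na(filtered))   # the original NA plus the former "No Class"
table(filtered)
```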
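You have to re-level the factor in the same order as gss$class.
To do that, you can add another line to your mutate() statement in which you create a factor with the same levels as class and then drop the unused level (No Class).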
library(tidyverse)
# Generate the data you showed
gss <- data.frame(class = factor(sample(c("Lower Class", "Working Class", "Middle Class", "Upper Class", NA, "No Class"),
45000, replace = TRUE))) %>%
mutate(class = factor(class, levels = c("Lower Class", "Working Class", "Middle Class", "Upper Class", "No Class", NA)))
# Sampled data
with(gss, table(class, useNA = "always"))
# Mutate gss the way you did it
gss <- gss %>%
mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class",
ifelse(class == "Working Class", "Working Class",
ifelse(class == "Middle Class", "Middle Class",
ifelse(class == "Upper Class", "Upper Class", NA)))),
# Then make filteredclass into a factor with the same levels as class
# Use droplevels() to remove unused classes (since we removed the No Class)
filteredclass = droplevels(factor(filteredclass, levels = levels(class))))
with(gss, table(class))
with(gss, table(filteredclass))
The output looks like this:
> with(gss, table(class, useNA = "always"))
class
Lower Class Working Class Middle Class Upper Class No Class
7362 7469 7626 7450 7457
<NA>
7636
> with(gss, table(class))
class
Lower Class Working Class Middle Class Upper Class No Class
7362 7469 7626 7450 7457
> with(gss, table(filteredclass))
filteredclass
Lower Class Working Class Middle Class Upper Class
7362 7469 7626 7450
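For anyone wondering why the order got mixed up in the first place: ifelse() returns a plain character vector here, not a factor, and table() converts a character vector to a factor whose levels are sorted alphabetically. A small sketch with made-up values (the names `cls`, `as_chr` and `as_fct` are just for illustration):

```r
cls <- factor(c("Working Class", "Lower Class", "Upper Class", "Middle Class"),
              levels = c("Lower Class", "Working Class", "Middle Class", "Upper Class"))

# ifelse() with string results gives back character, so the level order is lost
as_chr <- ifelse(cls == "Upper Class", NA, as.character(cls))
class(as_chr)
names(table(as_chr))   # alphabetical order

# Re-applying the original levels restores the order; droplevels() then
# removes the levels that no longer occur
as_fct <- droplevels(factor(as_chr, levels = levels(cls)))
levels(as_fct)
```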
A faster way is to use droplevels() instead of the chain of ifelse() statements:
# Set class to NA where it is No Class and drop that level (no rows are removed)
with(gss %>% mutate(filteredclass = droplevels(class, exclude = c(NA, "No Class"))),
table(filteredclass))
filteredclass
Lower Class Working Class Middle Class Upper Class
7362 7469 7626 7450
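Keep in mind that this does not remove any rows: the "No Class" observations stay in the data and simply become NA in filteredclass. Here is a small sketch with made-up values to check that. It assumes your R version's droplevels() accepts the exclude argument, as the call above relies on.

```r
cls <- factor(c("Lower Class", "No Class", NA, "Upper Class", "Working Class"),
              levels = c("Lower Class", "Working Class", "Middle Class",
                         "Upper Class", "No Class"))

# Exclude "No Class" (and NA) from the level set; unused levels such as
# "Middle Class" are dropped as well
filtered <- droplevels(cls, exclude = c(NA, "No Class"))

length(filtered) == length(cls)   # same number of observations
levels(filtered)
is.na(filtered)
```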