Barplot column for <NA>

I would like my barplot to include a column for the missing data.

adult <- read.csv(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", 
    header = FALSE, 
    na.strings = "?", 
    strip.white = TRUE
)
colnames(adult) <- c("age", "workClass", "fnlwgt", "education", "educationNum", "maritalStatus", "occupation", "relationship", "race", "sex", "capitalGain", "capitalLoss", "hoursPerWeek", "nativeCountry", "prediction")
barplot(table(adult$workClass), main="Job Distribution", xlab="Job", ylab="Count",las=2)

I know that workClass has 1836 missing values in this dataset, as shown by:

length(which(is.na(adult$workClass)))

You can use the argument useNA = "ifany" in table():

tab <- table(adult$workClass, useNA = "ifany")
#  Federal-gov        Local-gov     Never-worked          Private 
#          960             2093                7            22696 
# Self-emp-inc Self-emp-not-inc        State-gov      Without-pay 
#         1116             2541             1298               14 
#         <NA> 
#         1836 
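
For reference, useNA accepts three values. The sketch below compares them on a small toy vector; `x` and `y` are made up for illustration and are not part of the adult dataset.

```r
# Toy vectors (hypothetical data, not from the adult dataset)
x <- c("Private", "State-gov", NA, "Private", NA)
y <- c("Private", "State-gov")              # no missing values

no_na  <- table(x)                          # default useNA = "no": NAs are dropped
if_any <- table(x, useNA = "ifany")         # adds an <NA> count because x has NAs
always <- table(y, useNA = "always")        # adds an <NA> count even though it is 0

print(no_na)
print(if_any)
print(always)
```

"ifany" is usually the convenient choice for plotting, since it adds the extra bar only when there is something to show.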

By default, the name of the NA count is NA itself. You can change it to the string "NA" with the following command:

names(tab)[is.na(names(tab))] <- "NA"

Now the plot also shows the name "NA" on the x-axis:

barplot(tab, main = "Job Distribution", xlab = "Job", ylab = "Count", las = 2)
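
The renaming step uses is.na(names(tab)) because the name of the missing-value count is a real NA, which is only printed as <NA>. A minimal sketch on a toy vector (`x` is made up for illustration):

```r
# Toy vector (hypothetical data) with two missing values
x <- c("Private", "State-gov", NA, "Private", NA)
tab <- table(x, useNA = "ifany")

# Comparing against the printed form does not find the entry
matches_printed <- any(names(tab) == "<NA>", na.rm = TRUE)
print(matches_printed)

# is.na() does find it, so the name can be replaced with a plain string
names(tab)[is.na(names(tab))] <- "NA"
print(names(tab))

# The renamed count agrees with a direct count of the missing values
print(sum(is.na(x)) == as.vector(tab["NA"]))
```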

You can combine useNA = "ifany" in table() with names.arg in barplot():

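barplot(table(adult$workClass, useNA = "ifany"),
        # factor() makes this work whether workClass is a factor or a character column
        names.arg = c(levels(factor(adult$workClass)), "NA's"))

c(levels(factor(adult$workClass)), "NA's") builds a vector containing the names of all levels/categories of the variable, plus the custom name NA's to represent the NA values. Wrapping the column in factor() matters because levels() returns NULL for a character column, which is what read.csv produces by default since R 4.0.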
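
Labels passed to names.arg have to line up with the bars, one per bar and in the same order. The following is a rough check on a toy vector (`x` and `bar_labels` are made up for illustration), assuming the column may be a character vector rather than a factor:

```r
# Toy character vector (hypothetical data) with two missing values
x <- c("Private", "State-gov", NA, "Private", NA)
tab <- table(x, useNA = "ifany")

# levels() on a plain character vector returns NULL, so convert first
print(is.null(levels(x)))
bar_labels <- c(levels(factor(x)), "NA's")

# One label per bar, with the non-missing labels in the table's own order
stopifnot(length(bar_labels) == length(tab))
stopifnot(identical(names(tab)[!is.na(names(tab))], levels(factor(x))))

barplot(tab, names.arg = bar_labels, las = 2)
```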