Extracting Stata labels in R when some variables are missing labels
I'm working with large Stata files that have variable names and labels. I need the labels to understand what each variable is.
I've been using
df[] %>% map_chr(~attributes(.)$label)
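to extract the variable names and their associated labels. Unfortunately, some of the datasets have variables that lack any label.
That means that when I run the code above, I just get an error: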
Error: Result 1 is not a length 1 atomic vector
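Ideally, I'd have a way to report every missing label as "NA" or as nothing at all, so that variables without a label are still included in the output, just with no label attached.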
You can do a pass of map followed by map_chr, i.e.
library(haven)
library(dplyr)
library(purrr)
dat <- read_dta("http://data.princeton.edu/wws509/datasets/salary.dta")
# drop the label on `yr` to reproduce the problem
attributes(dat$yr)$label <- NULL
dat %>% map_chr(~attributes(.)$label)
# Error: Result 3 is not a length 1 atomic vector
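# map() returns a list, so a missing label is simply NULL;
# map_chr() then swaps each NULL for a typed NA
dat %>%
  map(~attributes(.)$label) %>%
  map_chr(~ifelse(is.null(.), NA_character_, .))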
#                                  sx                                  rk
#          "Sex (coded 1 for female)"                              "Rank"
#                                  yr                                  dg
#                                  NA             "Highest degree earned"
#                                  yd                                  sl
# "Years since highest degree earned"   "Academic year salary in dollars"
or, equivalently,
dat %>%
  map(~attributes(.)) %>%
  map_chr("label", .default = NA_character_)
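For reference, the same idea can be sketched in base R without purrr. The data frame `toy` and the helper `get_label` below are made up for illustration and are not part of haven or purrr. A missing "label" attribute comes back as NULL, which has length 0, so a strict "one string per column" mapper rejects it. Substituting NA_character_ restores exactly one value per variable.

```r
# Hypothetical toy data standing in for a Stata file read with haven:
# two columns carry a "label" attribute, one (yr) does not.
toy <- data.frame(sx = c(1, 0), yr = c(3, 7), sl = c(36350, 35350))
attr(toy$sx, "label") <- "Sex (coded 1 for female)"
attr(toy$sl, "label") <- "Academic year salary in dollars"

# Illustrative helper: return the label, or NA_character_ when the
# attribute is absent (attr() returns NULL in that case).
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}

# vapply() enforces "exactly one string per column", much like map_chr().
labels_chr <- vapply(toy, get_label, character(1))
print(labels_chr)
```

The result is a named character vector with one entry per column, so unlabeled variables stay in place with an NA label.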
I feel like purrr's strictness is getting in the way of what you want here. If you just use lapply() (or purrr::map()), you get a list, which is perfectly easy to work with:
# get an example Stata dataset
webuse::webuse("auto")
# drop the label on `price`
attr(auto$price, "label") <- NULL
# get all of the labels as a list
labels <- lapply(auto, attr, "label")
This gives you:
> str(labels)
List of 12
$ make : chr "Make and Model"
$ price : NULL
$ mpg : chr "Mileage (mpg)"
$ rep78 : chr "Repair Record 1978"
$ headroom : chr "Headroom (in.)"
$ trunk : chr "Trunk space (cu. ft.)"
$ weight : chr "Weight (lbs.)"
$ length : chr "Length (in.)"
$ turn : chr "Turn Circle (ft.) "
$ displacement: chr "Displacement (cu. in.)"
$ gear_ratio : chr "Gear Ratio"
$ foreign : chr "Car type"
If you're happy to drop the variables that have no label, you can unlist(), which discards the NULL entries:
> unlist(labels)
                    make                      mpg                    rep78                 headroom
        "Make and Model"          "Mileage (mpg)"     "Repair Record 1978"         "Headroom (in.)"
                   trunk                   weight                   length                     turn
 "Trunk space (cu. ft.)"          "Weight (lbs.)"           "Length (in.)"     "Turn Circle (ft.) "
            displacement               gear_ratio                  foreign
"Displacement (cu. in.)"             "Gear Ratio"               "Car type"
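If you would rather keep the unlabeled variables, one option is to swap each NULL for NA before flattening. This is only a sketch: `labels_demo` is a small hand-built list standing in for the real `labels` object, and the names `is_missing` and `lookup` are illustrative.

```r
# Hand-built stand-in for the list returned by lapply(auto, attr, "label").
labels_demo <- list(
  make  = "Make and Model",
  price = NULL,
  mpg   = "Mileage (mpg)"
)

# Replace NULL entries with NA so unlist() keeps one element per variable.
is_missing <- vapply(labels_demo, is.null, logical(1))
labels_demo[is_missing] <- NA_character_

# Build a two-column lookup table: variable name and label (NA where absent).
lookup <- data.frame(
  variable = names(labels_demo),
  label = unlist(labels_demo),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(lookup)
```

A data frame like this is convenient to filter or search when a file has hundreds of variables.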