How to solve an error in label_eurostat(): "Dictionary information is missing"

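I regularly use the eurostat package in R to download datasets from Eurostat and label them with the function label_eurostat(). The following code worked fine in the past, but since this week it has produced an error: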

> emprt <- get_eurostat("lfst_r_lfe2emprt", time_format = "num")
> emprt <- filter(emprt, sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR"))
> emprt <- dcast(emprt, geo ~ time)
Using values as value column: use value.var to override.
> emprt <- label_eurostat(emprt, lang = "de")
Error in label_eurostat(emprt, lang = "de") :
 Dictionary information is missing

I also tried specifying a particular dictionary, but got a different warning message:

> emprt <- label_eurostat(emprt, dic = "geo", lang = "de")
Warning message:
In label_eurostat(emprt, dic = "geo", lang = "de") :
  All labels for geo were not found.

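I'm not sure whether this dictionary is the right choice, but it is the only one I found for Eurostat. I have also seen that this function has some other issues that lead to errors like this: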

Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : 
factor level [19] is duplicated

But I'm not sure whether this is related to my problem. I appreciate any hint!

You can use either of the following:

packageVersion("eurostat")
# [1] ‘3.1.1’
library(eurostat)
library(tidyverse)
library(reshape2)
get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>%
  filter(sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR")) %>%
  dcast(geo ~ time) %>%
  droplevels %>%
  mutate(geo = label_eurostat(geo, dic = "geo", lang = "de"))

get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>%
  filter(sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR")) %>%
  label_eurostat(lang = "de") %>%
  dcast(geo ~ time)

Regarding the warning: if you do not drop the unused geo factor levels, label_eurostat may assign duplicated labels; consider, for example,

get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>% 
  pull(geo) %>% 
  levels %>% 
  grep(pattern = "^DE3", value = TRUE)
# [1] "DE3"  "DE30"

If you now look at get_eurostat_dic("geo"), both DE3 and DE30 map to Berlin:

get_eurostat_dic("geo") %>% filter(grepl("^DE30?$", code_name))
# # A tibble: 2 x 2
#   code_name full_name
#       <chr>     <chr>
# 1       DE3    Berlin
# 2      DE30    Berlin
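
To see why dropping unused levels matters, here is a minimal offline sketch in base R. It does not call the eurostat package; `dic` is a hand-made stand-in for get_eurostat_dic("geo"), where the DE3 and DE30 rows come from the output above and the AT and DE labels are illustrative placeholders, not the real dictionary entries.

```r
# Hand-made stand-in for get_eurostat_dic("geo") (assumption, not real package output).
# DE3/DE30 -> "Berlin" is taken from the output above; the AT and DE labels are placeholders.
dic <- data.frame(
  code_name = c("AT", "DE", "DE3", "DE30"),
  full_name = c("Austria", "Germany", "Berlin", "Berlin"),
  stringsAsFactors = FALSE
)

# After filtering, only AT and DE remain in the data,
# but the factor still carries the levels DE3 and DE30.
geo <- factor(c("AT", "DE"), levels = c("AT", "DE", "DE3", "DE30"))

# Labels looked up for every level, used or not: "Berlin" appears twice.
labels_all <- dic$full_name[match(levels(geo), dic$code_name)]
has_dup_all <- anyDuplicated(labels_all) > 0

# After droplevels() only the used levels are left, so the labels are unique.
geo_used <- droplevels(geo)
labels_used <- dic$full_name[match(levels(geo_used), dic$code_name)]
has_dup_used <- anyDuplicated(labels_used) > 0

print(c(all_levels = has_dup_all, used_levels = has_dup_used))
```

Under these assumptions the duplicate only shows up while the unused levels are still attached, which is why droplevels is called before labelling in the first snippet.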

Side note: if tidyverse is loaded, you do not need reshape2::dcast; you can instead use select(geo, time, values) %>% spread(time, values)
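
As a rough illustration of that reshaping step, the sketch below runs on a small made-up data frame instead of a Eurostat download. The column names geo, time and values follow the code above; the numbers are invented purely for the example. It also shows tidyr::pivot_wider, which newer tidyr versions recommend over spread, as an assumed equivalent.

```r
library(dplyr)
library(tidyr)

# Toy long-format data (invented numbers, NOT Eurostat figures).
emprt_long <- data.frame(
  geo    = rep(c("AT", "DE", "FR"), each = 2),
  time   = rep(c(2015, 2016), times = 3),
  values = c(1, 2, 3, 4, 5, 6),
  sex    = "T",
  stringsAsFactors = FALSE
)

# Equivalent of dcast(geo ~ time): one row per geo, one column per year.
wide_spread <- emprt_long %>%
  select(geo, time, values) %>%
  spread(time, values)

# The same reshape with pivot_wider().
wide_pivot <- emprt_long %>%
  select(geo, time, values) %>%
  pivot_wider(names_from = time, values_from = values)

print(wide_spread)
```

Selecting only geo, time and values first matters here: any leftover column such as sex would otherwise be kept as an additional identifier in the wide result.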