How to solve an error in label_eurostat(): "Dictionary information is missing"
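I regularly use the eurostat package in R to download datasets from Eurostat and label them with the function label_eurostat(). The following code worked fine in the past, but since this week it has been producing errors: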
> emprt <- get_eurostat("lfst_r_lfe2emprt", time_format = "num")
> emprt <- filter(emprt, sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR"))
> emprt <- dcast(emprt, geo ~ time)
Using values as value column: use value.var to override.
> emprt <- label_eurostat(emprt, lang = "de")
Error in label_eurostat(emprt, lang = "de") :
Dictionary information is missing
I also tried specifying a particular dictionary, but got a different warning message:
> emprt <- label_eurostat(emprt, dic = "geo", lang = "de")
Warning message:
In label_eurostat(emprt, dic = "geo", lang = "de") :
All labels for geo were not found.
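I am not sure whether this dictionary is the right choice, but it is the only one I found on Eurostat.
I also saw that this function has had other issues that lead to errors like this one: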
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
factor level [19] is duplicated
But I am not sure whether this is related to my problem.
I appreciate any hint!
You can use
packageVersion("eurostat")
# [1] ‘3.1.1’
library(eurostat)
library(tidyverse)
library(reshape2)
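get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>%
  filter(sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR")) %>%
  # name the value column explicitly to avoid the "Using values as value column" message
  dcast(geo ~ time, value.var = "values") %>%
  droplevels() %>%
  # label only the geo column, after unused factor levels have been dropped
  mutate(geo = label_eurostat(geo, dic = "geo", lang = "de"))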
or
get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>%
  filter(sex == "T", age == "Y15-64", geo %in% c("AT", "DE", "FR")) %>%
  # label the long-format data first, then cast to wide format
  label_eurostat(lang = "de") %>%
  dcast(geo ~ time, value.var = "values")
Regarding the warning: if you do not drop the unused geo factor levels, label_eurostat may assign duplicate labels; consider, for example:
get_eurostat("lfst_r_lfe2emprt", time_format = "num") %>%
pull(geo) %>%
levels %>%
grep(pattern = "^DE3", value = TRUE)
# [1] "DE3" "DE30"
If you now look at get_eurostat_dic("geo"), both DE3 and DE30 map to Berlin:
get_eurostat_dic("geo") %>% filter(grepl("^DE30?$", code_name))
# # A tibble: 2 x 2
# code_name full_name
# <chr> <chr>
# 1 DE3 Berlin
# 2 DE30 Berlin
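The collision can be reproduced without downloading anything. The following is a minimal base R sketch, assuming a hand-built lookup vector (`dic` is hypothetical and only stands in for the two dictionary rows shown above). It illustrates why two codes sharing one label are a problem for factor levels; it is not a description of how label_eurostat works internally.

```r
# Hypothetical lookup standing in for the two geo dictionary rows shown above
dic <- c(DE3 = "Berlin", DE30 = "Berlin")

# All levels of the geo factor, including the unused DE3
all_levels <- c("DE3", "DE30")
new_labels <- unname(dic[all_levels])

# Two distinct codes collapse onto one label
has_dup <- anyDuplicated(new_labels) > 0
print(has_dup)

# R rejects a duplicated vector when it is used as the levels of a factor
res <- tryCatch(
  factor("Berlin", levels = new_labels),
  error = function(e) e
)
print(inherits(res, "error"))

# With the unused level dropped, only DE30 remains and the labels are unique
geo <- droplevels(factor("DE30", levels = all_levels))
used_labels <- unname(dic[levels(geo)])
print(anyDuplicated(used_labels) == 0)
```

This is why calling droplevels() before labelling helps in the first pipeline: only the codes that actually occur in the filtered data are looked up.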
Side note: if tidyverse is loaded, you do not need reshape2::dcast; you can use select(geo, time, values) %>% spread(time, values) instead.
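In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A small sketch of the same reshaping on a hand-made data frame, assuming tidyr is installed; the column names mirror those used above, and the numbers are invented for illustration and are not Eurostat figures:

```r
library(tidyr)

# Toy long-format data with the same column names as in the pipelines above;
# the values are made up for illustration only
long <- data.frame(
  geo    = c("AT", "AT", "DE", "DE"),
  time   = c(2017, 2018, 2017, 2018),
  values = c(1, 2, 3, 4)
)

# One row per geo, one column per year
wide <- pivot_wider(long, names_from = time, values_from = values)
print(wide)
```

On the real data the same call would follow select(geo, time, values), so that the remaining columns (sex, age and so on) do not end up as extra identifier columns.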