Apply function converting data frame to character when error returned from API
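I wrote a function that queries the CMS National Plan and Provider Enumeration System (NPPES) API.
I want to pass in a data frame of NPI values and return their addresses.
Some NPI values are no longer valid, and I have tried to build some error handling for those cases.
My error-handling if else statement specifies a data frame with 1 row and 6 columns, and I insert the bad NPI value into row 1, column 1.
When I use the apply function on my data frame, I get a list of [1x6] results for all the successful API calls, but the error value is just a character vector.
I have tried debugging this, but I can't figure out where the conversion from data frame to character happens. I would be grateful for any help.
Here is the data frame of values I want to query: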
install.packages("pacman")
library(pacman)
pacman::p_load(tidyverse,data.table,httr,jsonlite)
values <- c(1598727430,
1083632731,
1710983663) # LAST VALUE PRODUCES THE ERROR CASE
npi_values <- data.frame(values)
Here is the URL for the API:
path <- "https://npiregistry.cms.hhs.gov/api/?"
My function:
# CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
getNPI <- function(object) {
  request <- httr::GET(
    url = path,
    query = list(
      version = "2.0",
      number = object
    )
  )
  Sys.sleep(0.25)
  warn_for_status(request)
  npi_details <- content(request,
    as = "text",
    encoding = "UTF-8"
  ) %>%
    fromJSON(.,
      flatten = TRUE
    ) %>%
    data.frame()
  # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
  # THEN ASSIGN ALL THE ROW VALUES TO NA AND ADD THE NPI VALUE TO THE FIRST
  # COLUMN
  if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
    npi_details <- as.data.frame(matrix(NA, ncol = 6, nrow = 1)) %>%
      dplyr::rename(
        `NPI NUMBER` = V1,
        `CMS REF ADDRESS 1` = V2,
        `CMS REF ADDRESS 2` = V3,
        `CMS REF CITY` = V4,
        `CMS REF STATE` = V5,
        `CMS REF ZIP` = V6
      )
    npi_details[1, 1] <- object
    # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
  } else {
    select(npi_details, contains(c("addresses", "number"))) %>%
      unnest(c(contains("address"))) %>%
      filter(address_purpose == "MAILING") %>%
      rename_all(.funs = toupper) %>%
      select(
        `NPI NUMBER` = RESULTS.NUMBER,
        -COUNTRY_CODE,
        -COUNTRY_NAME,
        -ADDRESS_PURPOSE,
        -ADDRESS_TYPE,
        `CMS REF ADDRESS 1` = ADDRESS_1,
        `CMS REF ADDRESS 2` = ADDRESS_2,
        `CMS REF CITY` = CITY,
        `CMS REF STATE` = STATE,
        `CMS REF ZIP` = POSTAL_CODE
      )
  }
}
I then apply this function to the data frame of NPI values above:
out <- apply(npi_values, 1, getNPI)
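When I apply it to my real data set, the error case is converted to character, even though I specified a data frame of 1 row and 6 columns.
Based on feedback from @akrun, I modified my apply statement to wrap the getNPI function in a list, see below: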
out <- apply(npi_values, 1, function(x) list(getNPI(x)))
The structure of out now looks like this:
str(out)
List of 3
$ :List of 1
..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
.. ..$ NPI NUMBER : int 1598727430
.. ..$ CMS REF ADDRESS 1: chr "PO BOX 17567"
.. ..$ CMS REF ADDRESS 2: chr ""
.. ..$ CMS REF CITY : chr "PENSACOLA"
.. ..$ CMS REF STATE : chr "FL"
.. ..$ CMS REF ZIP : chr "325227567"
$ :List of 1
..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
.. ..$ NPI NUMBER : int 1083632731
.. ..$ CMS REF ADDRESS 1: chr "PO BOX 17326"
.. ..$ CMS REF ADDRESS 2: chr ""
.. ..$ CMS REF CITY : chr "DENVER"
.. ..$ CMS REF STATE : chr "CO"
.. ..$ CMS REF ZIP : chr "802170326"
$ :List of 1
..$ : Named num 1.71e+09
.. ..- attr(*, "names")= chr "values"
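When I try to collapse these lists into a data frame of 3 rows and 6 columns, the last case (the error case) falls into a 7th column, which is not wanted. I would like the value of the 3rd case stored in the first column, with the remaining values filled with NA.
Desired result: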
`NPI NUMBER` <- c(1598727430,1083632731,1710983663)
`CMS REF ADDRESS 1` <- c("PO BOX 17567","PO BOX 17326",NA)
`CMS REF ADDRESS 2` <- c("","",NA)
`CMS REF CITY` <- c("PENSACOLA","DENVER",NA)
`CMS REF STATE` <- c("FL","CO",NA)
`CMS REF ZIP` <- c("325227567","802170326",NA)
desired <- data.frame(`NPI NUMBER`,`CMS REF ADDRESS 1`,`CMS REF ADDRESS 2`,`CMS REF CITY`,`CMS REF STATE`,`CMS REF ZIP`)
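A side note on the snippet above, as a minimal sketch: data.frame() checks column names by default, so names containing spaces are turned into syntactic ones unless check.names = FALSE is set. The example below uses only two of the six columns for brevity, and the object name mangled is an illustrative assumption, not part of the original post.

```r
# Two of the desired columns, with names that contain spaces.
desired <- data.frame(
  `NPI NUMBER` = c(1598727430, 1083632731, 1710983663),
  `CMS REF CITY` = c("PENSACOLA", "DENVER", NA),
  check.names = FALSE # keep the names exactly as written
)

# With the default check.names = TRUE the same names are made syntactic.
mangled <- data.frame(
  `NPI NUMBER` = c(1598727430, 1083632731, 1710983663),
  `CMS REF CITY` = c("PENSACOLA", "DENVER", NA)
)

names(desired)
names(mangled)
```

This matters when the desired frame is compared with the function output, whose columns keep their spaces.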
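apply() converts the object it is given to a matrix, in which all values must be of the same type. When the columns have mixed types, the most general type is character, so everything is converted to character and your function is applied to that character matrix.
See ?apply():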
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
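The coercion described in the help text can be checked without calling the API. The following is a minimal sketch in base R; the data frame mixed and the result names (row_is_df, row_names, mixed_type, plain_values) are illustrative assumptions, not part of the original post.

```r
# Same shape as npi_values in the question: one numeric column.
npi_values <- data.frame(values = c(1598727430, 1083632731, 1710983663))

# apply() calls as.matrix() first, so each row reaches the function as a
# named vector, not as a one-row data frame.
row_is_df <- apply(npi_values, 1, function(x) is.data.frame(x))
row_names <- apply(npi_values, 1, function(x) names(x))

# With mixed column types the matrix, and so every row, becomes character.
mixed <- data.frame(id = c(1, 2), label = c("a", "b"))
mixed_type <- typeof(as.matrix(mixed))

# Iterating over the column itself avoids the matrix conversion entirely.
plain_values <- lapply(npi_values$values, function(x) x)
```

Here every element of row_is_df is FALSE, which is consistent with the "Named num" entry in the str() output of the question: the function received a named numeric value rather than a data frame row.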
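It turned out that I needed to return the value of npi_details in the if part of the if else statement, so that the tibble I created for the error case is what the function hands back. Working version: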
# CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
getNPI <- function(object) {
  request <- httr::GET(
    url = path,
    query = list(
      version = "2.0",
      number = object
    )
  )
  Sys.sleep(0.25)
  warn_for_status(request)
  npi_details <- content(request,
    as = "text",
    encoding = "UTF-8"
  ) %>%
    fromJSON(.,
      flatten = TRUE
    ) %>%
    data.frame()
  # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
  # THEN FILL THE ROW WITH "error", ADD THE NPI VALUE TO THE FIRST COLUMN
  # AND RETURN THE TIBBLE EXPLICITLY
  if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
    npi_details <- as.data.frame(matrix("error", ncol = 6, nrow = 1), stringsAsFactors = FALSE) %>%
      dplyr::rename(
        `NPI NUMBER` = V1,
        `CMS REF ADDRESS 1` = V2,
        `CMS REF ADDRESS 2` = V3,
        `CMS REF CITY` = V4,
        `CMS REF STATE` = V5,
        `CMS REF ZIP` = V6
      ) %>%
      as_tibble()
    npi_details[1, 1] <- as.character(object)
    return(npi_details)
    # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
  } else {
    select(npi_details, contains(c("addresses", "number"))) %>%
      unnest(c(contains("address"))) %>%
      filter(address_purpose == "MAILING") %>%
      rename_all(.funs = toupper) %>%
      select(
        `NPI NUMBER` = RESULTS.NUMBER,
        -COUNTRY_CODE,
        -COUNTRY_NAME,
        -ADDRESS_PURPOSE,
        -ADDRESS_TYPE,
        `CMS REF ADDRESS 1` = ADDRESS_1,
        `CMS REF ADDRESS 2` = ADDRESS_2,
        `CMS REF CITY` = CITY,
        `CMS REF STATE` = STATE,
        `CMS REF ZIP` = POSTAL_CODE
      ) %>%
      # KEEP THE NPI COLUMN AS CHARACTER SO BOTH BRANCHES RETURN THE SAME TYPES
      mutate(`NPI NUMBER` = as.character(`NPI NUMBER`))
  }
}
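Why the explicit return matters can be reproduced without the API. In R, a function returns the value of its last evaluated expression, and the value of an assignment such as npi_details[1, 1] <- object is the assigned value, not the modified data frame. The sketch below uses two hypothetical stand-ins for the error branch (error_row_implicit and error_row_explicit are assumptions made for illustration, as is the cols vector); it makes no API call.

```r
# Column names used by the error branch in the post.
cols <- c(
  "NPI NUMBER", "CMS REF ADDRESS 1", "CMS REF ADDRESS 2",
  "CMS REF CITY", "CMS REF STATE", "CMS REF ZIP"
)

# The branch ends with an assignment, so the function returns the assigned
# value (`object`), not the data frame that was just modified.
error_row_implicit <- function(object) {
  if (TRUE) {
    npi_details <- as.data.frame(matrix(NA, ncol = 6, nrow = 1))
    names(npi_details) <- cols
    npi_details[1, 1] <- object
  }
}

# The branch returns the data frame explicitly.
error_row_explicit <- function(object) {
  if (TRUE) {
    npi_details <- as.data.frame(matrix(NA, ncol = 6, nrow = 1))
    names(npi_details) <- cols
    npi_details[1, 1] <- object
    return(npi_details)
  }
}

implicit <- error_row_implicit(1710983663)
explicit <- error_row_explicit(1710983663)

# Rows with matching names and column types stack into one data frame.
rows <- lapply(c(1598727430, 1710983663), error_row_explicit)
combined <- do.call(rbind, rows)
```

Here implicit is a bare numeric value, while explicit is a 1 x 6 data frame and combined is 2 x 6. Converting `NPI NUMBER` to character in both branches, as the working function does, keeps the column types consistent when the per-NPI results are stacked.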