Casting not working correctly in Amazon Athena (Presto)?
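I have a physician license registry dataset that includes, for each doctor, the total_submitted_charge_amount and the number of Medicare and Medicaid entitlements. I used the query from the answer suggested below: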
WITH datamart AS
  (SELECT npi,
          provider_last_name,
          provider_first_name,
          provider_mid_initial,
          provider_address_1,
          provider_address_2,
          provider_city,
          provider_zipcode,
          provider_state_code,
          provider_country_code,
          provider_type,
          number_of_services,
          CASE
            WHEN REPLACE(num_entitlement_medicare_medicaid, ',', '') = '' THEN NULL
            ELSE CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL)
          END AS medicare_medicaid_entitlement,
          CASE
            WHEN REPLACE(total_submitted_charge_amount, ',', '') = '' THEN NULL
            ELSE CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL)
          END AS total_submitted_charge_amount
   FROM cmsaggregatepayment2017)
SELECT *
FROM datamart
ORDER BY total_submitted_charge_amount DESC
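Unfortunately, I get this error:
INVALID_CAST_ARGUMENT: Cannot cast VARCHAR '' to DECIMAL(38, 0)
This query ran against the "aggregatepayment_data_2017" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: be01d1e8-dc4d-4c75-a648-428dcb6be3a5
I have tried DECIMAL, REAL, and BIGINT, but none of them can convert num_entitlement_medicare_medicaid. (The original post included a screenshot of the data here.)
Can anyone suggest how to rewrite this query?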
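You are getting the error because the column contains blank values (empty strings, which are not NULL), and a VARCHAR '' cannot be cast to DECIMAL. You can handle this with a CASE expression. Also, according to the dataset, num_entitlement_medicare_medicaid contains commas (','), which have to be removed before the cast. Note that REPLACE matches its search string literally, so stripping by character class needs regexp_replace, and the same pattern should be used in both the test and the cast:
SELECT npi,
       -- Strip everything except digits and the decimal point;
       -- if nothing is left, return NULL instead of casting ''.
       CASE
         WHEN regexp_replace(num_entitlement_medicare_medicaid, '[^0-9.]', '') = '' THEN NULL
         ELSE CAST(regexp_replace(num_entitlement_medicare_medicaid, '[^0-9.]', '') AS DECIMAL)
       END AS medicare_medicaid_entitlement,
       CASE
         WHEN regexp_replace(total_submitted_charge_amount, '[^0-9.]', '') = '' THEN NULL
         ELSE CAST(regexp_replace(total_submitted_charge_amount, '[^0-9.]', '') AS DECIMAL)
       END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017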
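A more compact way to write the same guard is NULLIF, which returns NULL when its two arguments are equal, so the empty string never reaches CAST. This is only a sketch: the TRIM call assumes some values might be whitespace-only, which the question does not confirm, and the bare DECIMAL type is kept from the queries above.

```sql
SELECT npi,
       -- Remove thousands separators, trim whitespace, then turn '' into NULL.
       -- CAST(NULL AS DECIMAL) is simply NULL, so no CASE expression is needed.
       CAST(NULLIF(TRIM(REPLACE(num_entitlement_medicare_medicaid, ',', '')), '') AS DECIMAL)
         AS medicare_medicaid_entitlement,
       CAST(NULLIF(TRIM(REPLACE(total_submitted_charge_amount, ',', '')), '') AS DECIMAL)
         AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
```

Unlike the regexp_replace version, this still fails on values that contain other non-numeric text, which can be useful if you would rather see such rows than silently alter them.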
You can convert the data into a new table and then work with the 'clean' data:
CREATE TABLE clean_table
WITH (format='Parquet', external_location='s3://my_bucket/clean_data/')
AS
SELECT
  npi,
  provider_last_name,
  provider_first_name,
  ...
  CASE WHEN REPLACE(num_entitlement_medicare_medicaid, ',', '') = '' THEN NULL
       ELSE CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL)
  END AS medicare_medicaid_entitlement,
  -- Cast the same column that the WHEN clause tests. Testing one column and
  -- casting another lets an empty string through to CAST.
  CASE WHEN REPLACE(total_submitted_charge_amount, ',', '') = '' THEN NULL
       ELSE CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DECIMAL)
  END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
After that, you can SELECT ... FROM clean_table without doing any conversions.
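For instance, the ordering from the question could then run directly against the typed columns. A minimal sketch, assuming clean_table was created with the column names used above; NULLS LAST is spelled out so that rows with a missing amount sort to the end:

```sql
SELECT npi,
       provider_last_name,
       total_submitted_charge_amount
FROM clean_table
-- The column is already DECIMAL, so it sorts numerically with no CAST.
ORDER BY total_submitted_charge_amount DESC NULLS LAST
```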
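In data warehousing, this kind of process is called ETL (extract, transform, load). The cleaning step is the 'transform': it converts the data into a more useful format.
You may also want to try try_cast(). It works like cast but is forgiving of bad input: if a value cannot be converted, it returns NULL instead of failing the query and moves on to the next row.
Documentation: https://prestodb.io/docs/current/functions/conversion.html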
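Applied to the columns in the question, try_cast could look like the sketch below. The commas are still removed first, since a value such as '1,234' would otherwise fail to parse and come back as NULL. The DECIMAL(38, 2) type for the charge amount is an assumption: the error message shows that a bare DECIMAL resolves to DECIMAL(38, 0), which keeps no fractional digits, so adjust the scale to the real data.

```sql
SELECT npi,
       -- '' and any other unparseable value become NULL instead of raising
       -- INVALID_CAST_ARGUMENT.
       TRY_CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL)
         AS medicare_medicaid_entitlement,
       -- DECIMAL(38, 2) is an assumed precision and scale for a currency amount.
       TRY_CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DECIMAL(38, 2))
         AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
```

Because try_cast turns every failure into NULL, it may be worth checking which raw values are being dropped before relying on the result:

```sql
SELECT num_entitlement_medicare_medicaid AS raw_value,
       count(*) AS row_count
FROM cmsaggregatepayment2017
WHERE num_entitlement_medicare_medicaid IS NOT NULL
  AND TRY_CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL) IS NULL
GROUP BY num_entitlement_medicare_medicaid
ORDER BY row_count DESC
LIMIT 20
```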