Casting not working correctly in Amazon Athena (Presto)?

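I have a physician registry dataset that includes each doctor's total_submitted_charge_amount along with the number of Medicare and Medicaid entitlements. I used the query from the answer suggested below: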

    with datamart AS 
    (SELECT npi,
         provider_last_name,
         provider_first_name,
         provider_mid_initial,
         provider_address_1,
         provider_address_2,
         provider_city,
         provider_zipcode,
         provider_state_code,
         provider_country_code,
         provider_type,
         number_of_services,

        CASE
        WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN
        null
        ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
        END AS medicare_medicaid_entitlement,
        CASE
        WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN
        null
        ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
        END AS total_submitted_charge_amount
    FROM cmsaggregatepayment2017)
    SELECT *
    FROM datamart
    ORDER BY total_submitted_charge_amount DESC

Unfortunately, I got this error:

INVALID_CAST_ARGUMENT: Cannot cast VARCHAR '' to DECIMAL(38, 0)

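This query ran against the "aggregatepayment_data_2017" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: be01d1e8-dc4d-4c75-a648-428dcb6be3a5.

I have tried DECIMAL, REAL, and BIGINT, but none of them can cast num_entitlement_medicare_medicaid.

[Screenshot of the data]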

Can someone suggest how to rewrite this query?

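You are getting the error because the column contains blank values (empty strings, not NULL), and the varchar '' cannot be cast to DECIMAL. You can handle this with a CASE expression. Also, according to the dataset, the column num_entitlement_medicare_medicaid contains commas (','), which you are not replacing.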

    SELECT npi,
        -- REGEXP_REPLACE is needed here: REPLACE treats its search argument as a literal string
        CASE
            WHEN REGEXP_REPLACE(num_entitlement_medicare_medicaid, '[^0-9.]', '') = '' THEN NULL
            ELSE CAST(REGEXP_REPLACE(num_entitlement_medicare_medicaid, '[^0-9.]', '') AS DECIMAL)
        END AS medicare_medicaid_entitlement,
        CASE
            WHEN REGEXP_REPLACE(total_submitted_charge_amount, '[^0-9.]', '') = '' THEN NULL
            ELSE CAST(REGEXP_REPLACE(total_submitted_charge_amount, '[^0-9.]', '') AS DECIMAL)
        END AS total_submitted_charge_amount
    FROM cmsaggregatepayment2017
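
The same idea can be written more compactly with NULLIF, which turns the empty string into NULL before the cast ever sees it. This is only a sketch: it assumes commas are the only non-numeric characters in these two columns, which I have not checked against the full dataset.

```sql
-- Sketch: strip commas, map '' to NULL, then cast.
-- Assumes commas are the only non-numeric characters in these columns.
SELECT npi,
       CAST(NULLIF(REPLACE(num_entitlement_medicare_medicaid, ',', ''), '') AS DECIMAL)
           AS medicare_medicaid_entitlement,
       CAST(NULLIF(REPLACE(total_submitted_charge_amount, ',', ''), '') AS DECIMAL)
           AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
```

Casting NULL yields NULL, so blank rows come through as NULL instead of raising INVALID_CAST_ARGUMENT.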

You could convert the data into a new table and then work with the 'clean' data:

    CREATE TABLE clean_table
    WITH (format='Parquet', external_location='s3://my_bucket/clean_data/')
    AS
    SELECT
      npi,
      provider_last_name,
      provider_first_name,
      ...
      CASE WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN null
           ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
           END AS medicare_medicaid_entitlement,
      -- Check and cast the same column; casting a different one lets '' slip through
      CASE WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN null
           ELSE CAST(REPLACE(total_submitted_charge_amount,',', '') AS DECIMAL)
           END AS total_submitted_charge_amount
      FROM cmsaggregatepayment2017

You can then SELECT ... FROM clean_table without doing any conversions.
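
For example, once the table exists, a query like the following should work with no casting at all. The column list and the LIMIT are just illustrative choices.

```sql
-- Sketch: query the cleaned table directly; the columns are already numeric.
SELECT npi,
       provider_last_name,
       total_submitted_charge_amount
FROM clean_table
ORDER BY total_submitted_charge_amount DESC
LIMIT 10
```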

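In data warehousing, this type of process is called ETL (Extract, Transform, Load). The cleaning step is the 'transform' that converts the data into a more useful format.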

See: CREATE TABLE AS - Amazon Athena

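You may want to try try_cast(). It works like cast(), but when a value cannot be converted it returns NULL instead of raising an error, so the query carries on with the next row.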

Documentation: https://prestodb.io/docs/current/functions/conversion.html
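
A minimal sketch of how that might look for this table. The DECIMAL(38, 2) type is my assumption (plain DECIMAL defaults to scale 0, as the DECIMAL(38, 0) in the error message shows, which would drop any cents), so adjust it to your data.

```sql
-- Sketch: TRY_CAST returns NULL for values that cannot be converted (including '').
-- DECIMAL(38, 2) is an assumed precision/scale, not taken from the dataset.
SELECT npi,
       TRY_CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL(38, 2))
           AS medicare_medicaid_entitlement,
       TRY_CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DECIMAL(38, 2))
           AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
```

Keep in mind that TRY_CAST hides every conversion failure, not only blanks, so it is worth counting the resulting NULLs to see how many rows were affected.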