Hive 日期字符串验证

Hive Date String Validation

我正在尝试检查字符串是否为“YYYYMMDD”的有效日期格式。

我正在使用以下技术。但是对于无效的日期字符串,我得到了有效的日期结果。

我做错了什么?

SELECT'20019999',CASE WHEN unix_timestamp('20019999','YYYYMMDD')  > 0 THEN  'Good'ELSE 'Bad'END;

首先,你使用的格式不对

select  from_unixtime(unix_timestamp())                 as default_format
       ,from_unixtime(unix_timestamp(),'YYYY-MM-DD')    as wrong_format
       ,from_unixtime(unix_timestamp(),'yyyy-MM-dd')    as right_format
;

+----------------------+---------------+---------------+
|    default_format    | wrong_format  | right_format  |
+----------------------+---------------+---------------+
| 2017-10-07 04:13:26  | 2017-10-280   | 2017-10-07    |
+----------------------+---------------+---------------+

其次,没有对日期部分范围进行验证。
如果您将日期部分增加 1,它会将您转到第二天。

with t as (select stack(7,'27','28','29','30','31','32','33') as dy)
select  t.dy
       ,from_unixtime(unix_timestamp(concat('2017-02-',t.dy),'yyyy-MM-dd'),'yyyy-MM-dd') as dt

from    t
;

+-----+-------------+
| dy  |     dt      |
+-----+-------------+
| 27  | 2017-02-27  |
| 28  | 2017-02-28  |
| 29  | 2017-03-01  |
| 30  | 2017-03-02  |
| 31  | 2017-03-03  |
| 32  | 2017-03-04  |
| 33  | 2017-03-05  |
+-----+-------------+

如果您将月份部分增加 1,它会将您转到下一个月。

with t as (select stack(5,'10','11','12','13','14') as mn)
select  t.mn
       ,from_unixtime(unix_timestamp(concat('2017-',t.mn,'-01'),'yyyy-MM-dd'),'yyyy-MM-dd') as dt

from    t
;

+-----+-------------+
| mn  |     dt      |
+-----+-------------+
| 10  | 2017-10-01  |
| 11  | 2017-11-01  |
| 12  | 2017-12-01  |
| 13  | 2018-01-01  |
| 14  | 2018-02-01  |
+-----+-------------+

即使使用 CAST,验证也只针对零件范围而不是日期本身。

select cast('2010-02-32' as date);

+-------+
|  _c0  |
+-------+
| NULL  |
+-------+

select cast('2010-02-29' as date);

+-------------+
|     _c0     |
+-------------+
| 2010-03-01  |
+-------------+

这是实现目标的方法:

with t as (select '20019999' as dt)
select  dt  
       ,from_unixtime(unix_timestamp(dt,'yyyyMMdd'),'yyyyMMdd') as double_converted_dt    

       ,case 
            when from_unixtime(unix_timestamp(dt,'yyyyMMdd'),'yyyyMMdd')  = dt 
            then 'Good' 
            else 'Bad' 
        end             as dt_status

from    t
;

+-----------+----------------------+------------+
|    dt     | double_converted_dt  | dt_status  |
+-----------+----------------------+------------+
| 20019999  | 20090607             | Bad        |
+-----------+----------------------+------------+