Running multiple counts and joining the results

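I am trying to spot-check my data by counting the number of rows in each partition, counting the number of 'uses' I see each day, and counting the number of values I see each day.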

An earlier version of the following query worked for me, but I must have changed something without realizing it:

src as
(
   select partition_date_column, count(*) as src_row_count
   from database.table
   where partition_date_column > '2016-01-01' 
   group by partition_date_column
)

,
pst as
(
  select timestamp_pst as datevalue, count(*) as timestamp_row_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  group by timestamp_pst
),

users as
(
  select timestamp_pst as user_datevalue, count(*) as user_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  and filter_column in ('filterA', 'filterB')
  group by timestamp_pst
)

select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;

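I can't tell what formatting error is keeping Hive from recognizing this. I also suspect there is a better way to compute these three counts, even though one of them is grouped by a different column.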

select      pe.val  as dt

           ,count(case when pe.pos = 0 then 1 end)  as src_row_count

           ,count
            (
                case  
                    when    pe.pos = 1 
                        and pe.val between date '2016-01-01' and date '2017-07-01' 
                    then    1 
                end
            ) as    timestamp_row_count 

           ,count
            (
                case  
                    when    pe.pos = 1 
                        and pe.val between date '2016-01-01' and date '2017-07-01' 
                        and filter_column in ('filterA', 'filterB')
                    then    1 
                end
            ) as    user_count

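from        database.table  t
            -- posexplode emits one row per array element:
            -- pos = 0 carries partition_date_column, pos = 1 carries timestamp_pst
            lateral view posexplode (array(partition_date_column,timestamp_pst)) pe as pos, val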

where       partition_date_column > date '2016-01-01' 

group by    pe.val
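
To see why a single pass is enough, it may help to look at the rows the aggregate groups over. The query below is only an inspection sketch, not part of the solution. It assumes partition_date_column and timestamp_pst share the same type (array() needs that), and the limit of 10 is an arbitrary choice.

```sql
-- Each source row comes out twice:
--   pos = 0, val = partition_date_column  (feeds src_row_count)
--   pos = 1, val = timestamp_pst          (feeds timestamp_row_count and user_count)
select  t.partition_date_column
       ,t.timestamp_pst
       ,t.filter_column
       ,pe.pos
       ,pe.val
from    database.table t
        lateral view posexplode (array(t.partition_date_column, t.timestamp_pst)) pe as pos, val
where   t.partition_date_column > date '2016-01-01'
limit   10;
```

Because both date columns end up in val, grouping by pe.val puts the per-partition count and the per-timestamp counts on the same output row for each date, so the table is scanned once and no joins are needed.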

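I figured it out. I was missing the "WITH" at the start of the query, which is what allows the multiple named select statements (common table expressions) that follow.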

With src as
(
   select partition_date_column, count(*) as src_row_count
   from database.table
   where partition_date_column > '2016-01-01' 
   group by partition_date_column
)

,
pst as
(
  select timestamp_pst as datevalue, count(*) as timestamp_row_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  group by timestamp_pst
),

users as
(
  select timestamp_pst as user_datevalue, count(*) as user_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  and filter_column in ('filterA', 'filterB')
  group by timestamp_pst
)

select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;