Running multiple counts and joining the results
I'm trying to spot-check my data by counting the number of rows in each partition, the number of 'users' I see per day, and the number of values I see per day.
An earlier version of the query below used to work, but I must have changed something without realizing it:
src as
(
select partition_date_column, count(*) as src_row_count
from database.table
where partition_date_column > '2016-01-01'
group by partition_date_column
)
,
pst as
(
select timestamp_pst as datevalue, count(*) as timestamp_row_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
group by timestamp_pst
),
users as
(
select timestamp_pst as user_datevalue, count(*) as user_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
and filter_column in ('filterA', 'filterB')
group by timestamp_pst
)
select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;
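I can't tell what formatting error is keeping Hive from recognizing this. I also suspect there is a better way to compute these three counts, even though one of them is grouped by a different column.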
select pe.val as dt
,count(case when pe.pos = 0 then 1 end) as src_row_count
,count
(
case
when pe.pos = 1
and pe.val between date '2016-01-01' and date '2017-07-01'
then 1
end
) as timestamp_row_count
,count
(
case
when pe.pos = 1
and pe.val between date '2016-01-01' and date '2017-07-01'
and filter_column in ('filterA', 'filterB')
then 1
end
) as user_count
from database.table t
lateral view posexplode (array(partition_date_column,timestamp_pst)) pe
where partition_date_column > date '2016-01-01'
group by pe.val
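To try the single-pass idea without a Hive cluster, here is a minimal sketch that runs the same conditional-count technique in SQLite through Python's built-in sqlite3 module. SQLite has no LATERAL VIEW or posexplode, so, as an assumption of this sketch, a two-row constant subquery pe(pos) cross-joined to the table plays that role: pos 0 files each row under its partition date and pos 1 under its timestamp. The table name tbl, the function single_pass_counts, and the sample rows are hypothetical; dates are stored as ISO strings so that string comparison orders them correctly.

```python
import sqlite3
from contextlib import closing

# Hypothetical sample rows: (partition_date_column, timestamp_pst, filter_column)
ROWS = [
    ("2016-03-01", "2016-03-01", "filterA"),
    ("2016-03-01", "2016-02-29", "filterC"),
    ("2016-03-02", "2016-03-01", "filterB"),
    ("2016-03-02", "2016-03-02", "filterA"),
]

# One scan of tbl: the cross join emits every source row twice.
# pos 0 groups the row under its partition date, pos 1 under its timestamp,
# which mimics posexplode(array(partition_date_column, timestamp_pst)) in Hive.
# count(case ... end) only counts rows where the case matched (no else -> NULL).
SINGLE_PASS = """
select case pe.pos when 0 then t.partition_date_column
                   else t.timestamp_pst end as dt,
       count(case when pe.pos = 0 then 1 end) as src_row_count,
       count(case when pe.pos = 1
                   and t.timestamp_pst between '2016-01-01' and '2017-07-01'
                  then 1 end) as timestamp_row_count,
       count(case when pe.pos = 1
                   and t.timestamp_pst between '2016-01-01' and '2017-07-01'
                   and t.filter_column in ('filterA', 'filterB')
                  then 1 end) as user_count
from tbl t
cross join (select 0 as pos union all select 1) pe
where t.partition_date_column > '2016-01-01'
group by dt
order by dt
"""


def single_pass_counts(rows):
    """Load rows into an in-memory table and return the three counts per day."""
    with closing(sqlite3.connect(":memory:")) as con:
        con.execute(
            "create table tbl (partition_date_column text, timestamp_pst text, filter_column text)"
        )
        con.executemany("insert into tbl values (?, ?, ?)", rows)
        return con.execute(SINGLE_PASS).fetchall()


if __name__ == "__main__":
    for row in single_pass_counts(ROWS):
        print(row)
```

With these sample rows, 2016-02-29 appears only as a timestamp, so its src_row_count is 0 rather than missing: every day seen in either column gets a row, and the counts that don't apply come out as zero.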
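I figured it out. I was missing the "WITH" keyword at the start of the query; it is what allows several named subqueries (CTEs) to be defined ahead of the final select.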
With src as
(
select partition_date_column, count(*) as src_row_count
from database.table
where partition_date_column > '2016-01-01'
group by partition_date_column
)
,
pst as
(
select timestamp_pst as datevalue, count(*) as timestamp_row_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
group by timestamp_pst
),
users as
(
select timestamp_pst as user_datevalue, count(*) as user_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
and filter_column in ('filterA', 'filterB')
group by timestamp_pst
)
select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;
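The fix above can be checked locally in the same spirit: a small sketch, again using SQLite via Python's sqlite3 as a stand-in for Hive, with a hypothetical table tbl (in place of database.table), a hypothetical helper run_counts, and made-up rows. It runs the three CTEs once with the leading "with" and once without; SQLite also rejects the statement that starts directly with "src as (".

```python
import sqlite3
from contextlib import closing

# Hypothetical sample rows: (partition_date_column, timestamp_pst, filter_column)
ROWS = [
    ("2016-03-01", "2016-03-01", "filterA"),
    ("2016-03-01", "2016-02-29", "filterC"),
    ("2016-03-02", "2016-03-01", "filterB"),
    ("2016-03-02", "2016-03-02", "filterA"),
]

# The three CTEs and final select from the question, WITHOUT the leading keyword.
CTES = """
src as (
    select partition_date_column, count(*) as src_row_count
    from tbl
    where partition_date_column > '2016-01-01'
    group by partition_date_column
),
pst as (
    select timestamp_pst as datevalue, count(*) as timestamp_row_count
    from tbl
    where partition_date_column > '2016-01-01'
      and timestamp_pst between '2016-01-01' and '2017-07-01'
    group by timestamp_pst
),
users as (
    select timestamp_pst as user_datevalue, count(*) as user_count
    from tbl
    where partition_date_column > '2016-01-01'
      and timestamp_pst between '2016-01-01' and '2017-07-01'
      and filter_column in ('filterA', 'filterB')
    group by timestamp_pst
)
select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src on datevalue = partition_date_column
left join users on datevalue = user_datevalue
order by dayval
"""


def run_counts(sql, rows):
    """Load rows into an in-memory table and return the result of sql."""
    with closing(sqlite3.connect(":memory:")) as con:
        con.execute(
            "create table tbl (partition_date_column text, timestamp_pst text, filter_column text)"
        )
        con.executemany("insert into tbl values (?, ?, ?)", rows)
        return con.execute(sql).fetchall()


if __name__ == "__main__":
    print(run_counts("with" + CTES, ROWS))  # fixed query: runs
    try:
        run_counts(CTES, ROWS)  # original query: missing "with"
    except sqlite3.OperationalError as err:
        print("without WITH:", err)  # reported as a syntax error
```

Because the join starts from pst, a day that has timestamps but no matching partition comes back with NULL (None in Python) for src_row_count, whereas the posexplode query reports 0 for the same day.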