SQL syntax: greatest-n-per-group + aggregation in Athena

I've spent a few hours on this so far, using AWS Athena, without getting anywhere, and I think I'm missing something.

I have a table like this:

------------------------------------------------------------------
caseid | postcode | streetname | state | dateandtime             
-----------------------------------------------------------------
123123 | 4000     | arthur     | QLD   | 2018-09-30 10:32:51.000 
------------------------------------------------------------------

This table will contain multiple rows with the same caseid, and I want only the latest one by dateandtime. I found that I can do the following:

SELECT b.caseid, MAX(b.dateandtime) as dateandtime
FROM  jsonmanual b
GROUP BY b.caseid

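This gives me what I want.

Now I need to filter these results with a BETWEEN condition on dateandtime and, from those unique entries, get counts of postcode/streetname/state, which I have not been able to do. Below is my best guess so far, which shows the postcode counts between two timestamps: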

SELECT a.postcode, count(a.postcode) as countof
FROM  jsonmanual a
INNER JOIN (
    SELECT distinct b.caseid, MAX(b.dateandtime) as dateandtime, b.postcode
    FROM  jsonmanual b
    GROUP BY b.caseid, b.postcode
) b ON a.caseid = b.caseid and a.postcode = b.postcode
where dateandtime between TIMESTAMP '2016-05-05 09:51:00' and TIMESTAMP '2020-01-10 15:36:00'
group by a.postcode

Any help would be greatly appreciated. As you can probably tell, I'm not an SQL person, but I'm aiming to get better :-)

SQL Fiddle: http://www.sqlfiddle.com/#!9/2f4fbd/1

My ideal output:

--------------------
|postcode | countof |
|-------------------|
|1166     | 1       |
|1231     | 1       |
|2171     | 1       |
|3651     | 1       |
|4469     | 1       |
|4697     | 2       |
--------------------

I'm not sure what you are trying to do, but my best guess is that you should move the "where" clause into the inner query as a "having" clause:

group by b.caseid, b.postcode
having max(b.dateandtime) between ...
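
One way to read the HAVING suggestion is sketched below. It is only an illustration under several assumptions: it runs against an in-memory SQLite database through Python's `sqlite3` module rather than Athena, the five sample rows are made up, timestamps are stored as ISO-8601 text (so BETWEEN compares strings), and the inner query groups by caseid alone so that each case contributes exactly one latest timestamp. In Athena you would use `TIMESTAMP '...'` literals instead of the bound parameters.

```python
import sqlite3

# In-memory stand-in for the Athena table; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jsonmanual (caseid TEXT, postcode INTEGER, dateandtime TEXT)"
)
conn.executemany(
    "INSERT INTO jsonmanual VALUES (?, ?, ?)",
    [
        ("A", 3651, "2018-09-30 10:32:51"),
        ("A", 3651, "2019-05-05 09:54:26"),  # latest row for case A, in range
        ("B", 1166, "2019-01-10 15:36:29"),
        ("C", 4697, "2015-01-01 00:00:00"),  # only row for case C, out of range
        ("D", 3651, "2018-11-01 16:18:51"),
    ],
)

sql = """
SELECT a.postcode, COUNT(*) AS countof
FROM jsonmanual a
INNER JOIN (
    -- latest timestamp per case, kept only if it falls inside the window
    SELECT b.caseid, MAX(b.dateandtime) AS latest
    FROM jsonmanual b
    GROUP BY b.caseid
    HAVING MAX(b.dateandtime) BETWEEN ? AND ?
) m ON a.caseid = m.caseid AND a.dateandtime = m.latest
GROUP BY a.postcode
ORDER BY a.postcode
"""
rows = conn.execute(
    sql, ("2016-05-05 09:51:00", "2020-01-10 15:36:00")
).fetchall()
print(rows)
```

Joining back on both caseid and the latest timestamp picks up the postcode of the most recent row. If two rows of the same case share that exact timestamp, both are counted, which is one reason the ROW_NUMBER approach may be preferable.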

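Amazon Athena supports window functions, so you can try using the ROW_NUMBER [window function][1] to number the rows of each caseid, ordered by dateExact desc, and then keep only the rows where the row number is 1.

After that, use COUNT with GROUP BY.

Schema (MySQL v8.0)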

CREATE TABLE cases
    (`country` varchar(3), `vetClinic` varchar(11), `ageMonths` int, `vaxStatus` varchar(11), `patientId` long, `ageWeeks` int, `methodDiag` varchar(8), `dateExact` varchar(19), `vetName` varchar(14), `streetName` varchar(5), `caseNumber` int, `caseId` varchar(36), `dataOrigin` varchar(10), `datePresented` varchar(19), `state` varchar(3), `vaxDate` varchar(19), `cognitoSubNumber` varchar(36), `dateAndTime` varchar(19), `streetNumber` int, `postcode` int, `clinicalSigns` varchar(8), `caseOutCome` varchar(7), `isOpen` varchar(4), `ageYears` int, `species` varchar(8), `suburb` varchar(13), `vaxBrand` varchar(7))
;

INSERT INTO cases
    (`country`, `vetClinic`, `ageMonths`, `vaxStatus`, `patientId`, `ageWeeks`, `methodDiag`, `dateExact`, `vetName`, `streetName`, `caseNumber`, `caseId`, `dataOrigin`, `datePresented`, `state`, `vaxDate`, `cognitoSubNumber`, `dateAndTime`, `streetNumber`, `postcode`, `clinicalSigns`, `caseOutCome`, `isOpen`, `ageYears`, `species`, `suburb`, `vaxBrand`)
VALUES
    ('AUS', 'whoopwhoop', 9, 'vaxinated', 9839815985, 9, 'vomiting', '2019-05-05 09:54:26', 'adam de mamp', 'ann', 3, '2edd7dd0-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2019-08-19 06:50:59', 'SA', '2019-04-02 19:52:07', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 126, 3651, 'hat', 'alive', 'True', 9, 'pug', 'carindale', 'digimon'),
    ('AUS', 'whoopwhoop', 9, 'vaxinated', 9839815985, 9, 'vomiting', '2019-05-05 09:52:26', 'adam de mamp', 'buts', 3, '2edd7dd0-c49c-11e8-b678-a5dc64edc7ee', 'poops', '2019-08-19 06:50:59', 'SA', '2019-04-02 19:52:07', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 126, 3651, 'hat', 'alive', 'True', 9, 'pug', 'carindale', 'digimon'),
    ('AUS', 'whoopwhoop', 9, 'vaxinated', 9839815985, 9, 'rash', '2019-05-05 09:51:26', 'adam de mamp', 'ann', 3, '2ecb7c70-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2019-08-19 06:50:59', 'SA', '2019-04-02 19:52:07', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 126, 3651, 'hat', 'alive', 'True', 9, 'pug', 'carindale', 'digimon'),
    ('AUS', 'rbh', 9, 'vaxinated', 2114598894, 4, 'blood', '2019-01-10 15:36:29', 'adam de mamp', 'queen', 2, '2ed78a60-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2018-09-30 19:28:34', 'WA', '2019-01-19 03:38:28', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 39, 1166, 'hat', 'ongoing', 'True', 1, 'pitbull', 'carindale', 'digimon'),
    ('AUS', 'rbh', 9, 'unvaxinated', 9606793080, 46, 'blood', '2018-11-01 16:18:51', 'sumo man', 'annie', 1, '2edabeb0-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2018-10-14 16:21:43', 'ACT', '2018-12-10 03:36:49', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 59, 1231, 'bad', 'ongoing', 'True', 12, 'aligator', 'fendalton', 'digimon'),
    ('AUS', 'rbh', 12, 'unvaxinated', 2406607356, 47, 'blood', '2018-12-18 05:36:22', 'adam de mamp', 'annie', 3, '2eddf300-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2019-05-12 22:21:49', 'TA', '2019-03-15 17:28:35', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 180, 2171, 'hat', 'dead', 'True', 7, 'staffy', 'brisbane city', 'digimon'),
    ('AUS', 'examplevet', 2, 'vaxinated', 2449508561, 4, 'rash', '2018-12-07 15:36:05', 'anders holmvic', 'annie', 3, '2ed196f0-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2019-04-12 04:31:22', 'WA', '2019-02-13 17:09:51', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 10, 4450, 'fateigue', 'alive', 'True', 14, 'aligator', 'spring hill', 'varex'),
    ('AUS', 'rural', 6, 'vaxinated', 3900464429, 33, 'rash', '2019-09-24 15:03:15', 'adam de mamp', 'queen', 2, '2ed47d20-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2019-06-02 20:01:12', 'NSW', '2019-02-19 10:10:35', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 129, 4697, 'fateigue', 'dead', 'True', 15, 'staffy', 'balanora', 'suplex'),
    ('AUS', 'Vets are us', 9, 'unvaxinated', 8871302949, 1, 'vomiting', '2019-03-29 09:17:00', 'Lucy foxtrot', 'annie', 1, '2edd7dd0-c49c-11e8-b678-a5dc64edc7ee', 'ParvoAlert', '2018-11-21 08:51:38', 'SA', '2019-02-04 06:05:07', 'c70c64ad-d1d0-40be-86e6-a96de1b8de8b', '2018-09-30 10:32:51', 67, 4469, 'hat', 'dead', 'True', 13, 'aligator', 'carindale', 'digimon')
;

Query #1

SELECT postcode ,COUNT(*) FROM (
  SELECT t1.*,ROW_NUMBER() OVER(PARTITION BY caseid ORDER BY dateExact desc) rn
  FROM cases t1
) t1
where rn = 1
group by postcode;

| postcode | COUNT(*) |
| -------- | -------- |
| 3651     | 2        |
| 4450     | 1        |
| 4697     | 1        |
| 1166     | 1        |
| 1231     | 1        |
| 2171     | 1        |
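
The query above does not yet apply the date range from the question. A minimal sketch of how the BETWEEN filter might be combined with ROW_NUMBER follows. The assumptions: it runs on in-memory SQLite via Python's `sqlite3` (window functions need SQLite 3.25 or newer) instead of Athena, the sample rows and the helper name `postcode_counts` are hypothetical, timestamps are ISO-8601 text, and it orders by `dateandtime` as in the question (swap in `dateExact` for the fiddle schema).

```python
import sqlite3


def postcode_counts(conn, start, end):
    """Count postcodes over the latest row of each case, within [start, end]."""
    sql = """
    SELECT postcode, COUNT(*) AS countof
    FROM (
        SELECT caseid, postcode, dateandtime,
               ROW_NUMBER() OVER (
                   PARTITION BY caseid ORDER BY dateandtime DESC
               ) AS rn
        FROM jsonmanual
    ) ranked
    WHERE rn = 1                        -- keep only the latest row per case
      AND dateandtime BETWEEN ? AND ?   -- then apply the date window
    GROUP BY postcode
    ORDER BY postcode
    """
    return conn.execute(sql, (start, end)).fetchall()


# Invented sample data, not the rows from the fiddle.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jsonmanual (caseid TEXT, postcode INTEGER, dateandtime TEXT)"
)
conn.executemany(
    "INSERT INTO jsonmanual VALUES (?, ?, ?)",
    [
        ("A", 3651, "2018-09-30 10:32:51"),
        ("A", 3651, "2019-05-05 09:54:26"),  # latest for A, in range
        ("B", 1166, "2019-01-10 15:36:29"),
        ("D", 4697, "2018-11-01 16:18:51"),
        ("E", 2171, "2019-03-01 00:00:00"),
        ("E", 2171, "2021-06-01 00:00:00"),  # latest for E, out of range
        ("F", 4697, "2019-09-24 15:03:15"),
    ],
)

result = postcode_counts(conn, "2016-05-05 09:51:00", "2020-01-10 15:36:00")
print(result)
```

The filter sits in the outer query on purpose: case E is dropped because its latest row is outside the window, even though an older row falls inside it. Moving the BETWEEN into the inner query would instead pick the latest row within the window, which is a different question.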
