Select Value by Max Date
I have a table in a PostgreSQL database with data like this:
id  customer_id  item       value  timestamp
1   001          price      1000   11/1/2021
2   001          price      1500   11/2/2021
3   001          condition  good   11/3/2021
4   002          condition  bad    11/4/2021
5   002          condition  good   11/5/2021
6   002          price      1000   11/6/2021
7   001          condition  good   11/7/2021
8   001          price      1400   11/8/2021
9   002          price      1500   11/9/2021
10  001          condition  ok     11/10/2021
11  002          price      1600   11/11/2021
12  002          price      1550   11/12/2021
From this table, I want to query the latest value by date for each item and pivot the result into a table like the one below.
customer_id  price  condition
001          1400   ok
002          1550   good
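To get this kind of table I tried the query below, but it does not work well once there is a lot of data (operations like MIN and MAX applied to both text and numbers).
I tested this in pgAdmin 4: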
SELECT customer_id,
       MAX(CASE WHEN item = 'price' THEN value END) AS price,
       MAX(CASE WHEN item = 'condition' THEN value END) AS condition
FROM table_name
GROUP BY customer_id;
What I want is the value from the most recently dated row for each item.
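To see why the aggregate approach does not return the latest values, the sample rows can be run through the same query. This is only a sketch: the CTE name sample and the column name ts (standing in for timestamp) are made up for illustration, and value is assumed to be stored as text.

```sql
-- Sample rows from the question as a VALUES list; "sample" and "ts" are illustrative names.
WITH sample (id, customer_id, item, value, ts) AS (
   VALUES
     (1,  '001', 'price',     '1000', date '2021-11-01'),
     (2,  '001', 'price',     '1500', date '2021-11-02'),
     (3,  '001', 'condition', 'good', date '2021-11-03'),
     (4,  '002', 'condition', 'bad',  date '2021-11-04'),
     (5,  '002', 'condition', 'good', date '2021-11-05'),
     (6,  '002', 'price',     '1000', date '2021-11-06'),
     (7,  '001', 'condition', 'good', date '2021-11-07'),
     (8,  '001', 'price',     '1400', date '2021-11-08'),
     (9,  '002', 'price',     '1500', date '2021-11-09'),
     (10, '001', 'condition', 'ok',   date '2021-11-10'),
     (11, '002', 'price',     '1600', date '2021-11-11'),
     (12, '002', 'price',     '1550', date '2021-11-12')
)
-- MAX() picks the largest value per group and never looks at ts.
SELECT customer_id,
       MAX(CASE WHEN item = 'price' THEN value END) AS price,
       MAX(CASE WHEN item = 'condition' THEN value END) AS condition
FROM   sample
GROUP  BY customer_id
ORDER  BY customer_id;
-- 001 | 1500 | ok
-- 002 | 1600 | good
```

The prices come back as 1500 and 1600 instead of the wanted 1400 and 1550, because MAX returns the greatest value in the group rather than the value from the newest row.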
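-- Note: this assumes YOUR_TABLE has separate PRICE and CONDITION columns,
-- unlike the item/value layout shown in the question.
SELECT X.CUSTOMER_ID, X.PRICE, X.CONDITION
FROM (
    SELECT A.CUSTOMER_ID, A.PRICE, A.CONDITION,
           ROW_NUMBER() OVER (PARTITION BY A.CUSTOMER_ID ORDER BY A.TIMESTAMP DESC) AS XCOL
    FROM YOUR_TABLE A
) X
WHERE X.XCOL = 1;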
Please try the above and see whether it works for you.
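If the table really uses the item/value layout from the question, the same ROW_NUMBER() idea can be adapted by ranking rows per customer and item and then pivoting the newest row of each. A minimal sketch, assuming the table is named tbl and that timestamp is a real date or timestamp column rather than text:

```sql
-- Rank rows per (customer_id, item), newest first, then pivot the rank-1 rows.
SELECT customer_id,
       MAX(value) FILTER (WHERE item = 'price')     AS price,
       MAX(value) FILTER (WHERE item = 'condition') AS condition
FROM  (
   SELECT customer_id, item, value,
          ROW_NUMBER() OVER (PARTITION BY customer_id, item
                             ORDER BY timestamp DESC) AS rn
   FROM   tbl
   ) sub
WHERE  rn = 1            -- only the newest row per customer and item survives
GROUP  BY customer_id
ORDER  BY customer_id;
```

After the rn = 1 filter there is at most one row per customer and item, so MAX here only serves to fold that single value into its column. The FILTER clause needs PostgreSQL 9.4 or later; on older versions a CASE expression inside MAX does the same job.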
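Your relational design could probably be improved. Mixing different types of data in the same column is an anti-pattern.
While sticking to the given setup, two subqueries with DISTINCT ON and a FULL OUTER JOIN do the job: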
SELECT customer_id, p.value AS price, c.value AS condition
FROM  (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'condition'
   ORDER  BY customer_id, timestamp DESC
   ) c
FULL   JOIN (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'price'
   ORDER  BY customer_id, timestamp DESC
   ) p USING (customer_id);
db<>fiddle here
See:
- Select first row in each GROUP BY group?
This assumes timestamp is defined NOT NULL; otherwise you need NULLS LAST.
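If timestamp can be NULL, the change is confined to the ORDER BY of each subquery. A sketch of the price subquery with that adjustment (the condition subquery would change the same way):

```sql
-- In descending order NULLs sort first by default; NULLS LAST pushes them behind real dates.
SELECT DISTINCT ON (customer_id)
       customer_id, value
FROM   tbl
WHERE  item = 'price'
ORDER  BY customer_id, timestamp DESC NULLS LAST;
```

An index meant to support this ordering would most likely need the matching timestamp DESC NULLS LAST in its definition.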
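Depending on undisclosed cardinalities and value distribution, there may be (much) faster query variants.
If there is a customer table with distinct customer_id values, (much) faster query styles become possible.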
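One such style is a LATERAL subquery per item, driven from the customer table. This is only a sketch under assumptions: it presumes a table named customer with one row per customer_id, which the question does not show, and whether it is actually faster depends on how many rows exist per customer.

```sql
-- Assumption: customer(customer_id) holds exactly one row per customer.
SELECT c.customer_id, p.value AS price, co.value AS condition
FROM   customer c
LEFT   JOIN LATERAL (
   SELECT value
   FROM   tbl
   WHERE  customer_id = c.customer_id
   AND    item = 'price'
   ORDER  BY timestamp DESC
   LIMIT  1                      -- newest price row for this customer
   ) p ON true
LEFT   JOIN LATERAL (
   SELECT value
   FROM   tbl
   WHERE  customer_id = c.customer_id
   AND    item = 'condition'
   ORDER  BY timestamp DESC
   LIMIT  1                      -- newest condition row for this customer
   ) co ON true;
```

LEFT JOIN ... ON true keeps customers that have no price or no condition row, returning NULL in that column. Each subquery can be answered with a single index lookup per customer when a suitable index on (customer_id, timestamp DESC) exists.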
These partial, multicolumn indexes are ideal for making it fast in either case:
CREATE INDEX tbl_condition_special_idx ON tbl (customer_id, timestamp DESC, value) WHERE item = 'condition';
CREATE INDEX tbl_price_special_idx ON tbl (customer_id, timestamp DESC, value) WHERE item = 'price';
See:
- Optimize GROUP BY query to retrieve latest row per user