Need to find the average and number of repetitions of a column
I have an SQL statement:
SELECT application.id,title,url,company.name AS company_name,package_name,ranking,date,platform,country.name AS country_name,collection.name AS collection_name,category.name AS category_name FROM application
JOIN application_history ON application_history.application_id = application.id
JOIN company ON application.company_id = company.id
JOIN country ON application_history.country_id = country.id
JOIN collection ON application_history.collection_id = collection.id
JOIN category ON application_history.category_id = category.id
WHERE application.platform=0
AND country.name ='CZ'
AND collection.name='topfreeapplications'
AND category.name='UTILITIES'
AND application_history.ranking <= 10
AND date::date BETWEEN date (CURRENT_DATE - INTERVAL '1 month') AND CURRENT_DATE
ORDER BY application_history.ranking ASC
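It produces this result:
I would like to add an average-ranking column for each package and an occurrence column that counts how many times the package appears in the list. I would also like to group the results by package_name so that there is no redundancy.
So far, I have tried adding a GROUP BY clause before the ORDER BY: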
GROUP BY package_name
But it returns an error:
column "application.id" must appear in the GROUP BY clause or be used in an aggregate function
If I add every column it asks for, I no longer get the result I want.
I also tried counting the package names by adding this after the SELECT:
COUNT(package_name) AS count
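It produces a similar error.
How can I get the result I want? Should I run two queries instead, or can I get everything in one?
To be precise, I have looked at other answers on S.O, but none of them tried to count over a "produced" column.
Thanks for your help.
EDIT:
Here is the result I originally expected:
Although Gordon's suggestion did not give me the correct result, it put me on the right track when I read this:
From the docs: "Unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row."
So I went back to using plain COUNT and AVG. My problem was that I wanted to display the ranking and date columns to check that things were correct. But putting those columns in the SELECT prevents GROUP BY from working as expected, as Jarlh mentioned in the comments.
Working query: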
SELECT application.id,title,url,company.name AS company_name,package_name,platform,country.name AS country_name,collection.name AS collection_name,category.name AS category_name,
COUNT(package_name) AS count, AVG(application_history.ranking) AS avg
FROM application
JOIN application_history ON application_history.application_id = application.id
JOIN company ON application.company_id = company.id
JOIN country ON application_history.country_id = country.id
JOIN collection ON application_history.collection_id = collection.id
JOIN category ON application_history.category_id = category.id
WHERE application.platform=0
AND country.name ='CZ'
AND collection.name='topfreeapplications'
AND category.name='UTILITIES'
AND application_history.ranking <= 10
AND date::date BETWEEN date (CURRENT_DATE - INTERVAL '1 month') AND CURRENT_DATE
GROUP BY package_name,application.id,company.name,country.name,collection.name,category.name
ORDER BY count DESC
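To illustrate why dropping ranking and date from the SELECT makes the difference, here is a minimal sketch for PostgreSQL. The table ranking_sample, its columns and its rows are hypothetical stand-ins for the joined result, not the real schema:

```sql
-- Hypothetical toy table standing in for the joined result (not the real schema).
CREATE TEMP TABLE ranking_sample (
    package_name text,
    ranking      integer,
    day          date
);

INSERT INTO ranking_sample (package_name, ranking, day) VALUES
    ('com.example.a', 1, DATE '2024-01-01'),
    ('com.example.a', 3, DATE '2024-01-02'),
    ('com.example.b', 2, DATE '2024-01-01');

-- Grouping by package_name alone: one row per package.
-- com.example.a gets count 2, com.example.b gets count 1.
SELECT package_name,
       COUNT(*)     AS count,
       AVG(ranking) AS avg
FROM ranking_sample
GROUP BY package_name
ORDER BY count DESC;

-- Selecting ranking and day forces them into GROUP BY, so every distinct
-- (package_name, ranking, day) combination becomes its own group:
-- three rows come back, each with count 1.
SELECT package_name, ranking, day,
       COUNT(*)     AS count,
       AVG(ranking) AS avg
FROM ranking_sample
GROUP BY package_name, ranking, day;
```

In the working query above, title, url and platform can presumably stay out of the GROUP BY because application.id is grouped; PostgreSQL accepts that when id is the primary key of application, which is an assumption about the schema here.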
I think you need window/analytic functions. The following adds two columns, one with the number of rows for each package and one with its average ranking:
SELECT application.id, title, url, company.name AS company_name, package_name,
ranking, date, platform, country.name AS country_name,
collection.name AS collection_name, category.name AS category_name,
count(*) over (partition by package_name) as count,
avg(ranking) over (partition by package_name) as avg_package_ranking
FROM application . . .
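As a rough, self-contained sketch of what this does (PostgreSQL; the table ranking_window_sample and its rows are made up for illustration and are not the real schema), note that every input row is kept and the per-package figures are simply repeated on each of them:

```sql
-- Hypothetical toy table standing in for the joined result (not the real schema).
CREATE TEMP TABLE ranking_window_sample (
    package_name text,
    ranking      integer,
    day          date
);

INSERT INTO ranking_window_sample (package_name, ranking, day) VALUES
    ('com.example.a', 1, DATE '2024-01-01'),
    ('com.example.a', 3, DATE '2024-01-02'),
    ('com.example.b', 2, DATE '2024-01-01');

-- Window functions compute the aggregate per partition without collapsing rows:
-- all three rows are returned, and both com.example.a rows carry count 2.
SELECT package_name, ranking, day,
       COUNT(*)     OVER (PARTITION BY package_name) AS count,
       AVG(ranking) OVER (PARTITION BY package_name) AS avg_package_ranking
FROM ranking_window_sample
ORDER BY package_name, day;
```

This is handy when you still want to see ranking and date per row; if you want a single row per package instead, a plain GROUP BY is the more direct tool.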