How to get the percentage of identical text values in a column of a SQL Server table
I have a column named research_area in a SQL Server table, like this:
digital library
approximation algorithm
real time application
approximation algorithm
applied mathematics
image processing
applied mathematics
evolutionary computation
image processing
image processing
image processing
image annotation
image segmentation
natural language processing
image processing
image segmentation
anomaly detection
image annotation
efficient algorithm
time series analysis
image annotation
image annotation
image processing
routing wireless networks
constrained project scheduling
image annotation
image segmentation
differential equation
image processing
collaborative filtering
image segmentation
image annotation
efficient algorithm
data reduction
image segmentation
image annotation
image processing
applied mathematics
image segmentation
image segmentation
Now I want to process it somehow so that I get a result like this:
image processing 8
image annotation 7
image segmentation 7
applied mathematics 3
approximation algorithm 2
efficient algorithm 2
digital library 1
real time application 1
evolutionary computation 1
natural language processing 1
anomaly detection 1
time series analysis 1
routing wireless networks 1
constrained project scheduling 1
differential equation 1
collaborative filtering 1
data reduction 1
So how can I get this, either by adding a column or in some other way?
Here is what I tried:
SELECT
aid, research_area as [Name], COUNT(research_area) as [Count]
FROM
sub_aminer_paper
GROUP BY
research_area
WHERE
aid = 1653869
But it throws an error:
The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.
You have to CAST your column to varchar or nvarchar before you can use it in a GROUP BY clause:
SELECT aid, CAST(research_area AS VARCHAR(100)) AS [research_area], COUNT(*) AS [Count]
FROM sub_aminer_paper
WHERE aid = 1653869  -- WHERE must come before GROUP BY
GROUP BY CAST(research_area AS VARCHAR(100)), aid
ORDER BY [Count] DESC  -- most frequent research areas first
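To check the query shape end to end without a SQL Server instance, here is a minimal sketch that runs the same SELECT / WHERE / GROUP BY / ORDER BY against an in-memory SQLite database from Python, loaded with the 40 values listed in the question. It is only a stand-in under stated assumptions: SQLite has no text/ntext restriction, so the CAST is effectively a no-op there (VARCHAR(100) just maps to TEXT affinity), and the second author id 42 is hypothetical, added only to show that the WHERE filter is applied before grouping.

```python
import sqlite3

# The 40 research_area values listed in the question, in their original order.
RESEARCH_AREAS = [
    "digital library", "approximation algorithm", "real time application",
    "approximation algorithm", "applied mathematics", "image processing",
    "applied mathematics", "evolutionary computation", "image processing",
    "image processing", "image processing", "image annotation",
    "image segmentation", "natural language processing", "image processing",
    "image segmentation", "anomaly detection", "image annotation",
    "efficient algorithm", "time series analysis", "image annotation",
    "image annotation", "image processing", "routing wireless networks",
    "constrained project scheduling", "image annotation", "image segmentation",
    "differential equation", "image processing", "collaborative filtering",
    "image segmentation", "image annotation", "efficient algorithm",
    "data reduction", "image segmentation", "image annotation",
    "image processing", "applied mathematics", "image segmentation",
    "image segmentation",
]

AID = 1653869    # author id used in the question
OTHER_AID = 42   # hypothetical second author, only to show the WHERE filter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sub_aminer_paper (aid INTEGER, research_area TEXT)")
conn.executemany(
    "INSERT INTO sub_aminer_paper (aid, research_area) VALUES (?, ?)",
    [(AID, area) for area in RESEARCH_AREAS] + [(OTHER_AID, "image processing")],
)

# WHERE before GROUP BY, group on the cast value, sort by frequency
# (ties broken alphabetically by research_area).
rows = conn.execute(
    """
    SELECT CAST(research_area AS VARCHAR(100)) AS research_area,
           COUNT(*) AS [Count]
    FROM sub_aminer_paper
    WHERE aid = ?
    GROUP BY CAST(research_area AS VARCHAR(100))
    ORDER BY [Count] DESC, research_area
    """,
    (AID,),
).fetchall()

for area, count in rows:
    print(area, count)
```

The top rows match the desired output (image processing 8, image annotation 7, image segmentation 7, ...). Because ties are broken alphabetically here, the values that occur once come out in a different order than in the question's listing.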
Wait, let me clarify your question. You want the weight of each "IDENTICAL" value in the column relative to the total number of rows in that column?
For example, if there are 100 rows and 8 of them have the same name, should the number 8 mean that 8% of the rows are named "whatever", and so on?
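With the sample in the question there are 40 rows, not 100, so a count is not itself a percentage; each value's share is 100 × count / total rows. A minimal Python sketch of that arithmetic, assuming the per-value counts from the question's desired output:

```python
# Row counts per research_area for aid = 1653869, copied from the desired output.
COUNTS = {
    "image processing": 8, "image annotation": 7, "image segmentation": 7,
    "applied mathematics": 3, "approximation algorithm": 2, "efficient algorithm": 2,
    "digital library": 1, "real time application": 1, "evolutionary computation": 1,
    "natural language processing": 1, "anomaly detection": 1, "time series analysis": 1,
    "routing wireless networks": 1, "constrained project scheduling": 1,
    "differential equation": 1, "collaborative filtering": 1, "data reduction": 1,
}

total = sum(COUNTS.values())  # 40 rows in the sample

# Share of rows carrying each value: 100 * count / total.
percentages = {area: 100.0 * n / total for area, n in COUNTS.items()}

for area, pct in percentages.items():
    print(f"{area}: {COUNTS[area]} of {total} rows = {pct:.1f}%")
```

So "image processing" with 8 of 40 rows is 20%, and a value that appears once is 2.5%.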
You can use this:
SELECT CAST(research_area AS VARCHAR(100)) AS research_area, aid,
       COUNT(*) AS SumOfValues,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percnt
FROM sub_aminer_paper
WHERE aid = 1653869
GROUP BY CAST(research_area AS VARCHAR(100)), aid;
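The percentage trick relies on a window function over an aggregate: SUM(COUNT(*)) OVER () adds up the per-group counts across all groups, so each group's COUNT(*) can be divided by that total in the same SELECT. A minimal sketch of the idea in Python against in-memory SQLite, assuming SQLite 3.25 or newer (the first version with window functions); the rows are rebuilt from the question's desired-output counts, and aid 42 with its 10 extra rows is hypothetical. Because the filter sits in the WHERE of the grouped query, the percentages are relative to that author's rows only; filtering in an outer query after the window function has run would instead make them relative to the whole table.

```python
import sqlite3

AID = 1653869    # author id from the question
OTHER_AID = 42   # hypothetical second author, only to show where the filter goes

# Counts from the question's desired output, expanded back into one row each.
COUNTS = {
    "image processing": 8, "image annotation": 7, "image segmentation": 7,
    "applied mathematics": 3, "approximation algorithm": 2, "efficient algorithm": 2,
    "digital library": 1, "real time application": 1, "evolutionary computation": 1,
    "natural language processing": 1, "anomaly detection": 1, "time series analysis": 1,
    "routing wireless networks": 1, "constrained project scheduling": 1,
    "differential equation": 1, "collaborative filtering": 1, "data reduction": 1,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sub_aminer_paper (aid INTEGER, research_area TEXT)")
data = [(AID, area) for area, n in COUNTS.items() for _ in range(n)]
data += [(OTHER_AID, "image processing")] * 10  # hypothetical extra rows
conn.executemany("INSERT INTO sub_aminer_paper (aid, research_area) VALUES (?, ?)", data)

# SUM(COUNT(*)) OVER () totals the per-group counts of the groups that
# survive the WHERE clause, i.e. all rows for this aid.
result = conn.execute(
    """
    SELECT CAST(research_area AS VARCHAR(100)) AS research_area,
           COUNT(*) AS SumOfValues,
           100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percnt
    FROM sub_aminer_paper
    WHERE aid = ?
    GROUP BY CAST(research_area AS VARCHAR(100))
    ORDER BY SumOfValues DESC, research_area
    """,
    (AID,),
).fetchall()

for area, n, pct in result:
    print(f"{area}: {n} ({pct:.1f}%)")
```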
Edit: this gives you the count and the percentage of each value.
After modifying the answer given by @Shaharyar, this is the working answer:
SELECT aid, CAST(research_area AS VARCHAR(100)) AS [research_area], COUNT(*) AS [Count]
FROM sub_aminer_paper
WHERE aid = 1653869  -- WHERE must come before GROUP BY
GROUP BY CAST(research_area AS VARCHAR(100)), aid
ORDER BY [Count] DESC  -- most frequent research areas first
This gives the desired output.
Thanks, Shaharyar.