How to get the percentage of similar text in a column of a SQL Server table

I have a column named research_area in a SQL Server table, like this:

digital library
approximation algorithm
real time application
approximation algorithm
applied mathematics
image processing
applied mathematics
evolutionary computation
image processing
image processing
image processing
image annotation
image segmentation
natural language processing
image processing
image segmentation
anomaly detection
image annotation
efficient algorithm
time series analysis
image annotation
image annotation
image processing
routing wireless networks
constrained project scheduling
image annotation
image segmentation
differential equation
image processing
collaborative filtering
image segmentation
image annotation
efficient algorithm
data reduction
image segmentation
image annotation
image processing
applied mathematics
image segmentation
image segmentation

Now I want to process it so that I get a result like this:

image processing                8
image annotation                7
image segmentation              7
applied mathematics             3
approximation algorithm         2
efficient algorithm             2
digital library                 1
real time application           1
evolutionary computation        1
natural language processing     1
anomaly detection               1
time series analysis            1
routing wireless networks       1
constrained project scheduling  1
differential equation           1
collaborative filtering         1
data reduction                  1

So how can I get this, by adding a column or in some other way?

Here is what I tried:

SELECT 
    aid, research_area as [Name], COUNT(research_area) as [Count] 
FROM
    sub_aminer_paper 
GROUP BY 
    research_area 
WHERE
    aid = 1653869

But it raises this error:

The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

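You have to CAST your column to varchar/nvarchar before you can use it in a GROUP BY clause. Note also that the WHERE clause must come before GROUP BY:

-- COUNT(*) is used because COUNT() does not accept a text expression
SELECT aid, CAST(research_area AS VARCHAR(100)) AS [research_area], COUNT(*) AS [Count]
FROM sub_aminer_paper
WHERE aid = 1653869
GROUP BY CAST(research_area AS VARCHAR(100)), aid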

SQL Server Error Messages - Msg 306
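
To get the rows in the order shown in the question (highest count first), an ORDER BY can be added to the same idea. The following is only a minimal sketch: the temp table #paper_demo and its six rows are made up for illustration and stand in for sub_aminer_paper, and it assumes SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical temp table standing in for sub_aminer_paper (text column on purpose).
CREATE TABLE #paper_demo (aid INT, research_area TEXT);

INSERT INTO #paper_demo (aid, research_area) VALUES
    (1653869, 'image processing'),
    (1653869, 'image annotation'),
    (1653869, 'image processing'),
    (1653869, 'digital library'),
    (1653869, 'image annotation'),
    (1653869, 'image processing');

-- Cast the text column once in a derived table, then group and sort on the result.
SELECT aid, research_area, COUNT(*) AS [Count]
FROM (
    SELECT aid, CAST(research_area AS VARCHAR(100)) AS research_area
    FROM #paper_demo
    WHERE aid = 1653869
) AS p
GROUP BY aid, research_area
ORDER BY [Count] DESC, research_area;

DROP TABLE #paper_demo;
```

Casting inside the derived table keeps the CAST expression in one place, so GROUP BY and ORDER BY can refer to the plain column name.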

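Wait, please clarify your question here. Do you want the weight of each "IDENTICAL" value in a given column relative to the total number of rows in that column?

For example, if there are 100 rows and 8 of them have the same name, should the number 8 indicate that 8% of the rows are named "whatever", and so on?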

You can use this:

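SELECT aid,
       CAST(research_area AS VARCHAR(100)) AS research_area,
       COUNT(*) AS SumOfValues,
       -- share of this value among all rows that pass the WHERE filter
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percnt
FROM sub_aminer_paper
WHERE aid = 1653869
GROUP BY CAST(research_area AS VARCHAR(100)), aid;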

Edit: this gives you both the count and the percentage for each value.
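
As a small worked illustration of the percentage calculation, the sketch below runs it over four made-up rows supplied inline (the derived table demo_rows is hypothetical, not part of the original schema). Here 'image processing' appears in 2 of 4 rows, so its share is 100.0 * 2 / 4 = 50%. It assumes SQL Server 2008 or later for the VALUES table constructor.

```sql
-- demo_rows is a hypothetical inline data set; its column is varchar, so no CAST is needed.
SELECT research_area,
       COUNT(*) AS [Count],
       -- SUM(COUNT(*)) OVER () adds up the per-group counts, i.e. the total row count
       CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5, 2)) AS percnt
FROM (VALUES
        ('image processing'),
        ('image annotation'),
        ('image processing'),
        ('digital library')
     ) AS demo_rows (research_area)
GROUP BY research_area
ORDER BY [Count] DESC, research_area;
```

Multiplying by 100.0 rather than 100 forces decimal arithmetic; with integer operands the division would truncate.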

After modifying the answer given by @Shaharyar, this is the query that worked:

-- WHERE must precede GROUP BY; COUNT(*) avoids passing the text column to COUNT()
SELECT aid, CAST(research_area AS VARCHAR(100)) AS [research_area], COUNT(*) AS [Count]
FROM sub_aminer_paper
WHERE aid = 1653869
GROUP BY CAST(research_area AS VARCHAR(100)), aid

This gives the desired output.
Thanks, Shaharyar.