SQL Server query by column pair

Question

I'm working on a product filter (faceted search) like Amazon's. I have a table with properties (color, RAM, screen) like this:

ArticleID  PropertyID  Value
---------  ----------  ------------
1          1           Black
1          2           8 GB
1          3           15"
2          1           White
2          2           8 GB
3          3           13"

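I have to select articles depending on which properties are selected. You can select multiple values for one property (for example RAM: 4 GB and 8 GB), and you can select multiple properties (for example RAM and screen size).

I need functionality like this: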

SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
  AND (PropertyID = 3 AND Value IN ('13"'))

I used to do this by creating a dynamic query and then executing it:

SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 2 AND Value IN ('4 GB', '8 GB')

INTERSECT

SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 3 AND Value IN ('13"')

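But I don't think this is a good approach; there must be a better solution. There are millions of properties in the table, so the query needs to be optimized.

The solution should work on SQL Server 2014 Standard Edition without add-ons or search engines such as Solr.

I'm stuck, so if anyone has an idea or a solution, I would really appreciate it. Thanks!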

Answer 1

INTERSECT will probably work well.

An alternative approach is to construct a WHERE clause and use aggregation with HAVING:

SELECT ArticleID
FROM ArticlesProperties
WHERE ( PropertyID = 2 AND Value IN ('4 GB', '8 GB') ) OR
      ( PropertyID = 3 AND Value IN ('13"') )
GROUP BY ArticleId
HAVING COUNT(DISTINCT PropertyId) = 2;

However, the INTERSECT method might make better use of an index on ArticlesProperties(PropertyId, Value), so try that first to see what performance an alternative has to beat.
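To compare the two shapes side by side, here is a minimal sketch that builds both queries from the same filter selection and runs them against the sample rows from the question. It assumes Python with SQLite (via the standard sqlite3 module) as a stand-in for SQL Server, purely so the sketch is runnable; the helper names (make_db, intersect_query, having_query, run) are hypothetical. Values are passed as parameters rather than concatenated into the SQL text.

```python
import sqlite3

# Sample rows from the question: (ArticleID, PropertyID, Value)
ROWS = [
    (1, 1, "Black"), (1, 2, "8 GB"), (1, 3, '15"'),
    (2, 1, "White"), (2, 2, "8 GB"),
    (3, 3, '13"'),
]


def make_db():
    """Create an in-memory SQLite database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ArticlesProperties "
        "(ArticleID INTEGER, PropertyID INTEGER, Value TEXT)"
    )
    conn.executemany("INSERT INTO ArticlesProperties VALUES (?, ?, ?)", ROWS)
    return conn


def branch(values):
    """One '(PropertyID = ? AND Value IN (?, ...))' predicate."""
    marks = ", ".join("?" for _ in values)
    return f"(PropertyID = ? AND Value IN ({marks}))"


def params_for(filters):
    """Flatten {property_id: [values]} in the order the predicates use them."""
    params = []
    for property_id, values in filters.items():
        params.append(property_id)
        params.extend(values)
    return params


def intersect_query(filters):
    """One SELECT per selected property, combined with INTERSECT."""
    parts = [
        "SELECT ArticleID FROM ArticlesProperties WHERE " + branch(values)
        for values in filters.values()
    ]
    return " INTERSECT ".join(parts), params_for(filters)


def having_query(filters):
    """A single scan: OR the predicates, then require every property to match."""
    where = " OR ".join(branch(values) for values in filters.values())
    sql = (
        f"SELECT ArticleID FROM ArticlesProperties WHERE {where} "
        "GROUP BY ArticleID HAVING COUNT(DISTINCT PropertyID) = ?"
    )
    return sql, params_for(filters) + [len(filters)]


def run(conn, query):
    """Execute a (sql, params) pair and return the sorted article IDs."""
    sql, params = query
    return sorted(row[0] for row in conn.execute(sql, params))


conn = make_db()
selection = {2: ["4 GB", "8 GB"], 3: ['15"']}  # RAM 4 or 8 GB, screen 15"
print(run(conn, intersect_query(selection)))
print(run(conn, having_query(selection)))
```

Both functions expect at least one selected property. Note that with the question's own sample rows, the selection of RAM 4/8 GB plus a 13" screen matches no article, because article 3 has no RAM row; the sketch uses a 15" screen so that there is something to return.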

Answer 2

XML parameter

Your procedure takes an XML parameter, @criteria XML.

Some stuff I used for debugging:

drop table #properties
drop table #criteria

create table #properties (propertyId int)
insert into #properties values (1), (2) --presuming that you have a list of all the possible properties somewhere

-- This would be passed in by the application
declare @criteria XML = '<criteria>
<property id="1">
    <item value="8 GB" />
    <item value="4 GB" />
</property>
<property id="2">
    <item value="13 in" /> 
    <item value="4 in" />
</property>
</criteria>'

--encode the '"' and replace 'in' as needed

The code you need starts here:

create table #criteria 
(propertyId int, searchvalue nvarchar(20))


insert into #criteria (propertyId, searchvalue)
select  
    cc.propertyId,
    c.value('@value','nvarchar(20)')  
from #properties cc
cross apply @criteria.nodes(N'/criteria/property[@id=sql:column("PropertyID")]/item') t(c)

SELECT ArticleID, count(1)
FROM ArticlesProperties ap
join #criteria cc on  cc.propertyId = ap.propertyId and cc.searchvalue = ap.value
group by ArticleID 
having count(1) = (select count(distinct propertyId) from #criteria)
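For readers unfamiliar with the nodes() shredding step, the following sketch shows the same transformation outside the database: it turns the criteria document into (propertyId, searchvalue) rows, keeping only properties that appear in a known list, which is roughly the role #properties plays above. It assumes Python's standard xml.etree.ElementTree; the function name shred_criteria and the known_property_ids parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same criteria document as in the answer above.
CRITERIA_XML = """<criteria>
<property id="1">
    <item value="8 GB" />
    <item value="4 GB" />
</property>
<property id="2">
    <item value="13 in" />
    <item value="4 in" />
</property>
</criteria>"""


def shred_criteria(xml_text, known_property_ids):
    """Return (property_id, search_value) rows in document order.

    Properties whose id is not in known_property_ids are skipped, which
    mirrors driving the shredding from a list of known properties.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for prop in root.findall("property"):
        property_id = int(prop.get("id"))
        if property_id not in known_property_ids:
            continue
        for item in prop.findall("item"):
            rows.append((property_id, item.get("value")))
    return rows


criteria_rows = shred_criteria(CRITERIA_XML, {1, 2})
for row in criteria_rows:
    print(row)
```

Each resulting row corresponds to one row of #criteria, so an article matches when it joins to at least one row for every distinct property ID in the list.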

Answer 3

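I'm assuming that (ArticleID, PropertyID) is a key.

This looks like an entity-attribute-value (EAV) table, or "open schema" design, so there is basically no good way to query anything. You might even consider setting up a dynamic PIVOT, but that is fairly complex.

One way is with EXISTS expressions: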

SELECT DISTINCT ArticleID
FROM ArticlesProperties ap
WHERE EXISTS (SELECT 1 FROM ArticlesProperties 
        WHERE ArticleID = ap.ArticleID AND PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
    AND EXISTS (SELECT 1 FROM ArticlesProperties 
        WHERE ArticleID = ap.ArticleID AND PropertyID = 3 AND Value IN ('13"'));

Or you could try OR combined with COUNT() and HAVING:

SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
    OR (PropertyID = 3 AND Value IN ('13"'))
GROUP BY ArticleID
HAVING COUNT(PropertyID) = 2;
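The key assumption matters for this last query. The sketch below illustrates what happens if it does not hold, using hypothetical rows in which article 1 carries two values for the same property. It assumes Python with SQLite as a stand-in for SQL Server; the helper name matches is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArticlesProperties "
    "(ArticleID INTEGER, PropertyID INTEGER, Value TEXT)"
)
conn.executemany(
    "INSERT INTO ArticlesProperties VALUES (?, ?, ?)",
    [
        # Hypothetical data: article 1 has two RAM values and no screen row,
        # so (ArticleID, PropertyID) is NOT a key here.
        (1, 2, "4 GB"), (1, 2, "8 GB"),
        (2, 2, "8 GB"), (2, 3, '13"'),
    ],
)

QUERY = """
SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
   OR (PropertyID = 3 AND Value IN ('13"'))
GROUP BY ArticleID
HAVING {count_expr} = 2
ORDER BY ArticleID
"""


def matches(count_expr):
    """Run the OR/HAVING query with the given counting expression."""
    return [row[0] for row in conn.execute(QUERY.format(count_expr=count_expr))]


plain = matches("COUNT(PropertyID)")              # counts matching rows
distinct = matches("COUNT(DISTINCT PropertyID)")  # counts matched properties
print(plain, distinct)
```

With these rows the plain COUNT also returns article 1, which has no matching screen size, while COUNT(DISTINCT PropertyID) returns only article 2. If the table can hold several values per property, the DISTINCT form is the safer choice.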

Answer 4

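I made a snippet that shows the lines along which I would work. Choosing good indexes is important for query speed. Always check the execution plan when tuning indexes.

Remarks:

- The script uses temporary tables, but in essence they are no different from regular tables. Except for #select_properties, the temporary tables should become regular tables if you plan to work the way the script outlines.

- Store the article properties using the IDs of the property choice values rather than the actual choice values. This saves disk space, and it saves memory when these tables are cached by SQL Server. SQL Server will cache as much of a table in memory as it can in order to serve select statements faster.

  If the article properties table is too large, SQL Server may have to do disk I/O to execute the select statement, and that will surely slow the statement down.

  An added benefit is that lookups are done on IDs (integers) rather than text (VARCHAR). Looking up integers is typically faster than looking up strings.

- Provide suitable indexes on the tables to speed up queries. To that end, it is good practice to analyze queries by inspecting the Actual Execution Plan.

  I included several such indexes in the snippet below. Depending on the number of rows in the article properties table and on the statistics, SQL Server will pick the index it judges best for the query.

  If SQL Server thinks the query lacks a suitable index, the actual execution plan will report the missing index. When your queries become slow, it is good practice to analyze them by inspecting the actual execution plan in SQL Server Management Studio.

- The snippet uses a temporary table to specify which properties you are looking for: #select_properties. Supply the criteria in that table by inserting the property IDs and property choice value IDs. The final selection query selects the articles for which, for each property in the criteria, at least one of the listed property choice values applies.

  You would create this temporary table in the session in which you want to select articles. Then insert the search criteria, fire the select statement, and finally drop the temporary table.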

CREATE TABLE #articles(
    article_id INT NOT NULL,
    article_desc VARCHAR(128) NOT NULL,
    CONSTRAINT PK_articles PRIMARY KEY CLUSTERED(article_id)
);

CREATE TABLE #properties(
    property_id INT NOT NULL, -- color, size, capacity
    property_desc VARCHAR(128) NOT NULL,
    CONSTRAINT PK_properties PRIMARY KEY CLUSTERED(property_id)
);

CREATE TABLE #property_values(
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL, -- eg color -> black, white, red
    property_choice_val VARCHAR(128) NOT NULL,
    CONSTRAINT PK_property_values PRIMARY KEY CLUSTERED(property_id,property_choice_id),
    CONSTRAINT FK_values_to_properties FOREIGN KEY (property_id) REFERENCES #properties(property_id)
);

CREATE TABLE #article_properties(
    article_id INT NOT NULL,
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    CONSTRAINT PK_article_properties PRIMARY KEY CLUSTERED(article_id,property_id,property_choice_id),
    CONSTRAINT FK_ap_to_articles FOREIGN KEY (article_id) REFERENCES #articles(article_id),
    CONSTRAINT FK_ap_to_property_values FOREIGN KEY (property_id,property_choice_id) REFERENCES #property_values(property_id,property_choice_id)

);
CREATE NONCLUSTERED INDEX IX_article_properties ON #article_properties(property_id,property_choice_id) INCLUDE(article_id);

INSERT INTO #properties(property_id,property_desc)VALUES
    (1,'color'),(2,'capacity'),(3,'size');

INSERT INTO #property_values(property_id,property_choice_id,property_choice_val)VALUES
    (1,1,'black'),(1,2,'white'),(1,3,'red'),
    (2,1,'4 Gb') ,(2,2,'8 Gb') ,(2,3,'16 Gb'),
    (3,1,'13"')  ,(3,2,'15"')  ,(3,3,'17"');

INSERT INTO #articles(article_id,article_desc)VALUES
    (1,'First article'),(2,'Second article'),(3,'Third article');

-- the table you have in your question, slightly modified
INSERT INTO #article_properties(article_id,property_id,property_choice_id)VALUES 
    (1,1,1),(1,2,2),(1,3,2), -- article 1: color=black, capacity=8gb, size=15"
    (2,1,2),(2,2,2),(2,3,1), -- article 2: color=white, capacity=8Gb, size=13"
    (3,1,3),        (3,3,3); -- article 3: color=red, size=17"

-- The table with the criteria you are selecting on
CREATE TABLE #select_properties(
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    CONSTRAINT PK_select_properties PRIMARY KEY CLUSTERED(property_id,property_choice_id)
);
INSERT INTO #select_properties(property_id,property_choice_id)VALUES
    (2,1),(2,2),(3,1); -- looking for '4Gb' or '8Gb', and size 13"

;WITH aid AS (  
    SELECT ap.article_id
    FROM #select_properties AS sp
         INNER JOIN #article_properties AS ap ON
            ap.property_id=sp.property_id AND
            ap.property_choice_id=sp.property_choice_id
    GROUP BY ap.article_id
    HAVING COUNT(DISTINCT ap.property_id)=(SELECT COUNT(DISTINCT property_id) FROM #select_properties)
    -- criteria met when article has a number of properties matching, equal to the distinct number of properties in the selection set
)
SELECT a.article_id,a.article_desc
FROM aid 
     INNER JOIN #articles AS a ON 
         a.article_id=aid.article_id
ORDER BY a.article_id;
-- result is the 'Second article' with id 2

DROP TABLE #select_properties;
DROP TABLE #article_properties;
DROP TABLE #property_values;
DROP TABLE #properties;
DROP TABLE #articles;
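The per-session workflow described in the remarks (create the criteria table, insert the criteria, run the select, drop the table) can be sketched from the application side as follows. This is an illustration under stated assumptions, not the answer's own code: it uses Python with SQLite as a stand-in for SQL Server, so the criteria table is a SQLite TEMP table without the # prefix, and the function names make_db and find_articles are hypothetical. The data is the same as in the script above.

```python
import sqlite3


def make_db():
    """Regular tables with the sample data from the script above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (
            article_id INTEGER PRIMARY KEY,
            article_desc TEXT NOT NULL
        );
        CREATE TABLE article_properties (
            article_id INTEGER NOT NULL,
            property_id INTEGER NOT NULL,
            property_choice_id INTEGER NOT NULL,
            PRIMARY KEY (article_id, property_id, property_choice_id)
        );
        INSERT INTO articles VALUES
            (1, 'First article'), (2, 'Second article'), (3, 'Third article');
        INSERT INTO article_properties VALUES
            (1,1,1),(1,2,2),(1,3,2),
            (2,1,2),(2,2,2),(2,3,1),
            (3,1,3),(3,3,3);
    """)
    return conn


def find_articles(conn, criteria):
    """criteria: iterable of (property_id, property_choice_id) pairs."""
    conn.execute("""
        CREATE TEMP TABLE select_properties (
            property_id INTEGER NOT NULL,
            property_choice_id INTEGER NOT NULL,
            PRIMARY KEY (property_id, property_choice_id)
        )
    """)
    try:
        conn.executemany(
            "INSERT INTO select_properties VALUES (?, ?)", criteria
        )
        # An article qualifies when it matches as many distinct properties
        # as there are distinct properties in the criteria.
        rows = conn.execute("""
            SELECT a.article_id, a.article_desc
            FROM articles AS a
            JOIN (
                SELECT ap.article_id
                FROM select_properties AS sp
                JOIN article_properties AS ap
                  ON ap.property_id = sp.property_id
                 AND ap.property_choice_id = sp.property_choice_id
                GROUP BY ap.article_id
                HAVING COUNT(DISTINCT ap.property_id) =
                       (SELECT COUNT(DISTINCT property_id)
                        FROM select_properties)
            ) AS aid ON aid.article_id = a.article_id
            ORDER BY a.article_id
        """).fetchall()
    finally:
        # Always drop the criteria table so the next search starts clean.
        conn.execute("DROP TABLE select_properties")
    return rows


conn = make_db()
# Looking for capacity '4 Gb' or '8 Gb', and size 13"
print(find_articles(conn, [(2, 1), (2, 2), (3, 1)]))
```

Dropping the criteria table in a finally block keeps repeated searches on the same connection from colliding with a leftover table if the select fails.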

sql

sql-server

faceted-search