How does a columnstore index know which data in one column is connected to data in other columns?

Question

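I'm new to columnstore indexes. The new, different structure of columnstore data raises a question: how do we know which data in column 1 (page 1) is connected to the data in column 2 (page 2)?

For example, suppose we have the following table representation using a traditional rowstore: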

row1  1   2  3 -- page1
row2  4   5  6 -- page2

With a columnstore index:

col1  col2  col3
1      2     3
4      5     6

With a columnstore index, how do we know which values belong together?

Answer 1

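You don't entirely lose the relationship between columns and rows. The simplified difference is how the table is stored: traditional storage is physically row-wise, whereas a columnstore is physically column-wise. The linked documentation contains more information than I want to copy and paste here.

From the docs: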

Key terms and concepts: these key terms and concepts are associated with columnstore indexes.

columnstore: A columnstore is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.

rowstore: A rowstore is data that is logically organized as a table with rows and columns, and then physically stored in a row-wise data format. This has been the traditional way to store relational table data. In SQL Server, rowstore refers to a table where the underlying data storage format is a heap, a clustered index, or a memory-optimized table.
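To make the two definitions concrete, here is a minimal Python sketch of the table from the question stored both ways. It is only a toy model, not how SQL Server lays out pages or segments, and the helper name `row_at` is made up for illustration. The point it shows is that a column-wise layout keeps every column in the same row order, so a row is simply "the value at the same position in each column".

```python
# Toy model only: real columnstore segments are compressed and far more involved.
rows = [(1, 2, 3), (4, 5, 6)]  # row-wise: each tuple is one row

# Column-wise: one list per column, values kept in the same row order.
columns = {
    "col1": [r[0] for r in rows],
    "col2": [r[1] for r in rows],
    "col3": [r[2] for r in rows],
}


def row_at(columns, position):
    """Rebuild one logical row by taking the value at `position` from every column."""
    return tuple(values[position] for values in columns.values())


# Reading position 0, then position 1, from every column gives back the rows.
rebuilt = [row_at(columns, i) for i in range(len(rows))]
print(rebuilt)
```

Nothing in `columns` stores a pointer from a value in `col1` to a value in `col2`; the shared position is the only link.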

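Below is an example, in T-SQL, of how the relationship is preserved. Run it against a table that has a columnstore index (disclaimer: I'm not a columnstore index expert):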

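-- List every column segment of a table's columnstore index,
-- showing which row group and which column each segment belongs to.
SELECT o.name AS table_,
       i.name AS index_,
       i.type_desc AS index_type,
       p.partition_number,
       rg.row_group_id,
       cs.column_id,
       c.name AS column_
FROM sys.objects o
INNER JOIN sys.indexes i
    ON i.object_id = o.object_id
INNER JOIN sys.partitions p
    ON p.object_id = o.object_id
    AND p.index_id = i.index_id
INNER JOIN sys.column_store_row_groups rg
    ON rg.object_id = o.object_id
    AND rg.index_id = i.index_id
    AND rg.partition_number = p.partition_number
INNER JOIN sys.column_store_segments cs
    ON cs.partition_id = p.partition_id
    AND cs.segment_id = rg.row_group_id  -- segment_id holds the row group ID
INNER JOIN sys.columns c
    ON c.object_id = o.object_id
    AND c.column_id = cs.column_id
WHERE o.object_id = OBJECT_ID(N'your_table_name')  -- replace with your table name
ORDER BY p.partition_number, rg.row_group_id, cs.column_id;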

Answer 2

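There is no explicit connection, just as there is no explicit connection between the column values in a row-based table. Even so, we can always get from one to the other by simple enumeration.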

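Imagine reading out the column groups in a row-based fashion (first value of col1, first value of col2, first value of col3), and there's your row. When identical column values are compressed into ranges, imagine each range carrying a number that tells you how many times the value occurs. You can still read out rows this way by simple counting, even if the process is inefficient. Asking for any particular row (SELECT * FROM T WHERE Column = uniquevalue) requires searching the column store for that value, which is very fast, and then using its position to look up the values in all the other column groups to get back a row, which typically isn't fast, because in the worst case we need to read through all the values in all the ranges. (Of course, traditional B-tree indexes can help with this, which is why you'd use them for row lookups.)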
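The counting idea can be sketched in a few lines of Python. This is only an illustration of run-length ranges and positional lookup as described above, not SQL Server's actual segment format or encoding. The table contents and the helper names (`rle_encode`, `find_position`, `value_at`, `select_where`) are all made up for the example.

```python
def rle_encode(values):
    """Compress a list into [value, run_length] pairs (the 'ranges')."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def find_position(runs, target):
    """Return the row position of the first occurrence of `target`, or None."""
    position = 0
    for value, length in runs:
        if value == target:
            return position
        position += length
    return None


def value_at(runs, position):
    """Count through the runs until the one covering `position` is reached."""
    seen = 0
    for value, length in runs:
        seen += length
        if position < seen:
            return value
    raise IndexError(position)


# Hypothetical three-column table, stored column by column.
table = {
    "id": rle_encode([10, 11, 12, 13, 14]),
    "region": rle_encode(["EU", "EU", "EU", "US", "US"]),
    "status": rle_encode(["open", "open", "closed", "closed", "closed"]),
}


def select_where(table, column, target):
    """Toy version of SELECT * FROM T WHERE column = target (first match only)."""
    position = find_position(table[column], target)
    if position is None:
        return None
    # The position is the only thing tying the columns together.
    return {name: value_at(runs, position) for name, runs in table.items()}


row = select_where(table, "id", 13)
print(row)  # {'id': 13, 'region': 'US', 'status': 'closed'}
```

Note that `value_at` has to count through the runs of every other column to reach the right position, which is the inefficiency the answer refers to for single-row lookups.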

sql-server

indexing

columnstore