在客户端验证行比使用整个主键的二级索引更好?

Validating row at client side better than secondary index with whole primary key?

众所周知,在 cassandra 中,应该非常谨慎地使用二级索引。

如果我有 table 例如:

User(username, usertype, email, etc..)

这里的用户名是分区键。现在我想支持当且仅当用户类型是特定值 X.returns 特定用户(将给出用户名)的操作。

我有两种方法可以做到:

一个: 在用户类型上创建二级索引,可能的值('A'、'B'、'C') 用户名是分区键。

SELECT * FROM user WHERE username='something' AND usertype='A';

两个:

我可以只将带有用户名的行提取到客户端,然后检查用户类型是否为 A。

哪种方法更好?还请考虑一个宽行(不是很大,10 秒)的场景,其中并非分区的所有行都可能具有给定值(这需要一些客户端过滤)。

我不清楚二级索引是如何在特定节点中查找数据的。

例如:SELECT * FROM user WHERE username='something' AND usertype='A'

例如usertype hidden CF有数据'A'-> 'jhon', 'miller', 'chris',...等, 100个用户名

并且带有分区键的查询与用户类型一起给出,它是扫描所有这 100 个用户名以与用户名 'something' 匹配,还是只是先按用户名获取,如果匹配则查看用户类型列'A'? 它究竟是如何进行搜索的?给定索引在低基数数据上并且每个索引都映射到许多行,查询票价如何?

如果重要的话,我正在使用 java 作为客户端。

更新: 我知道我可以为这个特定的例子使用集群(用户类型)键,但我想知道我问过的权衡。我原来的table要复杂得多。

这里最好的选择是创建一个由用户名和用户类型组成的复合主键,用户名是分区键,用户类型是集群键。您甚至不需要索引,查询也能正常工作。

CREATE TABLE users (
  username text,
  usertype text,
   ....
  PRIMARY KEY ((username), usertype)
)

对于这个例子,假设我创建了一个 table 来按船和 ID 跟踪船员:

CREATE TABLE crewByShip (
  ship text,
  id int,
  firstname text,
  lastname text,
  gender text,
  PRIMARY KEY(ship,id));

我会在 gender:

上创建一个索引
CREATE INDEX crewByShipG_idx ON crewByShip(gender);

插入一些数据后,我的 table 看起来像这样:

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+-----------
 Serenity |  1 |     Hoban |      M | Washburne
 Serenity |  2 |      Zoey |      F | Washburne
 Serenity |  3 |   Malcolm |      M |  Reynolds
 Serenity |  4 |    Kaylee |      F |      Frye
 Serenity |  5 |  Sheppard |      M |      Book
 Serenity |  6 |     Jayne |      M |      Cobb
 Serenity |  7 |     Simon |      M |       Tam
 Serenity |  8 |     River |      F |       Tam
 Serenity |  9 |     Inara |      F |     Serra

现在我将打开跟踪,并使用 PRIMARY KEY 查询不同的行,但也受我们在 gender 上的索引限制。

aploetz@cqlsh:Whosebug2> tracing on;
aploetz@cqlsh:Whosebug2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M';

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 34ea1840-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                                                                                                                                                                                                                                       | timestamp                  | source         | source_elapsed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                                                                                                                                                                                                                                             Execute CQL3 query | 2015-04-22 06:17:48.102000 | 192.168.23.129 |              0
                                                                                                                                                                                                          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M'; [SharedPool-Worker-1] | 2015-04-22 06:17:48.114000 | 192.168.23.129 |           3715
                                                                                                                                                                                                                                                                      Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:48.116000 | 192.168.23.129 |           4846
                                                                                                                                                                                                                                                Executing single-partition query on users [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5730
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5757
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5793
                                                                                                                                                                                                                                                              Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5848
                                                                                                                                                                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           5856
                                                                                                                                                                                                                Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           7056
                                                                                                                                                                                                                                               Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.121000 | 192.168.23.129 |           7080
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7143
                                                                                                                                                                                                                                                                Computing ranges to query [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7578
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7742
                                                                                                                                                                                              Submitting range requests on 1 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7807
                                                                                                                                                                                                                                  Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7851
                                                                                                                                                                                                                                          Executing indexed scan for [Serenity, Serenity] [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10848
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10936
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11007
                                                                                                                                                                                                                           Executing single-partition query on crewbyship.crewbyshipg_idx [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11130
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11139
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11155
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11253
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11262
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.127000 | 192.168.23.129 |          11281
                                                                                                                                                                                                                                           Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:48.130000 | 192.168.23.129 |          11369
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11375
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11383
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.133000 | 192.168.23.129 |          11409
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.134000 | 192.168.23.129 |          11415
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11430
                                                                                                                                                                                                                                                             Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11490
                                                                                                                                                                                                                                                                                               Request complete | 2015-04-22 06:17:48.115679 | 192.168.23.129 |          13679

现在,我将重新运行 相同的查询,但没有 gender 上的多余索引。

aploetz@cqlsh:Whosebug2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3;

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 38d7f440-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                        | timestamp                  | source         | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                              Execute CQL3 query | 2015-04-22 06:17:54.692000 | 192.168.23.129 |              0
          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3; [SharedPool-Worker-1] | 2015-04-22 06:17:54.695000 | 192.168.23.129 |             87
                                                       Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:54.696000 | 192.168.23.129 |            246
                                 Executing single-partition query on users [SharedPool-Worker-3] | 2015-04-22 06:17:54.697000 | 192.168.23.129 |           1185
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1197
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1215
                                               Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1249
                               Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1278
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3309
                                Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3333
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           3368
                            Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           4607
                                              Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4633
                                               Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4643
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4678
                                Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4683
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:54.706000 | 192.168.23.129 |           4697
                                                                                Request complete | 2015-04-22 06:17:54.697676 | 192.168.23.129 |           5676

如您所见,使用二级索引的查询的 "source_elapsed" 是不使用索引的同一查询(返回同一行)的两倍多。

我想我们可以肯定地说,在宽行 table 的低基数列上使用二级索引 不会 表现良好。现在虽然我不会说过滤客户端是个好主意,在这种情况下,如果结果集较小,它可能是更好的选择。