Does Cassandra 2.1 insert performance depend on the affected columns?

Environment: Cassandra 2.1, DataStax Driver 2.1.9, single-node cluster with DSE 4.8

I created a table:

create table calc_data_test2(
    data_set_id uuid, svod_type text, section text, index_code text, value_type text, data_hash text,
    c1 text, c2 text, c3 text, c4 text, c5 text, c6 text, c7 text, c8 text, c9 text, c10 text, c11 text, c12 text, c13 text, c14 text, c15 text, c16 text, c17 text, c18 text, c19 text, c20 text, c21 text, c22 text, c23 text, c24 text, c25 text,
    c26 text, c27 text, c28 text, c29 text, c30 text, c31 text, c32 text, c33 text, c34 text, c35 text, c36 text, c37 text, c38 text, c39 text, c40 text, c41 text, c42 text, c43 text, c44 text, c45 text, c46 text, c47 text, c48 text, c49 text, c50 text,
    c51 text, c52 text, c53 text, c54 text, c55 text, c56 text, c57 text, c58 text, c59 text, c60 text, c61 text, c62 text, c63 text, c64 text, c65 text, c66 text, c67 text, c68 text, c69 text, c70 text, c71 text, c72 text, c73 text, c74 text, c75 text,
    c76 text, c77 text, c78 text, c79 text, c80 text, c81 text, c82 text, c83 text, c84 text, c85 text, c86 text, c87 text, c88 text, c89 text, c90 text, c91 text, c92 text, c93 text, c94 text, c95 text, c96 text, c97 text, c98 text, c99 text, c100 text,
    se1 text, se2 text, data_value double,
    primary key ((data_set_id))
);

Then I ran some asynchronous insert experiments against this table: 1,000,000 inserts into the same table in every case, with 50 parallel requests, and only the number of affected columns differing between runs. Here are the results:

Columns inserted | Async execution time, ms | Throughput, ops/s
85               | 143860                   | ~6951
65               | 108564                   | ~9211
45               | 78213                    | ~12786
25               | 68447                    | ~14610
5                | 49812                    | ~20075

Details of each run are below.


Inserting 85 columns:

>java -jar store-utils-cli.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40,c41,c42,c43,c44,c45,c46,c47,c48,c49,c50,c51,c52,c53,c54,c55,c56,c57,c58,c59,c60,c61,c62,c63,c64,c65,c66,c67,c68,c69,c70,c71,c72,c73,c74,c75,c76,c77,c78,c79,c80) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525','1381','9816','5721','3632','5364','3980','6635','9641','518','6394','2560','1202','5595','7466','1507','7783','9586','6724','9169','9673','7867','8509','6889','3540','5994','4290','1925','8924','4704','4987','803','4291','4987','1111','4934','9885','6441','8212','9349','6852','6628','42','6713','3696','3316','8122','3288','3845','6063','5430','2052','5121','3343','6362','8724','2184','1380','5828','3723','8185');" 1000000 --cassandra.connection.requests.max.local=50
22:56:40,398  INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established

Entering: Overall process
Entering: Prebuilding of statements
Leaving [1086 ms]: Prebuilding of statements
Entering: Executing statements async
Leaving [143860 ms][6951.202558042542 ops/s]: Executing statements async
Leaving [144954 ms]: Overall process

Inserting 65 columns:

>java -jar store-utils-cli.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40,c41,c42,c43,c44,c45,c46,c47,c48,c49,c50,c51,c52,c53,c54,c55,c56,c57,c58,c59,c60) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525','1381','9816','5721','3632','5364','3980','6635','9641','518','6394','2560','1202','5595','7466','1507','7783','9586','6724','9169','9673','7867','8509','6889','3540','5994','4290','1925','8924','4704','4987','803','4291','4987','1111','4934','9885','6441','8212','9349','6852');" 1000000 --cassandra.connection.requests.max.local=50
00:28:27,393  INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established

Entering: Overall process
Entering: Prebuilding of statements
Leaving [847 ms]: Prebuilding of statements
Entering: Executing statements async
Leaving [108564 ms][9211.15655281677 ops/s]: Executing statements async
Leaving [109413 ms]: Overall process

Inserting 45 columns:

>java -jar store-utils-cli.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525','1381','9816','5721','3632','5364','3980','6635','9641','518','6394','2560','1202','5595','7466','1507','7783','9586','6724','9169','9673');" 1000000 --cassandra.connection.requests.max.local=50
00:33:19,972  INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established

Entering: Overall process
Entering: Prebuilding of statements
Leaving [845 ms]: Prebuilding of statements
Entering: Executing statements async
Leaving [78213 ms][12785.598302072545 ops/s]: Executing statements async
Leaving [79060 ms]: Overall process

Inserting 25 columns:

>java -jar store-utils-cli-1.2.0-SNAPSHOT.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20) VALUES(now(), '58','9281','7611','367','7371','8353','4269','134','5884','6794','3147','7639','7798','7890','8547','4212','8630','5962','8686','4482','372','7218','6070','5525');" 1000000 --cassandra.connection.requests.max.local=50
00:39:29,337  INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established

Entering: Overall process
Entering: Prebuilding of statements
Leaving [885 ms]: Prebuilding of statements
Entering: Executing statements async
Leaving [68447 ms][14609.844112963314 ops/s]: Executing statements async
Leaving [69339 ms]: Overall process

And inserting 5 columns:

>java -jar store-utils-cli-1.2.0-SNAPSHOT.jar -pt "insert into csod.calc_data_test2(data_set_id, svod_type,section,index_code,value_type) VALUES(now(), '58','9281','7611','367');" 1000000 --cassandra.connection.requests.max.local=50
00:43:35,293  INFO ru.croc.rosstat.csod.store.cassandra.connection.CassandraCluster:-1 - Connection to CassandraSettings$Connection(nodes:[csodx01.lab.croc.ru], port:9042, keyspace:csod, requests:CassandraSettings$Connection$Requests(fetchSize:1000, batchSize:2000, consistencyLevel:LOCAL_QUORUM, max:CassandraSettings$Connection$Requests$Max(local:50, remote:20, retry:CassandraSettings$Connection$Requests$Max$Retry(enabled:true, read:10, write:10, unavailable:5)))) established

Entering: Overall process
Entering: Prebuilding of statements
Leaving [968 ms]: Prebuilding of statements
Entering: Executing statements async
Leaving [49812 ms][20075.483819160043 ops/s]: Executing statements async
Leaving [50782 ms]: Overall process

Does the number of columns affected by an insert really have such a big impact on performance? I have not found any information about this dependency. Or maybe I am doing something wrong?

All of the meaningful code for the inserts is here:

override fun run(args: Array<String?>) {
    if (args.isEmpty() || args.size < 2){
        System.err.println("You should specify a query and a number of iterations: ${args.toList()}")
        return
    }

    val query: String? = args[0]
    val iterationCount: Long = args[1]!!.toLong()

    // get the session
    val session: Session = cassandraCluster.connection().driverSession
    // prepare the query
    val preparedQuery: PreparedStatement = session.prepare(query)

    MeasureTime("Overall process").use {
        // create bound statements 
        val statements = MeasureTime("Prebuild statements").use {
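            // the INSERT passed on the command line contains only literal values (no bind markers), so nothing needs to be bound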
            (1..iterationCount).map { BoundStatement(preparedQuery) }
        }

        // execute async
        MeasureTime("Execute statements async", iterationCount).use {
            val phaser = Phaser(1)
            statements.map { statement ->
                phaser.register()
                session.executeAsync(statement).withCallback({
                    phaser.arriveAndDeregister()
                }, { err ->
                    System.err.println(err)
                    phaser.arriveAndDeregister()
                })
            }
            // block until all tasks are done
            phaser.arriveAndAwaitAdvance()
        }
    }
}

// extension method for convenience
private fun <T> ListenableFuture<T>.withCallback(onSuccessCallback: (T?) -> Unit, onFailureCallback: (Throwable?) -> Unit): ListenableFuture<T> {
    Futures.addCallback(this, object: FutureCallback<T> {
        override fun onSuccess(p0: T?) {
            onSuccessCallback(p0)
        }

        override fun onFailure(p0: Throwable?) {
            onFailureCallback(p0)
        }
    })
    return this
}

class MeasureTime(val message: String, val operationCount: Long? = null): Closeable {
    private val startTime: Long

    init {
        startTime = System.nanoTime()
        System.out.println("Entering: $message")
    }

    override fun close() {
        val endTime = System.nanoTime()
        val elapsed = (endTime - startTime)/1000000
        val opStats = if (operationCount != null) {
            val f = operationCount/elapsed.toDouble()*1000
            "[$f ops/s]"
        } else ""
        val output = "Leaving [$elapsed ms]$opStats: $message"
        System.out.println(output)
    }
}

I trust that understanding what is going on in the Kotlin code will not be a problem for the Java folks.

If you are inserting more data into the second table (c1 through c100 plus a few other columns) than into the first, it is normal for the insert to be slower.

Now, even if you insert the same amount of data (in terms of bytes) into both tables, inserting into the second one will still be a bit slower, because of (a rough back-of-envelope check against your own numbers is sketched after this list):

  1. Metadata overhead: you have more columns, so more objects have to be created in memory to hold them

  2. The CPU cost of serializing many columns instead of just a few

  3. And probably other factors I am forgetting
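
For what it's worth, the numbers in your question point the same way. Dividing the measured async execution times by the insert count, and then by the difference in column count, suggests a roughly constant extra cost per additional column. Here is a minimal, standalone Kotlin sketch of that arithmetic; it only reuses the figures quoted above and is not part of your benchmark tool:

fun main() {
    // (columns inserted, async execution time in ms) for 1,000,000 inserts, taken from the runs above
    val runs = listOf(85 to 143_860L, 65 to 108_564L, 45 to 78_213L, 25 to 68_447L, 5 to 49_812L)
    val inserts = 1_000_000.0

    for ((cols, ms) in runs) {
        val usPerInsert = ms * 1000.0 / inserts
        println("%3d columns -> %6.1f us per insert".format(cols, usPerInsert))
    }

    // incremental cost per extra column between the 85-column and the 5-column run
    val (maxCols, maxMs) = runs.first()
    val (minCols, minMs) = runs.last()
    val usPerExtraColumn = (maxMs - minMs) * 1000.0 / inserts / (maxCols - minCols)
    println("~%.2f extra us per insert for each additional column".format(usPerExtraColumn))
}

Under that rough model each extra text column costs on the order of a microsecond per insert on your single-node setup, which fits the per-column metadata and serialization overhead described above better than raw payload size alone.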