Using cassandra-stress to write columns of 100 MB
I want to write partitions of 100 MB using the stress tool that ships with Cassandra 2.1.17. To keep things simple, I am first just trying to write a single partition with a single column.
My stress YAML looks like this:
keyspace: stresscql
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
table: insanitytest
table_definition: |
  CREATE TABLE insanitytest (
    name text,
    value blob,
    PRIMARY KEY(name)
  );
columnspec:
  - name: value
    size: FIXED(100000000)
insert:
  partitions: fixed(1)   # number of unique partitions to update in a single operation
                         # if batchcount > 1, multiple batches will be used but all partitions will
                         # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: LOGGED      # type of batch to use
  select: fixed(1)/1     # uniform chance any single generated CQL row will be visited in a partition;
                         # generated for each partition independently, each time we visit it
queries:
  simple1:
    cql: select * from insanitytest where name = ? LIMIT 100
    fields: samerow      # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
I run it with:
./tools/bin/cassandra-stress user profile=~/Software/cassandra/tools/cqlstress-insanity-example.yaml n=1 "ops(insert=1,simple1=0)"
And this is my output:
Connected to cluster: Test Cluster
Datatacenter: datacenter1; Host: localhost/127.0.0.1; Rack: rack1
Created schema. Sleeping 1s for propagation.
Sleeping 2s...
Running with 4 threadCount
Running [insert, simple1] with 4 threads for 1 iteration
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
Generating batches with [1..1] partitions and [1..1] rows (of [1..1] total rows in the partitions)
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (1 replica were required but only 0 acknowledged the write)
insert, 1, 0, 0, 0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 4.0, -0.00000, 0, 1, 34, 34, 0, 219
simple1, 0, NaN, NaN, NaN, NaN, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.00000, 0, 1, 34, 34, 0, 219
total, 1, 0, 0, 0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 4.0, -0.00000, 0, 1, 34, 34, 0, 219
Results:
op rate : 0 [insert:0, simple1:NaN]
partition rate : 0 [insert:0, simple1:NaN]
row rate : 0 [insert:0, simple1:NaN]
latency mean : 3985.0 [insert:3985.0, simple1:NaN]
latency median : 3985.0 [insert:3985.0, simple1:0.0]
latency 95th percentile : 3985.0 [insert:3985.0, simple1:0.0]
latency 99th percentile : 3985.0 [insert:3985.0, simple1:0.0]
latency 99.9th percentile : 3985.0 [insert:3985.0, simple1:0.0]
latency max : 3985.0 [insert:3985.0, simple1:0.0]
Total partitions : 1 [insert:1, simple1:0]
Total errors : 0 [insert:0, simple1:0]
total gc count : 1
total gc mb : 219
total gc time (s) : 0
avg gc time(ms) : 34
stdev gc time(ms) : 0
Total operation time : 00:00:03
However, looking at 'nodetool tpstats' I see one completed mutation (so even though I got a timeout, the mutation appears to have succeeded):
Pool Name Active Pending Completed Blocked All time blocked
MutationStage 0 0 1 0 0
ReadStage 0 0 33 0 0
RequestResponseStage 0 0 0 0 0
ReadRepairStage 0 0 0 0 0
CounterMutationStage 0 0 0 0 0
MiscStage 0 0 0 0 0
HintedHandoff 0 0 0 0 0
GossipStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
InternalResponseStage 0 0 0 0 0
CommitLogArchiver 0 0 0 0 0
CompactionExecutor 0 0 30 0 0
ValidationExecutor 0 0 0 0 0
MigrationStage 0 0 3 0 0
AntiEntropyStage 0 0 0 0 0
PendingRangeCalculator 0 0 1 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 13 0 0
MemtablePostFlush 0 0 24 0 0
MemtableReclaimMemory 0 0 13 0 0
Native-Transport-Requests 0 0 170 0 0
Message type Dropped
READ 0
RANGE_SLICE 0
_TRACE 0
MUTATION 0
COUNTER_MUTATION 0
BINARY 0
REQUEST_RESPONSE 0
PAGED_RANGE 0
READ_REPAIR 0
But if I run 'nodetool flush' and then 'nodetool status stresscql', this is what I get:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 131.99 KB 256 100.0% 285b13ec-0b9b-4325-9095-c5f5c0f51079 rack1
Since no mutations were dropped, where did the data go? From my understanding, the value in the Load column should be around 100 MB, right?
The problem is not with the stress profile or the data definition, but with commitlog_segment_size_in_mb: it needs to be at least 50% larger than the blob. More information in this answer.
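A minimal sketch of the fix, assuming a value of 256 MB (the exact number is my choice, not taken from the answer above): raise the segment size in cassandra.yaml and restart the node before re-running the stress command.
commitlog_segment_size_in_mb: 256   # default is 32 MB, far too small for a 100 MB blob (assumed value, pick to suit your blob size)
After the restart, re-running the same cassandra-stress command and then checking with cqlsh (for example, SELECT name FROM stresscql.insanitytest;) is a quick way to confirm the row actually landed; after another 'nodetool flush', 'nodetool status stresscql' should then report a Load of roughly 100 MB.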