No rows inserted in table when importing from CSV in Cassandra
I am trying to import a CSV file into a Cassandra table, but I am running into a problem. The import succeeds, or at least Cassandra says it does, yet I still cannot see any records. Here are more details:
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';
2 rows imported in 0.216 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;
customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
(0 rows)
cqlsh:recommendation_engine>
This is what my data looks like:
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|
The Cassandra version is apache-cassandra-2.2.0.
EDIT:
CREATE TABLE row_historical_game_outcome_data (
customer_id int,
game_id int,
time timestamp,
channel text,
currency_code text,
game_code text,
game_name text,
game_type text,
game_vendor text,
progressive_winnings double,
stake_amount double,
win_amount double,
PRIMARY KEY ((customer_id, game_id, time))
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
I also tried the following, as suggested by uri2x, but still got nothing:
select * from row_historical_game_outcome_data;
customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
(0 rows)
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings") FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';
2 rows imported in 0.192 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;
customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
(0 rows)
Two things in your CSV file are bothering cqlsh:
- Remove the trailing | at the end of each CSV line.
- Remove the microseconds from your time values (precision should be milliseconds at most).
OK, I had to change a few things about your data file to get this to work:
SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
- Removed the trailing pipes.
- Truncated the times to whole seconds.
- Removed all single quotes.
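Those three cleanups can also be scripted rather than done by hand. Below is a minimal Python sketch (the sample line is taken from the question; the function name and the timestamp heuristic are mine):

```python
def clean_line(line: str) -> str:
    """Apply the three fixes: drop the trailing pipe, strip single quotes,
    and truncate timestamps to whole seconds."""
    fields = line.rstrip("\n").rstrip("|").split("|")   # drop the trailing '|'
    cleaned = []
    for f in fields:
        f = f.strip("'")                                # drop single quotes
        # Truncate timestamps like '2015-07-01 00:01:42.19700' to seconds.
        # Heuristic: a '-' at index 4 marks a 'YYYY-MM-DD ...' value.
        if f[4:5] == "-" and "." in f:
            f = f.split(".", 1)[0]
        cleaned.append(f)
    return "|".join(cleaned)

raw = ("'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123"
       "|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|")
print(clean_line(raw))
# SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
```

Run over every line of the file, this produces exactly the cleaned rows shown below.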
Once I had done that, I executed:
aploetz@cqlsh:Whosebug> COPY row_historical_game_outcome_data
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
win_amount,currency_code , time , progressive_winnings)
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';
Improper COPY command.
This one was tricky. I finally figured out that COPY did not like the column name time. I adjusted the table to use the name game_time instead, and re-ran the COPY:
aploetz@cqlsh:Whosebug> DROP TABLE row_historical_game_outcome_data ;
aploetz@cqlsh:Whosebug> CREATE TABLE row_historical_game_outcome_data (
... customer_id int,
... game_id int,
... game_time timestamp,
... channel text,
... currency_code text,
... game_code text,
... game_name text,
... game_type text,
... game_vendor text,
... progressive_winnings double,
... stake_amount double,
... win_amount double,
... PRIMARY KEY ((customer_id, game_id, game_time))
... );
aploetz@cqlsh:Whosebug> COPY row_historical_game_outcome_data
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
win_amount,currency_code , game_time , progressive_winnings)
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';
3 rows imported in 0.738 seconds.
aploetz@cqlsh:Whosebug> SELECT * FROM row_historical_game_outcome_data ;
customer_id | game_id | game_time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
123123 | 673 | 2015-07-01 00:01:42-0500 | M | GBP | SomeName | SomeName | TYPE | SomeName | 0 | 0.2 | 0
456456 | 673 | 2015-07-01 00:01:42-0500 | M | GBP | SomeName | SomeName | TYPE | SomeName | 0 | 0.2 | 0
(2 rows)
- I am not sure why it says "3 rows imported"; my guess is that it is counting the header row.
- All of your keys are partition keys. I am not sure whether you actually intended that. I only point it out because I cannot think of a reason to specify multiple partition keys without also specifying clustering keys.
- I could not find anything in the DataStax documentation indicating that "time" is a reserved word. It may be a bug in cqlsh. But honestly, you should probably name a time-based column something other than "time" anyway.
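For instance, partitioning by customer and clustering by game and time within each partition would look like the sketch below (whether this fits depends entirely on your query patterns, so treat it as an illustration, not a recommendation):

```
CREATE TABLE row_historical_game_outcome_data (
    ...
    PRIMARY KEY ((customer_id), game_id, game_time)
);
```

With this definition, all outcomes for one customer live in a single partition, sorted by game_id and game_time.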
One more comment: CQL's COPY accepts WITH HEADER = TRUE, which causes the header row (first line) of the CSV file to be skipped.
(http://docs.datastax.com/en/cql/3.3/cql/cql_reference/copy_r.html)
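Applied to the COPY command from the question, that would look like (path and columns as in the question):

```
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data
  ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings")
  FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|' AND HEADER=TRUE;
```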
"time" is not a reserved word in CQL (trust me on this, since I just updated the CQL reserved words in the DataStax docs myself). However, you do show spaces between the column names around "time" in your COPY command, and I believe that is the problem. No spaces, just commas; and do the same for all lines in the CSV file.
(http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)