如何使用 CQL 在 Cassandra 中填充相关 table?
How to populate related table in Cassandra using CQL?
我正在尝试使用 this example 练习 Cassandra(在 Composite Columns 段落下):
所以,我创建了 table 推文 ,它看起来如下所示:
cqlsh:twitter> SELECT * from tweets;
tweet_id | author | body
--------------------------------------+-------------+--------------
73954b90-baf7-11e4-a7d0-27983e9e7f51 | gwashington | I chopped...
(1 rows)
现在我正在尝试填充时间线,这是一个使用 CQL 的相关 table,但我不确定该怎么做。我试过 SQL 方法,但没有用:
cqlsh:twitter> INSERT INTO timeline (user_id, tweet_id, author, body) SELECT 'gmason', 73954b90-baf7-11e4-a7d0-27983e9e7f51, author, body FROM tweets WHERE tweet_id = 73954b90-baf7-11e4-a7d0-27983e9e7f51;
Bad Request: line 1:55 mismatched input 'select' expecting K_VALUES
所以我有两个问题:
- 如何用 SQL 填充 时间轴 table,以便它与 推文 相关?
- 我如何确保 时间轴物理布局 将按照该示例中所示创建?
谢谢。
编辑:
这是对我上面问题#2的解释(图片来自here):
为此,您需要使用 ETL 工具。使用 Hadoop 或 Spark。 CQL 中没有 INSERT/SELECT
,这是有原因的。在现实世界中,您将需要从您的应用程序执行 2 次插入 - 每个插入一次 table.
您将不得不相信,当您拥有带分区键和集群键的主键时,这将以宽行格式存储数据。
tldr;
使用cqlsh COPY
导出tweets
,修改文件,使用COPY
导入timeline
.
使用cassandra-cli验证物理结构
长版...
- 我将在这方面采取不同的方式,并建议在 cqlsh 中使用本机
COPY
命令可能会更容易。
我跟了类似的examples found here。在 cqlsh 中创建 tweets
和 timeline
table 之后,我按照指示将行插入 tweets
中。我的 tweets
table 看起来像这样:
aploetz@cqlsh:Whosebug> SELECT * FROM tweets;
tweet_id | author | body
--------------------------------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------
05a5f177-f070-486d-b64d-4e2bb28eaecc | gmason | Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
b67fe644-4dbe-489b-bc71-90f809f88636 | jmadison | All men having power ought to be distrusted to a certain degree.
819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1 | gwashington | To be prepared for war is one of the most effectual means of preserving peace.
然后我像这样导出它们:
aploetz@cqlsh:Whosebug> COPY tweets TO '/home/aploetz/tweets_20150223.txt'
WITH DELIMITER='|' AND HEADER=true;
3 rows exported in 0.052 seconds.
然后我编辑了 tweets_20150223.txt file
,在前面添加了一个 user_id
列并复制了几行,如下所示:
userid|tweet_id|author|body
gmason|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
jmadison|b67fe644-4dbe-489b-bc71-90f809f88636|jmadison|All men having power ought to be distrusted to a certain degree.
gwashington|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
jmadison|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
ahamilton|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
ahamilton|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
我将该文件保存为 timeline_20150223.txt
,并将其导入 timeline
table,如下所示:
aploetz@cqlsh:Whosebug> COPY timeline FROM '/home/aploetz/timeline_20150223.txt'
WITH DELIMITER='|' AND HEADER=true;
6 rows imported in 0.016 seconds.
- 是的,
timeline
将是一个宽行 table,在 user_id
上进行分区,然后在 tweet_id
上进行聚类。我通过 运行 cassandra-cli 工具和 list
ing timeline
列族 (table) 验证了 "under the hood" 结构。在这里您可以看到行是如何按 user_id
分区的,并且每列都有 tweet_id
uuid 作为其名称的一部分:
-
[default@Whosebug] list timeline;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: ahamilton
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585904)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585904)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585904)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585715)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585715)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585715)
-------------------
RowKey: gmason
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585150)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585150)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585150)
-------------------
RowKey: gwashington
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585475)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585475)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585475)
-------------------
RowKey: jmadison
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585597)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585597)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585597)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:, value=, timestamp=1424707827585348)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:author, value=6a6d616469736f6e, timestamp=1424707827585348)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:body, value=416c6c206d656e20686176696e6720706f776572206f7567687420746f206265206469737472757374656420746f2061206365727461696e206465677265652e, timestamp=1424707827585348)
4 Rows Returned.
Elapsed time: 35 msec(s).
我正在尝试使用 this example 练习 Cassandra(在 Composite Columns 段落下):
所以,我创建了 table 推文 ,它看起来如下所示:
cqlsh:twitter> SELECT * from tweets;
tweet_id | author | body
--------------------------------------+-------------+--------------
73954b90-baf7-11e4-a7d0-27983e9e7f51 | gwashington | I chopped...
(1 rows)
现在我正在尝试填充时间线,这是一个使用 CQL 的相关 table,但我不确定该怎么做。我试过 SQL 方法,但没有用:
cqlsh:twitter> INSERT INTO timeline (user_id, tweet_id, author, body) SELECT 'gmason', 73954b90-baf7-11e4-a7d0-27983e9e7f51, author, body FROM tweets WHERE tweet_id = 73954b90-baf7-11e4-a7d0-27983e9e7f51;
Bad Request: line 1:55 mismatched input 'select' expecting K_VALUES
所以我有两个问题:
- 如何用 SQL 填充 时间轴 table,以便它与 推文 相关?
- 我如何确保 时间轴物理布局 将按照该示例中所示创建?
谢谢。
编辑:
这是对我上面问题#2的解释(图片来自here):
为此,您需要使用 ETL 工具。使用 Hadoop 或 Spark。 CQL 中没有
INSERT/SELECT
,这是有原因的。在现实世界中,您将需要从您的应用程序执行 2 次插入 - 每个插入一次 table.您将不得不相信,当您拥有带分区键和集群键的主键时,这将以宽行格式存储数据。
tldr;
使用cqlsh
COPY
导出tweets
,修改文件,使用COPY
导入timeline
.使用cassandra-cli验证物理结构
长版...
- 我将在这方面采取不同的方式,并建议在 cqlsh 中使用本机
COPY
命令可能会更容易。
我跟了类似的examples found here。在 cqlsh 中创建 tweets
和 timeline
table 之后,我按照指示将行插入 tweets
中。我的 tweets
table 看起来像这样:
aploetz@cqlsh:Whosebug> SELECT * FROM tweets;
tweet_id | author | body
--------------------------------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------
05a5f177-f070-486d-b64d-4e2bb28eaecc | gmason | Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
b67fe644-4dbe-489b-bc71-90f809f88636 | jmadison | All men having power ought to be distrusted to a certain degree.
819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1 | gwashington | To be prepared for war is one of the most effectual means of preserving peace.
然后我像这样导出它们:
aploetz@cqlsh:Whosebug> COPY tweets TO '/home/aploetz/tweets_20150223.txt'
WITH DELIMITER='|' AND HEADER=true;
3 rows exported in 0.052 seconds.
然后我编辑了 tweets_20150223.txt file
,在前面添加了一个 user_id
列并复制了几行,如下所示:
userid|tweet_id|author|body
gmason|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
jmadison|b67fe644-4dbe-489b-bc71-90f809f88636|jmadison|All men having power ought to be distrusted to a certain degree.
gwashington|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
jmadison|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
ahamilton|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
ahamilton|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
我将该文件保存为 timeline_20150223.txt
,并将其导入 timeline
table,如下所示:
aploetz@cqlsh:Whosebug> COPY timeline FROM '/home/aploetz/timeline_20150223.txt'
WITH DELIMITER='|' AND HEADER=true;
6 rows imported in 0.016 seconds.
- 是的,
timeline
将是一个宽行 table,在user_id
上进行分区,然后在tweet_id
上进行聚类。我通过 运行 cassandra-cli 工具和list
ingtimeline
列族 (table) 验证了 "under the hood" 结构。在这里您可以看到行是如何按user_id
分区的,并且每列都有tweet_id
uuid 作为其名称的一部分:
-
[default@Whosebug] list timeline;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: ahamilton
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585904)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585904)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585904)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585715)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585715)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585715)
-------------------
RowKey: gmason
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585150)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585150)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585150)
-------------------
RowKey: gwashington
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585475)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585475)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585475)
-------------------
RowKey: jmadison
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585597)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585597)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585597)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:, value=, timestamp=1424707827585348)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:author, value=6a6d616469736f6e, timestamp=1424707827585348)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:body, value=416c6c206d656e20686176696e6720706f776572206f7567687420746f206265206469737472757374656420746f2061206365727461696e206465677265652e, timestamp=1424707827585348)
4 Rows Returned.
Elapsed time: 35 msec(s).