How to populate a related table in Cassandra using CQL?

I am trying to practice Cassandra using this example (under the Composite Columns section):

So I created the tweets table, and it looks like this:

cqlsh:twitter> SELECT * from tweets;

 tweet_id                             | author      | body
--------------------------------------+-------------+--------------
 73954b90-baf7-11e4-a7d0-27983e9e7f51 | gwashington | I chopped...

(1 rows)

Now I am trying to populate timeline, which is a related table, using CQL, but I am not sure how to do it. I tried the SQL approach, but it did not work:

cqlsh:twitter> INSERT INTO timeline (user_id, tweet_id, author, body) SELECT 'gmason', 73954b90-baf7-11e4-a7d0-27983e9e7f51, author, body FROM tweets WHERE tweet_id = 73954b90-baf7-11e4-a7d0-27983e9e7f51;
Bad Request: line 1:55 mismatched input 'select' expecting K_VALUES

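So I have two questions:

  1. How do I populate the timeline table using SQL so that it relates to tweets?
  2. How do I make sure that the timeline physical layout is created as shown in that example?

Thanks.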

Edit:

Here is an explanation of my question #2 above (image from here):

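  1. You would need an ETL tool for this, such as Hadoop or Spark. There is no INSERT/SELECT in CQL, and that is for a reason. In the real world, you would perform 2 inserts from your application, one for each table.

  2. You will have to trust that when you have a primary key with a partition key and a clustering key, the data is stored in wide-row format.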
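
As a rough sketch of the "2 inserts from your application" mentioned in point 1: the application writes each tweet once to tweets and once per timeline owner to timeline. The helper below only builds the CQL text and the bind parameters; it does not connect to a cluster. The function name `build_tweet_writes`, the `followers` argument, and the `%s` placeholder style (the style accepted by the DataStax Python driver for simple statements) are assumptions for illustration and are not part of the original example.

```python
import uuid

# Column lists follow the tables shown in the question.
TWEETS_INSERT = "INSERT INTO tweets (tweet_id, author, body) VALUES (%s, %s, %s)"
TIMELINE_INSERT = (
    "INSERT INTO timeline (user_id, tweet_id, author, body) "
    "VALUES (%s, %s, %s, %s)"
)


def build_tweet_writes(tweet_id, author, body, followers):
    """Return (statement, params) pairs: one for tweets, one per timeline owner.

    `followers` is a hypothetical list of user_ids whose timeline should
    contain the tweet.
    """
    writes = [(TWEETS_INSERT, (tweet_id, author, body))]
    for user_id in followers:
        # Each timeline row carries its own copy of author and body.
        writes.append((TIMELINE_INSERT, (user_id, tweet_id, author, body)))
    return writes


if __name__ == "__main__":
    tweet_id = uuid.UUID("73954b90-baf7-11e4-a7d0-27983e9e7f51")
    for statement, params in build_tweet_writes(
        tweet_id, "gwashington", "I chopped...", ["gmason"]
    ):
        print(statement, params)
```

With a live cluster, each pair could be handed to a driver call such as `session.execute(statement, params)`; that step is left out here because it depends on your connection setup.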

tldr;

  1. Use cqlsh COPY to export tweets, modify the file, and use COPY to import it into timeline.

  2. Use cassandra-cli to verify the physical structure.

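Long version...

  1. I am going to take a different approach on this one and suggest that it will probably be easier to use the native COPY command in cqlsh.

I followed the similar examples found here. After creating the tweets and timeline tables in cqlsh, I inserted rows into tweets as indicated. My tweets table looks like this: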

aploetz@cqlsh:Whosebug> SELECT * FROM tweets;

 tweet_id                             | author      | body
--------------------------------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------
 05a5f177-f070-486d-b64d-4e2bb28eaecc |      gmason | Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
 b67fe644-4dbe-489b-bc71-90f809f88636 |    jmadison |                                                                                  All men having power ought to be distrusted to a certain degree.
 819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1 | gwashington |                                                                    To be prepared for war is one of the most effectual means of preserving peace.

Then I exported them like this:

aploetz@cqlsh:Whosebug> COPY tweets TO '/home/aploetz/tweets_20150223.txt' 
WITH DELIMITER='|' AND HEADER=true;

3 rows exported in 0.052 seconds.

Then I edited the tweets_20150223.txt file, adding a user_id column at the front and duplicating a few rows, like this:

userid|tweet_id|author|body
gmason|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.
jmadison|b67fe644-4dbe-489b-bc71-90f809f88636|jmadison|All men having power ought to be distrusted to a certain degree.
gwashington|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
jmadison|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
ahamilton|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.
ahamilton|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.

I saved that file as timeline_20150223.txt and imported it into the timeline table, like this:

aploetz@cqlsh:Whosebug> COPY timeline FROM '/home/aploetz/timeline_20150223.txt' 
WITH DELIMITER='|' AND HEADER=true;

6 rows imported in 0.016 seconds.
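
The manual edit between the export and the import could also be scripted. The following is a minimal sketch, assuming the pipe-delimited layout shown above with tweet_id as the first exported column. The function name `add_user_ids` and the `timeline_owners` mapping (tweet_id to the user_ids whose timeline should contain that tweet) are hypothetical; the answer itself did this step by hand.

```python
import csv
import io


def add_user_ids(tweets_file, timeline_file, timeline_owners):
    """Copy pipe-delimited tweet rows, prefixing a user_id per timeline owner.

    Returns the number of data rows written to timeline_file.
    """
    reader = csv.reader(tweets_file, delimiter="|")
    writer = csv.writer(timeline_file, delimiter="|", lineterminator="\n")
    header = next(reader)  # tweet_id|author|body
    writer.writerow(["user_id"] + header)
    rows_written = 0
    for row in reader:
        tweet_id = row[0]  # assumes tweet_id is the first exported column
        # Tweets without an entry in timeline_owners are skipped.
        for user_id in timeline_owners.get(tweet_id, []):
            writer.writerow([user_id] + row)
            rows_written += 1
    return rows_written


if __name__ == "__main__":
    exported = io.StringIO(
        "tweet_id|author|body\n"
        "b67fe644-4dbe-489b-bc71-90f809f88636|jmadison|"
        "All men having power ought to be distrusted to a certain degree.\n"
    )
    result = io.StringIO()
    owners = {"b67fe644-4dbe-489b-bc71-90f809f88636": ["jmadison"]}
    add_user_ids(exported, result, owners)
    print(result.getvalue(), end="")
```

For real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the exported and the new file (opened with `newline=""`, as the `csv` module recommends).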
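
  2. Yes, timeline will be a wide-row table, partitioned on user_id and then clustered on tweet_id. I verified the "under the hood" structure by running the cassandra-cli tool and listing the timeline column family (table). Here you can see how the rows are partitioned by user_id, and how each column has the tweet_id uuid as part of its name: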

[default@Whosebug] list timeline;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: ahamilton
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585904)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585904)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585904)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585715)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585715)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585715)
-------------------
RowKey: gmason
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585150)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585150)
=> (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585150)
-------------------
RowKey: gwashington
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585475)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585475)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585475)
-------------------
RowKey: jmadison
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585597)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585597)
=> (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585597)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:, value=, timestamp=1424707827585348)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:author, value=6a6d616469736f6e, timestamp=1424707827585348)
=> (name=b67fe644-4dbe-489b-bc71-90f809f88636:body, value=416c6c206d656e20686176696e6720706f776572206f7567687420746f206265206469737472757374656420746f2061206365727461696e206465677265652e, timestamp=1424707827585348)

4 Rows Returned.
Elapsed time: 35 msec(s).
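
The `value=` fields in the listing above are printed as hex. Decoding them as UTF-8 recovers the author and body text, which makes it easier to match each cell against the rows in the import file. This is only a small reading aid; the helper name `decode_cli_value` is made up for this sketch.

```python
def decode_cli_value(hex_value):
    """Decode a hex-encoded cell value from the cassandra-cli listing as UTF-8."""
    return bytes.fromhex(hex_value).decode("utf-8")


if __name__ == "__main__":
    # author cells copied from the listing above
    print(decode_cli_value("676d61736f6e"))            # gmason
    print(decode_cli_value("6777617368696e67746f6e"))  # gwashington
```

The cells whose name ends in a bare colon have an empty value, which decodes to an empty string.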