How to handle race conditions with Cassandra updates?

Question

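I am learning Cassandra and modeling a Cassandra table for a specific use case, described below.

Users can write posts. Other users can reply to a post. Users can also "up vote" or "down vote" a post. Users sort posts by date, by up votes, or by down votes.

Here is my table definition: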

CREATE TABLE post.comments_by_post (
    postid text,
    parentpostid text,
    createdon bigint,
    username text,
    userid text,
    displayname text,
    upvotes int,
    downvotes int,
    comment text,
    PRIMARY KEY ((postid, parentpostid), createdon)
) WITH CLUSTERING ORDER BY (createdon DESC);

To increment "upvotes" I have an update query:

UPDATE post.comments_by_post SET upvotes = incrementedValue where postid=1 and parentpostid = 2 ;

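incrementedValue is the previous value plus 1:

incrementedValue = previousValue + 1

My concern is that if I have to compute the increment from the previous value in the table, this will lead to race conditions and corrupted data.

Is there a better way to do this?

I know Cassandra has a counter column type that can be used for this kind of increment, but it requires an additional table. Counter columns cannot be mixed with regular columns that are not part of the primary key.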

Answer 1

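When concurrent updates happen, you will lose some of them.
For example, user A reads the current value, say 10. At the same time another user B also reads the current value and gets 10 as well. User A then issues an update with the new value 11. User B then also issues an update with the new value 11. Two votes were cast but the stored value only went up by one, so user A's update is lost.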
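The interleaving above can be replayed step by step. The following is only a toy model in Python: a plain dict stands in for the stored row, nothing talks to Cassandra, and the helper names (`read_votes`, `write_votes`, `increment`) are made up for illustration.

```python
# Toy model of the read-modify-write race described above.
# A plain dict stands in for the stored row; this is not Cassandra driver code.
row = {"upvotes": 10}

def read_votes():
    return row["upvotes"]

def write_votes(value):
    row["upvotes"] = value

# Both users read before either one writes.
seen_by_a = read_votes()       # user A sees 10
seen_by_b = read_votes()       # user B sees 10
write_votes(seen_by_a + 1)     # A writes 11
write_votes(seen_by_b + 1)     # B writes 11 as well, so A's vote is not reflected

lost_update_total = row["upvotes"]

# A relative increment carries no stale value, which is the idea behind
# `SET upvotes = upvotes + 1` on a counter column.
row["upvotes"] = 10

def increment(delta):
    row["upvotes"] += delta

increment(1)   # A's vote
increment(1)   # B's vote
counter_total = row["upvotes"]

print(lost_update_total, counter_total)
```

With absolute writes the two votes collapse into one; with relative increments both are kept.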

A counter table is your best option.

A counter is a special column used to store a number that is changed in increments. Cassandra counters were redesigned in Cassandra 2.1 to alleviate some of the difficulties. Read What’s New in Cassandra 2.1: Better Implementation of Counters to discover the improvements made in the counters.

You can create a counter table like this:

CREATE TABLE vote_counter (
   postid text,
   parentpostid text,
   upvotes counter,
   downvotes counter,
   PRIMARY KEY ((postid, parentpostid))
);

Now you can update the counters like this:

UPDATE vote_counter SET upvotes = upvotes + 1 WHERE postid = ? AND parentpostid = ?;
UPDATE vote_counter SET upvotes = upvotes - 1 WHERE postid = ? AND parentpostid = ?;
UPDATE vote_counter SET downvotes = downvotes + 1 WHERE postid = ? AND parentpostid = ?;
UPDATE vote_counter SET downvotes = downvotes - 1 WHERE postid = ? AND parentpostid = ?;

Answer 2

From your description:

...User sort the posts by date or up votes or down votes.

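You are targeting three use cases, but your table definition only addresses the first one (by date). To address the other two, you need to create two more tables, using the upvotes and downvotes fields as their clustering keys (respectively), and work to keep all three tables in sync: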

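-- Variant for sorting by upvotes; the downvotes table follows the same pattern.
-- In practice each of the three tables needs its own name.
CREATE TABLE post.comments_by_post (
    postid text,
    parentpostid text,
    createdon bigint,
    username text,
    userid text,
    displayname text,
    upvotes int,
    downvotes int,
    comment text,
    PRIMARY KEY ((postid, parentpostid), upvotes)
) WITH CLUSTERING ORDER BY (upvotes DESC);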

If you upgrade C* to 3.0, you can save a lot of work by creating a Materialized View instead.

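Back to your concurrency problem: counting in a distributed environment is really hard. Depending on your requirements, I can suggest two possible solutions:

1) You don't need to be exact (you can tolerate over- or under-counting). In this case I suggest using the new Cassandra counter tables to store your counters. The main drawback of this approach is that you effectively lose the ability to get results in order (from your application's point of view), so you need to apply the ordering at the application level. You also save yourself the other two tables described above, because the counters are kept in a separate table.

2) You need to be exact. In this case you need to serialize access to each post's counter. You can do this by keeping a small cache of the post counters you are updating or have recently updated, and acquiring a lock on the individual item, at the application level, every time you want to update it. 64k posts should be enough. Now you know that the updates for each post are performed in order. This is not a problem, because you are not applying a global lock, only local ones. You still need three tables for C* 2.0, or one table plus materialized views for C* 3.0.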
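As a rough illustration of option 2, here is a minimal Python sketch of per-post locking. It swaps the cache of counters for lock striping (a fixed array of locks indexed by a hash of the post key), which is one way to get local locks with bounded memory. Everything in it is an assumption made for illustration: the names `upvote` and `run_demo`, the stripe count, and the in-memory dict that stands in for the Cassandra read and write. It also only serializes updates inside a single application process.

```python
import threading

NUM_STRIPES = 65536  # echoes the "64k" figure above; the exact size is an assumption
_stripes = [threading.Lock() for _ in range(NUM_STRIPES)]

# Stand-in for the stored counters; a real application would read from and
# write to Cassandra at the two marked places below.
_votes = {}

def _lock_for(postid, parentpostid):
    # Posts that hash to the same stripe share a lock, which is still safe.
    return _stripes[hash((postid, parentpostid)) % NUM_STRIPES]

def upvote(postid, parentpostid):
    key = (postid, parentpostid)
    with _lock_for(postid, parentpostid):   # local lock, not a global one
        previous = _votes.get(key, 0)       # read the previous value
        _votes[key] = previous + 1          # write previous + 1
        return _votes[key]

def run_demo(threads=8, votes_per_thread=1000):
    """Cast votes on one post from several threads and return the final count."""
    _votes.pop(("1", "2"), None)

    def worker():
        for _ in range(votes_per_thread):
            upvote("1", "2")

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return _votes[("1", "2")]
```

Calling `run_demo(4, 500)` should return 2000, since every read-modify-write on the same post happens under that post's lock. With several application instances writing to the same cluster, an in-process lock like this is not sufficient on its own.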

Answer 3

The following table and secondary index let you implement the counting without a counter table and without any locks:

CREATE TABLE votes_by_comment (
   postid text,
   parentpostid text,
   userid text,
   vote text, // can be 'up' or 'down'
   PRIMARY KEY ((postid, parentpostid), userid)
);

CREATE INDEX ON votes_by_comment (vote);

When a user up-votes:

INSERT INTO votes_by_comment (postid, parentpostid, userid, vote) VALUES ('comment1', 'post1', 'user1', 'up');

When a user down-votes:

INSERT INTO votes_by_comment (postid, parentpostid, userid, vote) VALUES ('comment1', 'post1', 'user1', 'down');

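Because userid is a clustering column, each user has at most one row per comment. An INSERT in Cassandra is an upsert, so a repeated vote by the same user overwrites that user's row instead of adding a new one. This avoids the race condition and prevents a user from voting more than once.

Counting the votes: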

SELECT count(*) from votes_by_comment WHERE postid='comment1' AND parentpostid='post1' and vote='up';

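The secondary index makes it possible to select by the vote value, and because the select on the secondary index runs within a single partition key, it will perform well.

However, this approach does not let you sort by votes on the Cassandra side; sorting has to be implemented on the application side.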
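To make the application-side part concrete, here is a small Python sketch. It models the rows of votes_by_comment as a dict keyed by the primary key, so nothing in it talks to Cassandra, and the function names (`cast_vote`, `count_votes`, `comments_by_upvotes`) are hypothetical.

```python
from collections import Counter

# In-memory stand-in for votes_by_comment:
# (postid, parentpostid, userid) -> 'up' or 'down'
votes = {}

def cast_vote(postid, parentpostid, userid, vote):
    # Writing the same key again replaces the row, mimicking an upsert.
    votes[(postid, parentpostid, userid)] = vote

def count_votes(vote):
    """Tally votes of one kind per (postid, parentpostid)."""
    tally = Counter()
    for (postid, parentpostid, _userid), value in votes.items():
        if value == vote:
            tally[(postid, parentpostid)] += 1
    return tally

def comments_by_upvotes():
    """Order comments by up-vote count, highest first (done in the application)."""
    up = count_votes("up")
    comments = {(postid, parentpostid) for (postid, parentpostid, _u) in votes}
    return sorted(comments, key=lambda c: up[c], reverse=True)

cast_vote("comment1", "post1", "user1", "up")
cast_vote("comment1", "post1", "user1", "up")    # repeated vote, still one row
cast_vote("comment1", "post1", "user2", "up")
cast_vote("comment2", "post1", "user1", "up")
cast_vote("comment2", "post1", "user3", "down")

ranking = comments_by_upvotes()
print(ranking)
```

In a real application the per-comment counts would come from the `SELECT count(*)` query shown earlier, and only the final sort would run in application code.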
