Internal network application data model with Cassandra

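I am designing an application that will let users send requests to connect with each other, see the requests they have sent or received, take notes during their interactions for later reference once connected, and remove users from their contact lists.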

In an RDBMS, the schema would be:

table Users with columns:

table Requests with columns:

table Connections with columns:

Primary key (from, to)

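The queries I expect to need are:

I am trying to model this in Cassandra, but I am confused about which keys to choose for maximum efficiency.

So far I have the following ideas, and would welcome feedback from more experienced Cassandra users: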

create table users(
uid text PRIMARY KEY
); 

create table requestsByFrom(
from text,
to text,
message text,
created timestamp,
expiry timestamp,
PRIMARY KEY (from,to)
);

create table requestsByTo(
from text,
to text,
message text,
created timestamp,
expiry timestamp,
PRIMARY KEY (to,from)
);

create table connections(
from text,
to text,
notes text,
created timestamp,
modified timestamp,
isFavourite boolean,
isActive boolean,
pairedConnection boolean,
PRIMARY KEY (from,to)
);

create table activeConnections(
from text,
to text,
isActive boolean,
PRIMARY KEY (from,isActive)
);

create table favouriteConnections(
from text,
to text,
isFavourite boolean,
PRIMARY KEY (from, isFavourite)
);

create table pairedConnection(
from text,
to text,
pairedConnection boolean,
PRIMARY KEY ((from,to), pairedConnection)
);

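Cassandra follows a different paradigm from an RDBMS, and this is most evident in how data modeling has to be done. Keep in mind that denormalization is preferred and that you will have duplicated data.

Table definitions should be based on the queries used to retrieve the data. Those queries are partly stated in the question, for example: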

find the sent requests for a user

Taking the initial design of the requestsByFrom table, an alternative would be:

CREATE TABLE IF NOT EXISTS requests_sent_by_user(
    requester_email TEXT,
    recipient_email TEXT,
    recipient_name TEXT,
    message TEXT,
    created TIMESTAMP,
    PRIMARY KEY (requester_email, recipient_email)
) WITH default_time_to_live = 864000;

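Note that from is a reserved keyword, so the columns have been renamed. The expiry information can be handled with the default_time_to_live clause (TTL), which deletes a record once the defined time has passed; the value is the number of seconds after the record is inserted, and the example uses 10 days (864,000 seconds).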
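As a rough sketch of how the TTL behaves, assuming the requests_sent_by_user table above exists: a single insert can override the table default with USING TTL, and the TTL() function reports the seconds remaining on a non-key column. The addresses and message text are made-up sample values, and toTimestamp(now()) is used for the current time (available from Cassandra 2.2 onwards).

```cql
-- Override the table default (10 days) with 1 day for this row only.
INSERT INTO requests_sent_by_user (
    requester_email, recipient_email, recipient_name, message, created
) VALUES (
    'test@email.com', 'john.smith@test.com', 'John Smith',
    'Hello, let us connect', toTimestamp(now())
) USING TTL 86400;

-- Seconds left before the message column of each sent request expires.
SELECT recipient_email, TTL(message)
FROM requests_sent_by_user
WHERE requester_email = 'test@email.com';
```

Rows inserted without USING TTL simply pick up the 864000 seconds defined on the table.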

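The suggested key is the email address; a UUID would also work. Names are not recommended, because several people can share the same name (such as James Smith), and the same person's name can be written in several ways (for instance, Jim Smith, J. Smith and j smith may all refer to the same person).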

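The recipient_name column has also been added, since you will most likely want to display it; any other information that will be displayed or used together with the query should be added as well.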

find the received requests for a user

CREATE TABLE IF NOT EXISTS requests_received_by_user(
    recipient_email TEXT,
    requester_email TEXT,
    requester_name TEXT,
    message TEXT,
    created TIMESTAMP,
    PRIMARY KEY (recipient_email, requester_email)
) WITH default_time_to_live = 864000;

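It is best to use a batch to add the records to requests_sent_by_user and requests_received_by_user at the same time; this keeps the information consistent between the two tables, and the TTL (data expiry) will also be the same.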
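A minimal sketch of such a batch, assuming both request tables exist as defined above. The addresses, names and message are hypothetical sample values. A literal timestamp is used so that both rows carry exactly the same created value. BEGIN BATCH creates a logged batch by default, which guarantees that if any statement in it is applied, all of them eventually will be.

```cql
BEGIN BATCH

-- Row read by the sender ("find the sent requests for a user").
INSERT INTO requests_sent_by_user (
    requester_email, recipient_email, recipient_name, message, created
) VALUES (
    'test@email.com', 'john.smith@test.com', 'John Smith',
    'Hello, let us connect', '2024-01-15 10:00:00+0000'
);

-- Same request, keyed by the recipient ("find the received requests for a user").
INSERT INTO requests_received_by_user (
    recipient_email, requester_email, requester_name, message, created
) VALUES (
    'john.smith@test.com', 'test@email.com', 'Test User',
    'Hello, let us connect', '2024-01-15 10:00:00+0000'
);

APPLY BATCH;
```

Because both tables declare the same default_time_to_live, the two copies of the request expire together.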

storing contacts

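The question has 4 connection tables: connections, active_connections, favourite_connections and paired_connections. What is the difference between them? Will they have different rules or use cases? If so, it makes sense to set them up as separate tables:

CREATE TABLE IF NOT EXISTS connections(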
    requester_email TEXT,
    recipient_email TEXT,
    recipient_name TEXT,
    notes TEXT,
    created TIMESTAMP,
    last_update TIMESTAMP,
    is_favourite BOOLEAN,
    is_active BOOLEAN,
    is_paired BOOLEAN,
    PRIMARY KEY (requester_email, recipient_email)
 );

CREATE TABLE IF NOT EXISTS active_connections(
    requester_email TEXT,
    recipient_email TEXT,
    recipient_name TEXT,
    last_update TIMESTAMP,
    PRIMARY KEY (requester_email, recipient_email)
);

CREATE TABLE IF NOT EXISTS favourite_connections(
    requester_email TEXT,
    recipient_email TEXT,
    recipient_name TEXT,
    last_update TIMESTAMP,
    PRIMARY KEY (requester_email, recipient_email)
);

CREATE TABLE IF NOT EXISTS paired_connections(
    requester_email TEXT,
    recipient_email TEXT,
    recipient_name TEXT,
    last_update TIMESTAMP,
    PRIMARY KEY (requester_email, recipient_email)
);

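Note that the boolean flags have been removed from the three derived tables; the logic is that if a record exists in active_connections, it is considered an active connection.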

When a new connection is created, it may have several records across different tables; it is best to bundle all of these inserts or updates in a batch.
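One way this could look, assuming the tables above and that a new connection starts out active but neither favourite nor paired (that starting state is an assumption for illustration, not something the question specifies). The sample addresses are made up, and toTimestamp(now()) stands in for the current time.

```cql
BEGIN BATCH

-- Full record of the connection, including its flags.
INSERT INTO connections (
    requester_email, recipient_email, recipient_name,
    created, last_update, is_favourite, is_active, is_paired
) VALUES (
    'test@email.com', 'john.smith@test.com', 'John Smith',
    toTimestamp(now()), toTimestamp(now()), false, true, false
);

-- Presence of this row is what marks the connection as active.
INSERT INTO active_connections (
    requester_email, recipient_email, recipient_name, last_update
) VALUES (
    'test@email.com', 'john.smith@test.com', 'John Smith', toTimestamp(now())
);

APPLY BATCH;

-- Returns one row if the connection is active, no rows otherwise.
SELECT recipient_email
FROM active_connections
WHERE requester_email = 'test@email.com'
  AND recipient_email = 'john.smith@test.com';
```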

find all the active contacts of a given user

With the proposed tables, if the requester's email is test@email.com:

SELECT * FROM active_connections WHERE requester_email = 'test@email.com';

update user as favourite

A batch updates the record in connections and adds the new record to favourite_connections:

BEGIN BATCH

UPDATE connections 
SET is_favourite = true, last_update = dateof(now())
WHERE requester_email ='test@email.com' 
  AND recipient_email = 'john.smith@test.com';

INSERT INTO favourite_connections (
    requester_email, recipient_email, recipient_name, last_update
) VALUES (
    'test@email.com', 'john.smith@test.com', 'John Smith', dateof(now())
);
APPLY BATCH;

mark connection for soft deletion

The connection's information can be kept in connections with all flags disabled, while the corresponding records are deleted from active_connections, favourite_connections and paired_connections:

BEGIN BATCH

UPDATE connections 
SET is_active = false, is_favourite = false,
    is_paired = false, last_update = dateof(now())
WHERE requester_email ='test@email.com' 
  AND recipient_email = 'john.smith@test.com';

DELETE FROM active_connections 
WHERE requester_email = 'test@email.com' 
  AND recipient_email = 'john.smith@test.com';

DELETE FROM favourite_connections 
WHERE requester_email = 'test@email.com' 
  AND recipient_email = 'john.smith@test.com';

DELETE FROM paired_connections 
WHERE requester_email = 'test@email.com' 
  AND recipient_email = 'john.smith@test.com';

APPLY BATCH;