In NoSQL databases like MongoDB and Cassandra, what is the proper way to model a resource that can contain an array of the same type of attribute?

Question

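For example, say I'm trying to design a model for a database that stores Instagram posts, and a single post can have multiple images. Assume I'm storing the images in S3. My question is: how do I associate the images with the post?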

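In a standard relational database, I would probably create a separate table for images and store a foreign key to the Instagram post along with the image's path in S3. Then, when retrieving a post, I would join against this images table.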
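To make the relational layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the s3://example-bucket paths are assumptions chosen for illustration, not part of any real schema.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, content TEXT);
    CREATE TABLE images (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),  -- foreign key to the post
        s3_path TEXT NOT NULL                           -- hypothetical S3 location
    );
""")
conn.execute("INSERT INTO posts (id, name, content) VALUES (1, 'post1', 'post1 content')")
conn.executemany(
    "INSERT INTO images (post_id, s3_path) VALUES (?, ?)",
    [(1, "s3://example-bucket/img1.jpg"), (1, "s3://example-bucket/img2.jpg")],
)

# Retrieving a post together with its images requires a join.
rows = conn.execute(
    """
    SELECT p.name, i.s3_path
    FROM posts AS p
    JOIN images AS i ON i.post_id = p.id
    WHERE p.id = ?
    ORDER BY i.id
    """,
    (1,),
).fetchall()
print(rows)
```

Each image comes back as its own row, so the application has to regroup the rows into one post.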

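In NoSQL databases like MongoDB or Cassandra, my understanding is that joins are best avoided for latency reasons. So would I just store an array of image paths directly in my posts table?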

Answer 1

In the MongoDB case, this is how I would do it:

{
  "name":"post1",
  "content":"post1 content",
  "images":[
    "https://example.com/img1.jpg",
    "https://example.com/img2.jpg"
  ]
}

Yes, embedding is better than linking to other documents, unless you have a good reason to do otherwise.

Even if you only need to fetch the images, you can query the database and then project the result down to the fields you need.
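As a rough sketch of what such a projection could look like from Python: fetch_images below assumes posts behaves like a pymongo Collection and passes a filter plus a projection document to find_one. So that the snippet runs without a MongoDB server, FakePosts is a hypothetical in-memory stand-in that only mimics the small part of that behavior used here; both names are illustrative.

```python
class FakePosts:
    """Hypothetical in-memory stand-in for a pymongo Collection (illustration only)."""

    def __init__(self, docs):
        self._docs = docs

    def find_one(self, query, projection):
        for doc in self._docs:
            # Match on simple field equality, like a basic MongoDB filter.
            if all(doc.get(k) == v for k, v in query.items()):
                # Keep only the fields the projection marks with 1.
                return {k: doc[k] for k, v in projection.items() if v == 1 and k in doc}
        return None


def fetch_images(posts, name):
    """Return only the embedded image list of the post with the given name."""
    doc = posts.find_one({"name": name}, {"_id": 0, "images": 1})
    return doc["images"] if doc else []


posts = FakePosts([
    {
        "name": "post1",
        "content": "post1 content",
        "images": ["https://example.com/img1.jpg", "https://example.com/img2.jpg"],
    }
])
images = fetch_images(posts, "post1")
print(images)
```

With a real collection, the same fetch_images call should work unchanged, since only the filter and the projection are passed through.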

Answer 2

Providing a Cassandra answer:

Similar to the MongoDB solution, Cassandra lets you embed data like this in a collection. In this case, a LIST would be a good choice.

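In addition, you need to model your data around how you will query it. While a post id is useful, it is probably something that queries will rarely filter on. More likely, queries for post-like data will be by date and/or time, so keying on that is important.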

With all of that in mind, the resulting Cassandra table should look something like this:

CREATE TABLE stackoverflow.posts_by_month (
    month int,
    posttime timestamp,
    id uuid,
    content text,
    images list<text>,
    name text,
    PRIMARY KEY (month, posttime, id)
) WITH CLUSTERING ORDER BY (posttime DESC, id ASC);

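Note that month may or may not be the right value for your use case. Depending on the number of posts written in a given month, a smaller "time bucket" may be needed.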
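For illustration, the bucket value can be derived from the post timestamp on the application side. The helper names month_bucket and day_bucket below are hypothetical, and the day-sized bucket is just one possible smaller granularity.

```python
from datetime import datetime, timezone


def month_bucket(ts: datetime) -> int:
    """Encode year and month as an int, e.g. November 2021 -> 202111."""
    return ts.year * 100 + ts.month


def day_bucket(ts: datetime) -> int:
    """A smaller bucket: year, month and day, e.g. 2021-11-01 -> 20211101."""
    return ts.year * 10000 + ts.month * 100 + ts.day


posted = datetime(2021, 11, 1, 9, 19, tzinfo=timezone.utc)
print(month_bucket(posted))  # 202111
print(day_bucket(posted))    # 20211101
```

Switching to a smaller bucket changes the partition key, so it would mean a different table definition rather than a change in the data alone.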

You can then query for the posts within the last month, like this:

SELECT posttime,name,content,images FROM posts_by_month
WHERE month=202111
AND posttime > '2021-11-01 09:00';

  posttime                        | name  | content       | images
 ---------------------------------+-------+---------------+--------------------------
  2021-11-01 09:19:00.000000+0000 | post1 | post1 content | ['img1.jpg', 'img2.jpg']

(1 rows)
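Because month is the partition key, each query has to name the bucket it reads, so a date range that spans several months needs one query per bucket (or an IN on month). A minimal sketch of enumerating those buckets on the client side, with month_buckets as a hypothetical helper name:

```python
from datetime import date


def month_buckets(start: date, end: date) -> list[int]:
    """List every YYYYMM bucket from start's month through end's month, inclusive."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(year * 100 + month)
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return buckets


buckets = month_buckets(date(2021, 11, 15), date(2022, 1, 10))
print(buckets)  # [202111, 202112, 202201]
```

Each value can then be bound to the month parameter of the SELECT shown above.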

database

data-modeling

mongodb

cassandra