在 MongoLab 和一般情况下有效地存储数据

Question

我有一个侦听 websocket 的应用程序，它存储 usernames/userID 的（用户名是 1-20 个字节，用户 ID 是 17 个字节）。这没什么大不了的，因为它只是一份文件。然而，他们参与的每一轮，它都会推送轮 ID（24 字节）和一个 'score' 十进制值（例如：1190.0015239999999）。

问题是，有多少轮没有限制，我负担不起每个月为 mongolab 支付那么多费用。处理这些数据的最佳方式是什么？

我的想法： - 如果有办法替换 mongodb 中的 _id: 字段，我会用 17 字节长的 userID 替换它。不过不确定我是否可以这样做。

存储带有时间戳的用户数据，并删除得分值小于 200 的旧数据。
截掉超过 10 个字符的用户名。
~~完全删除 Round ID（或将 _id 字段替换为 roundId）。~~（不会起作用，因为每个文档中有多个 roundID）
小数值四舍五入到两位。
30 天后删除回合 ID

tl;博士

需要在 mongo 实验室
文档由username(1-20个字符), userid(17个字符), round(Object Array) = [{round Id(24个字符), score(1190.0015239999999)}].

提前致谢！

编辑：

文档架构：

userID: {type: String},
userName: {type: String},
rounds: [{roundID: String, score: String}]

Answer 1

所以目前您正在为每条记录在数组中存储三个个数据点。

_id: false 将阻止 mongoose 自动为文档创建一个 id。如果你不需要 roundID，那么你可以使用下面的数组只存储 one 个数据点：

round[{_id:false, score:String}]

否则，如果 roundID 实际上有意义，请使用以下在数组中存储两个个数据点的方法：

round[{_id:false, roundID: string, score:String}]

最后，如果您只需要一个 ID 以供参考，请使用以下内容，它将在数组中存储两个个数据点 - 一个随机 ID 和分数：

round[{score:String}]

Answer 2

建模 1:n 关系作为嵌入文档并不是最好的，除非是极少数情况。这是因为在撰写本文时 BSON 文档的大小限制为 16MB。

更好的（阅读更具扩展性和效率的方法）是使用 document references。

首先，您当然需要玩家数据。这是一个例子：

{
  _id: "SomeUserId",
  name: "SomeName"
}

不需要额外的 userId 字段，因为每个文档都需要有一个具有唯一值的 _id 字段。与流行的看法相反，此字段值不必是 ObjectId。因此，如果我没记错的话，我们已经将玩家数据所需的大小减少了 1/3。

接下来是每一轮的结果：

{
  _id: {
    round: "SomeString",
    player: "SomeUserId"
  },
  score: 5,
  createdAt: ISODate("2015-04-13T01:03:04.0002Z")
}

这里有几点需要注意。首先也是最重要的：不要使用字符串来记录值。即使是成绩也应该存储为相应的数值。否则你无法获得平均值等。我稍后会展示更多。我们在这里使用 _id 的复合字段，这是完全有效的。此外，它将为我们提供一个免费索引，优化一些最有可能的查询，例如 "How did player X score in round Y?"

db.results.find({"_id.player":"X","_id.round":"Y"})

或"What where the results of round Y?"

db.results.find({"_id.round":"Y"})

或"What we're the scores of Player X in all rounds?"

db.results.find({"_id.player":"X"})

然而，通过而不是使用字符串来保存分数，即使是一些漂亮的统计数据也变得相当便宜，例如"What was the average score of round Y?"

db.results.aggregate(
  { $match: { "_id.round":"Y" } },
  { $group: { "round":"$_id.round", "averageScore": {$avg:"$score"} }
)

或"What is the average score of each player in all rounds?"

db.results.aggregate(
  { $group: { "player: "$_id.player", "averageAll": {$avg:"$score"} }
)

虽然您可以在您的应用程序中进行这些计算，但 MongoDB 可以更有效地进行这些计算，因为数据不必在处理之前发送到您的应用程序。

接下来，数据过期。我们有一个 createdAt 字段，类型为 ISODate。现在，我们让 MongoDB 通过创建一个 TTL index

来处理剩下的事情

db.results.ensureIndex(
  { "createdAt":1 },
  { expireAfterSeconds: 60*60*24*30}
)

总而言之，这应该是最有效的数据存储和过期方式，同时提高了可扩展性。

在 MongoLab 和一般情况下有效地存储数据

Storing data efficiently in MongoLab and in general

database

heroku

mongodb

node.js

mlab