How to search comma separated data in mongodb

Question

I have a movie database with several fields. The genre field contains a comma-separated string such as:

{genre: 'Action, Adventure, Sci-Fi'}

I know I can use a regular expression to find matches. I also tried:

{'genre': {'$in': genre}}

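The problem is running time. It takes a long time to return the query results. The database has about 300K documents, and I have created a normal index on the 'genre' field.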

Answer 1

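I would suggest using Map-Reduce to create a separate collection that stores genre as an array, with its values obtained by splitting the comma-separated string. You can then run the Map-Reduce job and direct your queries at the output collection.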

For example, I created some sample documents in the foo collection:

db.foo.insert([
    {genre: 'Action, Adventure, Sci-Fi'},
    {genre: 'Thriller, Romantic'},
    {genre: 'Comedy, Action'}
])

The following map/reduce operation then produces a collection against which you can run efficient queries:

map = function() {
    var array = this.genre.split(/\s*,\s*/);
    emit(this._id, array);
}

reduce = function(key, values) {
    return values;
}

result = db.runCommand({
    "mapreduce" : "foo", 
    "map" : map,
    "reduce" : reduce,
    "out" : "foo_result"
});
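
To see what the map function emits for each document, the same split can be tried on its own in plain JavaScript. This is only an illustrative sketch: `splitGenres` is a hypothetical helper name, not part of the answer, and the extra `trim()` and empty-entry filter are assumptions added to guard against stray whitespace or doubled commas.

```javascript
// Hypothetical helper mirroring the split used in the map function above.
function splitGenres(genre) {
    return genre
        .trim()                 // assumption: drop leading/trailing whitespace
        .split(/\s*,\s*/)       // split on commas, swallowing surrounding spaces
        .filter(function (g) {  // assumption: ignore empty entries such as "a,,b"
            return g.length > 0;
        });
}

console.log(splitGenres('Action, Adventure, Sci-Fi'));
// [ 'Action', 'Adventure', 'Sci-Fi' ]
```

Each resulting element becomes one entry of the `value` array in the output collection.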

Querying is then straightforward, taking advantage of a multikey index on the value field:

db.foo_result.createIndex({"value": 1});

var genre = ['Action', 'Adventure'];
db.foo_result.find({'value': {'$in': genre}})

Output:

/* 0 */
{
    "_id" : ObjectId("55842af93cab061ff5c618ce"),
    "value" : [ 
        "Action", 
        "Adventure", 
        "Sci-Fi"
    ]
}

/* 1 */
{
    "_id" : ObjectId("55842af93cab061ff5c618d0"),
    "value" : [ 
        "Comedy", 
        "Action"
    ]
}
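
Note that `$in` matches documents containing any of the listed genres, which is why both documents above are returned. If every listed genre is required, `$all` is the operator to use instead. The sketch below builds both query documents and emulates their array semantics locally in plain JavaScript; `matchesIn` and `matchesAll` are hypothetical helpers for illustration only, not MongoDB functions.

```javascript
var genre = ['Action', 'Adventure'];

// Query documents as they would be passed to db.foo_result.find(...)
var anyQuery = { value: { $in: genre } };   // at least one genre matches
var allQuery = { value: { $all: genre } };  // every listed genre must be present

// Hypothetical local emulation of the two operators on an array field.
function matchesIn(values, wanted) {
    return wanted.some(function (w) { return values.indexOf(w) !== -1; });
}
function matchesAll(values, wanted) {
    return wanted.every(function (w) { return values.indexOf(w) !== -1; });
}

// The three sample documents from the foo collection, already split.
var docs = [
    ['Action', 'Adventure', 'Sci-Fi'],
    ['Thriller', 'Romantic'],
    ['Comedy', 'Action']
];

docs.forEach(function (value) {
    console.log(value.join(', '),
        '| $in:', matchesIn(value, anyQuery.value.$in),
        '| $all:', matchesAll(value, allQuery.value.$all));
});
```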

Answer 2

Well, you cannot really do this efficiently, so I am glad you used the "performance" tag on your question.

If you want to do this with "comma separated" data in a string, you need to do one of the following.

Generally, use a regular expression if it suits:

db.collection.find({ "genre": { "$regex": "Sci-Fi" } })

But this is not efficient, since an unanchored pattern has to be tested against every value.
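
A further caveat with a plain substring pattern is that it can match more than intended: `Action` would also match a genre such as `Live-Action`. A pattern anchored on the comma delimiters avoids that, although it is still no faster. The following is a rough sketch under that assumption; `escapeRegex` and `genreRegex` are hypothetical helper names, not part of any MongoDB API.

```javascript
// Escape regex metacharacters so the genre is matched literally.
function escapeRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern that matches `name` only as a whole comma-separated entry.
function genreRegex(name) {
    return new RegExp('(^|,)\\s*' + escapeRegex(name) + '\\s*(,|$)');
}

var pattern = genreRegex('Action');
console.log(pattern.test('Action, Adventure, Sci-Fi')); // true
console.log(pattern.test('Comedy, Action'));            // true
console.log(pattern.test('Live-Action, Drama'));        // false

// Assumed usage in the shell: db.collection.find({ genre: pattern })
```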

Or by JavaScript evaluation with $where:

db.collection.find(function() {
    // Split on commas, strip leading whitespace, then look for an exact entry
    return (
        this.genre.split(",")
            .map(function(el) {
                return el.replace(/^\s+/,"");
            })
            .indexOf("Sci-Fi") != -1
    );
})

Not really efficient either, and probably about equal to the above.

Or better yet, something that can use an index: separate the values into an array and use a basic query:

{
    "genre": [ "Action", "Adventure", "Sci-Fi" ] 
}

With an index:

db.collection.createIndex({ "genre": 1 })

And then query:

db.collection.find({ "genre": "Sci-Fi" })

It is that simple when you do it this way, and it is really efficient.
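
Getting from the comma-separated strings to this array layout requires a one-time conversion. One possible approach, assuming MongoDB 4.2 or later (where updates accept an aggregation pipeline), is sketched below. The collection name `collection` follows the answer, the variable names are illustrative, and the guard around `db` is only there so the snippet also runs outside the shell.

```javascript
// Pipeline stage: split the string on commas and trim each entry.
var toArray = [
    {
        $set: {
            genre: {
                $map: {
                    input: { $split: ['$genre', ','] },
                    as: 'g',
                    in: { $trim: { input: '$$g' } }
                }
            }
        }
    }
];

// Only touch documents whose genre is still a string.
var onlyStrings = { genre: { $type: 'string' } };

if (typeof db !== 'undefined') {
    // Assumption: run in the mongo shell against the target database.
    db.collection.updateMany(onlyStrings, toArray);
}
```

Because the filter skips documents that already hold an array, the update can be run again safely if it is interrupted.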

You make the choice.

Tags: regex, performance, mongodb, mongodb-query