How can I find the number of duplicates for a field using MongoDB Java?

How can I find the number of duplicates for each document with Java and MongoDB? I have a collection like this. Example collection:

{
    "_id": {
        "$oid": "5fc8eb07d473e148192fbecd"
    },
    "ip_address": "192.168.0.1",
    "mac_address": "00:A0:C9:14:C8:29",
    "url": "https://people.richland.edu/dkirby/141macaddress.htm",
    "datetimes": {
        "$date": "2021-02-13T02:02:00.000Z"
    }
}
{
    "_id": {
        "$oid": "5ff539269a10d529d88d19f4"
    },
    "ip_address": "192.168.0.7",
    "mac_address": "00:A0:C9:14:C8:30",
    "url": "https://people.richland.edu/dkirby/141macaddress.htm",
    "datetimes": {
        "$date": "2021-02-12T19:00:00.000Z"
    }
}
{
    "_id": {
        "$oid": "60083d9a1cad2b613cd0c0a2"
    },
    "ip_address": "192.168.1.5",
    "mac_address": "00:0A:05:C7:C8:31",
    "url": "www.facebook.com",
    "datetimes": {
        "$date": "2021-01-24T17:00:00.000Z"
    }
}

Example query:

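            import com.mongodb.BasicDBObject;
            import com.mongodb.DBCursor;
            import com.mongodb.DBObject;
            import java.util.Date;

            // An empty filter matches every document in the collection
            BasicDBObject whereQuery = new BasicDBObject();
            DBCursor cursor = table1.find(whereQuery);
            while (cursor.hasNext()) {
                DBObject obj = cursor.next();
                String ip_address = (String) obj.get("ip_address");
                String mac_address = (String) obj.get("mac_address");
                Date datetimes = (Date) obj.get("datetimes");
                String url = (String) obj.get("url");
                // println takes a single argument, so join the fields into one string
                System.out.println(ip_address + ", " + mac_address + ", " + datetimes + ", " + url);
            }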

In Java, how can I find which "url" values are duplicated, and how many times each one is repeated?

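If I understand your question correctly, you are trying to find the number of duplicate entries for the field url. You can iterate over all documents and add each value to a Set. A Set has the property of storing only unique values, so a value that is already in the Set is not added again. The difference between the number of documents and the number of entries in the Set is therefore the number of duplicate entries for the given field.

If you want to know which URLs are non-unique, you can evaluate the return value of Set.add(Object), which tells you whether the value was newly added. If it returns false, the value was already in the Set and you have found a duplicate.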
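
The Set approach described above can be sketched roughly as follows. This is only an illustration: the class name UrlDuplicateCounter and its two methods are made up for this example, and the URLs are passed in as a plain List<String> (for instance, collected from the cursor loop in the question) so that the sketch runs without a database.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of any MongoDB API.
public class UrlDuplicateCounter {

    /** Number of entries that repeat a value already present in the list. */
    static int countDuplicates(List<String> urls) {
        Set<String> unique = new HashSet<>(urls);   // a Set keeps each value once
        return urls.size() - unique.size();
    }

    /** The values that occur more than once, each reported a single time. */
    static Set<String> findDuplicated(List<String> urls) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicated = new LinkedHashSet<>();
        for (String url : urls) {
            // add() returns false when the value was already in the Set
            if (!seen.add(url)) {
                duplicated.add(url);
            }
        }
        return duplicated;
    }

    public static void main(String[] args) {
        // The three "url" values from the sample collection in the question
        List<String> urls = Arrays.asList(
                "https://people.richland.edu/dkirby/141macaddress.htm",
                "https://people.richland.edu/dkirby/141macaddress.htm",
                "www.facebook.com");
        System.out.println("Duplicate entries: " + countDuplicates(urls));
        System.out.println("Non-unique URLs: " + findDuplicated(urls));
    }
}
```

Note that this reads every document into the application, so for a large collection a server-side approach is likely to be cheaper.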

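In MongoDB, you can solve this with an aggregation pipeline. You will need to implement the pipeline with the MongoDB Java driver. It returns only the duplicated values, together with their duplicate counts.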

db.getCollection('table1').aggregate([
    {
        "$group": {
            // group by url and count the documents that share each url
            "_id": "$url",
            "url": {
                "$first": "$url"
            },
            "duplicates_count": {
                "$sum": 1
            },
            "duplicates": {
                "$push": {
                    "_id": "$_id",
                    "ip_address": "$ip_address",
                    "mac_address": "$mac_address",
                    "url": "$url",
                    "datetimes": "$datetimes"
                }
            }
        }
    },
    {   // keep only the groups whose count is greater than 1
        "$match": {
            "duplicates_count": {
                "$gt": 1
            }
        }
    },
    {
        "$project": {
            "_id": 0
        }
    }
]);
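
One way the shell pipeline above might be written with the Java driver is sketched below. It assumes the newer synchronous driver (the mongodb-driver-sync artifact with MongoCollection<Document>) rather than the legacy DBCollection API used in the question. The connection string mongodb://localhost:27017, the database name "test", and the class name DuplicateUrlAggregation are placeholders, not values taken from the question.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Projections;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.util.Arrays;
import java.util.List;

public class DuplicateUrlAggregation {
    public static void main(String[] args) {
        // Assumed connection string and database name; adjust to your setup.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> table1 =
                    client.getDatabase("test").getCollection("table1");

            List<Bson> pipeline = Arrays.asList(
                    // $group: one group per url, with a count and the member documents
                    Aggregates.group("$url",
                            Accumulators.first("url", "$url"),
                            Accumulators.sum("duplicates_count", 1),
                            Accumulators.push("duplicates",
                                    new Document("_id", "$_id")
                                            .append("ip_address", "$ip_address")
                                            .append("mac_address", "$mac_address")
                                            .append("url", "$url")
                                            .append("datetimes", "$datetimes"))),
                    // $match: keep only the groups that contain more than one document
                    Aggregates.match(Filters.gt("duplicates_count", 1)),
                    // $project: drop the group key, since "url" already carries it
                    Aggregates.project(Projections.excludeId()));

            for (Document doc : table1.aggregate(pipeline)) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```

The builder classes Aggregates, Accumulators, Filters and Projections mirror the shell stages one to one, which keeps the Java version easy to compare against the shell version.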

Output:

{
    "url" : "https://people.richland.edu/dkirby/141macaddress.htm",
    "duplicates_count" : 2.0,
    "duplicates" : [ 
        {
            "_id" : ObjectId("5fc8eb07d473e148192fbecd"),
            "ip_address" : "192.168.0.1",
            "mac_address" : "00:A0:C9:14:C8:29",
            "url" : "https://people.richland.edu/dkirby/141macaddress.htm",
            "datetimes" : {
                "$date" : "2021-02-13T02:02:00.000Z"
            }
        }, 
        {
            "_id" : ObjectId("5ff539269a10d529d88d19f4"),
            "ip_address" : "192.168.0.7",
            "mac_address" : "00:A0:C9:14:C8:30",
            "url" : "https://people.richland.edu/dkirby/141macaddress.htm",
            "datetimes" : {
                "$date" : "2021-02-12T19:00:00.000Z"
            }
        }
    ]
}
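
To sanity-check counts like the one in the output above without a running database, the same group-then-filter logic can be imitated in plain Java. This is only a rough in-memory analogue of the $group and $match stages, not driver code. The class name UrlGroupCount and the method duplicateCounts are invented for this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogue of the $group / $match stages.
public class UrlGroupCount {

    /** Maps each url that occurs more than once to its number of occurrences. */
    static Map<String, Integer> duplicateCounts(List<String> urls) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String url : urls) {
            counts.merge(url, 1, Integer::sum);       // like "$sum": 1 per group
        }
        counts.values().removeIf(count -> count <= 1); // like "$match": {"$gt": 1}
        return counts;
    }

    public static void main(String[] args) {
        // The three "url" values from the sample collection in the question
        List<String> urls = Arrays.asList(
                "https://people.richland.edu/dkirby/141macaddress.htm",
                "https://people.richland.edu/dkirby/141macaddress.htm",
                "www.facebook.com");
        // prints {https://people.richland.edu/dkirby/141macaddress.htm=2}
        System.out.println(duplicateCounts(urls));
    }
}
```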