Obtain records based on matching key-value pairs and comparing dates in Python

I have the following collection in MongoDB:

{
    "_id" : ObjectId("5bbc86e5c16a27f1e1bd39f8"),
    "name" : "swetha",
    "nameId" : 123,
    "source" : "Blore",
    "sourceId" : 10,
    "LastUpdate" : "10-Oct-2018"
}
{
    "_id" : ObjectId("5bbc86e5c16a27f1e1bd39f9"),
    "name" : "swetha",
    "nameId" : 123,
    "source" : "Mlore",
    "sourceId" : "11",
    "LastUpdate" : "11-Oct-2018"
}
{
    "_id" : ObjectId("5bbc86e5c16a27f1e1bd39fa"),
    "name" : "swathi",
    "nameId" : 124,
    "source" : "Mlore",
    "sourceId" : "11",
    "LastUpdate" : "9-Oct-2018"
}

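I am a beginner in Python. I want to compare 'LastUpdate' between the above records that have a matching 'nameId', and push the record with the latest date to another collection. For example, the first two records share the name 'swetha', so their 'LastUpdate' values should be compared and the record with the latest date should be output.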

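I have written the code below to read the records from MongoDB and print them. Although I looked at a few resources on Google, I could not work out how to compare records that share the same key and compare their timestamps.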
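from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

client = MongoClient()
try:
    # MongoClient() connects lazily, so ping the server to verify the connection
    client.admin.command("ping")
    print("Connected successfully!!!")
except ConnectionFailure:
    print("Could not connect to MongoDB")
    raise

# database and collection
db = client.conn
collection = db.contactReg
df = collection.find()  # find() returns a cursor, not a DataFrame
for row in df:
    print(row)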

Reference links

Is there a better way to compare dictionary values

https://gis.stackexchange.com/questions/87276/how-to-compare-values-from-a-column-in-attribute-table-with-values-in-dictionary

Comparing two dictionaries and printing key value pair in python, and so on.

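I think what you need is an aggregation. The pipeline below may look big, but once you get the hang of Mongo aggregations you will be comfortable with it.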

df = collection.aggregate([
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "LastUpdateArray": {
                "$split": [
                    "$LastUpdate",
                    "-"
                ]
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "LastUpdateArray": 1,
            "LastUpdateMonth": {
                "$arrayElemAt": [
                    "$LastUpdateArray",
                    1
                ]
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Year": {
                "$arrayElemAt": [
                    "$LastUpdateArray",
                    2
                ]
            },
            "Date": {
                "$arrayElemAt": [
                    "$LastUpdateArray",
                    0
                ]
            },
            "Month": {
                "$switch": {
                    "branches": [
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Jan"
                                ]
                            },
                            "then": "01"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Feb"
                                ]
                            },
                            "then": "02"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Mar"
                                ]
                            },
                            "then": "03"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Apr"
                                ]
                            },
                            "then": "04"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "May"
                                ]
                            },
                            "then": "05"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Jun"
                                ]
                            },
                            "then": "06"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Jul"
                                ]
                            },
                            "then": "07"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Aug"
                                ]
                            },
                            "then": "08"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Sep"
                                ]
                            },
                            "then": "09"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Oct"
                                ]
                            },
                            "then": "10"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Nov"
                                ]
                            },
                            "then": "11"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Dec"
                                ]
                            },
                            "then": "12"
                        }
                    ],
                    "default": "01"
                }
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Year": 1,
            "Date": 1,
            "Month": 1,
            "DateString": {
                "$concat": [
                    "$Year",
                    "-",
                    "$Month",
                    "-",
                    "$Date"
                ]
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Date": {
                "$dateFromString": {
                    "dateString": "$DateString"
                }
            }
        }
    },
    {
        "$sort": {
            "Date": -1
        }
    },
    {
        "$group": {
            "_id": "$name",
            "name": {
                "$first": "$name"
            },
            "nameId": {
                "$first": "$nameId"
            },
            "source": {
                "$first": "$source"
            },
            "sourceId": {
                "$first": "$sourceId"
            },
            "LastUpdate": {
                "$first": "$LastUpdate"
            },
            "Date": {
                "$first": "$Date"
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1
        }
    }
])

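In the first 5 stages of the aggregation, I convert the 'LastUpdate' string into a date. I then sort by that date in descending order. In the group stage, I group by name and take the first record for each name, which is the most recent one.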
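For comparison, the same "newest record per key" idea can be sketched in plain Python on documents that have already been read from the collection. This is only an illustration under some assumptions: the helper names `parse_last_update` and `latest_per_key` are made up here, the records are grouped by 'nameId' as the question asks (the pipeline above groups by name), and the `%b` month parsing assumes an English locale.

```python
from datetime import datetime


def parse_last_update(value):
    """Parse strings such as '9-Oct-2018' into a datetime (English month names assumed)."""
    return datetime.strptime(value, "%d-%b-%Y")


def latest_per_key(records, key="nameId"):
    """Return one record per key value: the one with the newest LastUpdate."""
    latest = {}
    for record in records:
        group = record[key]
        current = latest.get(group)
        if current is None or (
            parse_last_update(record["LastUpdate"])
            > parse_last_update(current["LastUpdate"])
        ):
            latest[group] = record
    return list(latest.values())


# Sample data copied from the question (without the _id fields).
records = [
    {"name": "swetha", "nameId": 123, "source": "Blore", "sourceId": 10, "LastUpdate": "10-Oct-2018"},
    {"name": "swetha", "nameId": 123, "source": "Mlore", "sourceId": "11", "LastUpdate": "11-Oct-2018"},
    {"name": "swathi", "nameId": 124, "source": "Mlore", "sourceId": "11", "LastUpdate": "9-Oct-2018"},
]

newest = latest_per_key(records)
for record in newest:
    print(record["name"], record["source"], record["LastUpdate"])
```

If the documents come from `collection.find()`, the result could then be written to another collection with pymongo's `insert_many`, for example `db.latestContacts.insert_many(newest)`, where `latestContacts` is a made-up collection name.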

Hope this helps.

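I am assuming that what you need are the duplicate records, and in the pipeline above I take the first one that comes up for each name. For reference, this pipeline lists the names that occur more than once: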

df = collection.aggregate([
    {
        "$group": {
            "_id": "$name",
            "count": {
                "$sum": 1
            },
            "data": {
                "$push": {
                    "nameId": "$nameId",
                    "source": "$source",
                    "sourceId": "$sourceId",
                    "LastUpdate": "$LastUpdate"
                }
            }
        }
    },
    {
        "$match": {
            "_id": {
                "$ne": None
            },
            "count": {
                "$gt": 1
            }
        }
    }
])
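To make the intent of this duplicate-finding pipeline concrete, here is a rough plain-Python equivalent that works on a list of documents already loaded into memory. It is a sketch, not a replacement for the aggregation: the function name `duplicates_by_name` is hypothetical, and the sample data is copied from the question.

```python
from collections import defaultdict


def duplicates_by_name(records):
    """Group records by 'name' and keep only names that occur more than once."""
    grouped = defaultdict(list)
    for record in records:
        name = record.get("name")
        if name is not None:  # mirrors the {"$ne": None} condition in $match
            grouped[name].append(record)
    # mirrors the {"count": {"$gt": 1}} condition in $match
    return {name: docs for name, docs in grouped.items() if len(docs) > 1}


# Sample data copied from the question (without the _id fields).
records = [
    {"name": "swetha", "nameId": 123, "source": "Blore", "sourceId": 10, "LastUpdate": "10-Oct-2018"},
    {"name": "swetha", "nameId": 123, "source": "Mlore", "sourceId": "11", "LastUpdate": "11-Oct-2018"},
    {"name": "swathi", "nameId": 124, "source": "Mlore", "sourceId": "11", "LastUpdate": "9-Oct-2018"},
]

dupes = duplicates_by_name(records)
for name, docs in dupes.items():
    print(name, len(docs))
```

Doing this in the database is usually preferable for large collections, since only the grouped result has to be transferred to the client.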