Compare dates in different rows of a DataFrame using Python

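I am reading documents from MongoDB and a CSV file and merging them to retrieve duplicate records. I have the code below. Now I want to compare the dates (LastUpdate) of these records and return the row with the latest date. Can anyone help?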

Code:

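import pandas as pd
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

try:
    client = MongoClient()
    # MongoClient() connects lazily, so ping the server to actually test the connection
    client.admin.command('ping')
    print("Connected successfully!!!")
except ConnectionFailure:
    print("Could not connect to MongoDB")
    raise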

# database 
db = client.conn
collection = db.contactReg

df = pd.DataFrame(list(collection.find()))
print(df)

df1 = df[df.duplicated(['name'], keep = False)]
print(df1)

# reading the csv file
df2 = pd.read_csv(r'C:\Users\swetha1\Desktop\rules.csv')
print(df2)

df3 = pd.merge(df1,df2,on="source")
print(df3)
print(df3.dtypes)

Output:

Connected successfully!!!
data from mongo
    LastUpdate                       _id    name  nameId  source sourceId
0  10-Oct-2018  5bbc86e5c16a27f1e1bd39f8  swetha   123.0   Blore       10
1  11-Oct-2018  5bbc86e5c16a27f1e1bd39f9  swetha   123.0   Mlore       11
2   9-Oct-2018  5bbc86e5c16a27f1e1bd39fa  swathi   124.0   Mlore       11

fetching duplicates
    LastUpdate                       _id    name  nameId  source sourceId
0  10-Oct-2018  5bbc86e5c16a27f1e1bd39f8  swetha   123.0   Blore       10
1  11-Oct-2018  5bbc86e5c16a27f1e1bd39f9  swetha   123.0   Mlore       11

reading CSV file
   source  P.weight  N.weight  Tolerance(days)  Durability(Days)
0  Blore       100      -100                0                 0
1  Mlore       200      -200               30               365

merging
    LastUpdate                       _id    name  nameId  source sourceId  P.weight  N.weight  Tolerance(days)  Durability(Days)
0  10-Oct-2018  5bbc86e5c16a27f1e1bd39f8  swetha   123.0   Blore       10       100      -100                0                 0
1  11-Oct-2018  5bbc86e5c16a27f1e1bd39f9  swetha   123.0   Mlore       11       200      -200               30               365

First convert the column with to_datetime, then filter by boolean indexing:

df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate'])
df4 = df3[df3['LastUpdate'] == df3['LastUpdate'].max()]
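
For anyone who wants to try this without a MongoDB instance, here is a minimal, self-contained sketch of the same filter. The small frame is a stand-in built from the values shown in the question's output, and the explicit format string "%d-%b-%Y" is my assumption based on how the dates are printed there.

```python
import pandas as pd

# Stand-in for the merged frame; values copied from the question's output.
df3 = pd.DataFrame({
    "LastUpdate": ["10-Oct-2018", "11-Oct-2018"],
    "name": ["swetha", "swetha"],
    "source": ["Blore", "Mlore"],
})

# Assumed format: day, abbreviated month name, four-digit year.
df3["LastUpdate"] = pd.to_datetime(df3["LastUpdate"], format="%d-%b-%Y")

# The mask is True only where the date equals the column maximum.
latest = df3[df3["LastUpdate"] == df3["LastUpdate"].max()]
print(latest)
```

Note that the boolean mask keeps every row that shares the maximum date, so ties return more than one row.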

Or use idxmax, which returns only the first row with the latest date:

df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate'])
df4 = df3.loc[[df3['LastUpdate'].idxmax()]]
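
Both snippets above pick the latest date across the whole frame. If more than one name turns out to be duplicated, you may want the latest row per name instead. The following is only a sketch of that variation, assuming the grouping key is the name column; the sample frame is again a stand-in built from the question's data.

```python
import pandas as pd

# Stand-in data with two different names; values copied from the question's output.
df = pd.DataFrame({
    "LastUpdate": ["10-Oct-2018", "11-Oct-2018", "9-Oct-2018"],
    "name": ["swetha", "swetha", "swathi"],
    "source": ["Blore", "Mlore", "Mlore"],
})
df["LastUpdate"] = pd.to_datetime(df["LastUpdate"], format="%d-%b-%Y")

# For each name, idxmax gives the index label of the row with the newest date.
newest_idx = df.groupby("name")["LastUpdate"].idxmax()

# Select those rows from the original frame, keeping all columns.
latest_per_name = df.loc[newest_idx]
print(latest_per_name)
```

Like the plain idxmax version, this keeps one row per name even when dates tie.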