Compare dates in different rows of a DataFrame using Python
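I am reading documents from MongoDB and a CSV file, and merging them to retrieve duplicate records. I have the code below. Now I want to compare the dates (LastUpdate) between these records and return the row with the latest date. Can anyone help?
Code: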
import json
import pandas as pd
import xlrd
from pymongo import MongoClient
from functools import reduce
try:
    client = MongoClient()
    print("Connected successfully!!!")
except Exception:
    print("Could not connect to MongoDB")
# database
db = client.conn
collection = db.contactReg
df = pd.DataFrame(list(collection.find()))
print(df)
df1 = df[df.duplicated(['name'], keep = False)]
print(df1)
# reading the csv file
df2 = pd.read_csv(r'C:\Users\swetha1\Desktop\rules.csv')
print(df2)
df3 = pd.merge(df1,df2,on="source")
print(df3)
print(df3.dtypes)
Output:
Connected successfully!!!
data from mongo
LastUpdate _id name nameId source sourceId
0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10
1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11
2 9-Oct-2018 5bbc86e5c16a27f1e1bd39fa swathi 124.0 Mlore 11
fetching duplicates
LastUpdate _id name nameId source sourceId
0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10
1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11
reading CSV file
source P.weight N.weight Tolerance(days) Durability(Days)
0 Blore 100 -100 0 0
1 Mlore 200 -200 30 365
merging
LastUpdate _id name nameId source sourceId P.weight N.weight Tolerance(days) Durability(Days)
0 10-Oct-2018 5bbc86e5c16a27f1e1bd39f8 swetha 123.0 Blore 10 100 -100 0 0
1 11-Oct-2018 5bbc86e5c16a27f1e1bd39f9 swetha 123.0 Mlore 11 200 -200 30 365
First convert the column with to_datetime, and then filter by boolean indexing:
df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate'])
df4 = df3[df3['LastUpdate'] == df3['LastUpdate'].max()]
Or use idxmax:
df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate'])
df4 = df3.loc[[df3['LastUpdate'].idxmax()]]
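To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. The frame named sample is a stand-in built from three rows of the Mongo output in the question, not the real merged df3. The explicit format string and the per-name groupby at the end are assumptions about what is wanted, not part of the original answer.

```python
import pandas as pd

# Stand-in data: three rows copied from the question's Mongo output.
sample = pd.DataFrame({
    "LastUpdate": ["10-Oct-2018", "11-Oct-2018", "9-Oct-2018"],
    "name": ["swetha", "swetha", "swathi"],
    "source": ["Blore", "Mlore", "Mlore"],
})

# Assumption: all dates look like "10-Oct-2018", so an explicit format is safe
# and avoids any day/month guessing.
sample["LastUpdate"] = pd.to_datetime(sample["LastUpdate"], format="%d-%b-%Y")

# Boolean indexing keeps every row that shares the latest date (ties included).
latest_all = sample[sample["LastUpdate"] == sample["LastUpdate"].max()]

# idxmax returns the label of the first row holding the latest date, so
# exactly one row is kept; the double brackets keep the result a DataFrame.
latest_one = sample.loc[[sample["LastUpdate"].idxmax()]]

# Assumption: if the latest row is wanted per name rather than overall,
# take idxmax within each name group.
latest_per_name = sample.loc[sample.groupby("name")["LastUpdate"].idxmax()]

print(latest_all)
print(latest_one)
print(latest_per_name)
```

If several rows can share the same latest date, boolean indexing returns all of them, while idxmax returns only the first, so the choice depends on whether ties should be kept.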