Python: check a dictionary for duplicates based on date
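I loop over a directory and read a number of JSON files. From each file I parse out 4 keys, and then I create a CSV file from all the parsed data.
It happens that I have duplicate entries, so I want to remove the duplicates based on date (keeping the newer one) and then rewrite the CSV. I am not sure how to implement this.
For example: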
import csv
import json
import os
from datetime import datetime

def mdy_to_ymd(d):
    # convert a date such as 'Jan 05 2021' into a comparable datetime object
    return datetime.strptime(d, '%b %d %Y')

def date_converter(date):
    # convert the date to a readable dd/mm/YYYY string for the CSV
    return datetime.strptime(date, '%b %d %Y').strftime('%d/%m/%Y')
def csv_generator(path):
    # build one CSV row per JSON file found in `path`
    ffresult = []
    duplicate_dict = {}
    for file in os.listdir(path):  # iterate through the directory with the files
        fresult = []
        with open(os.path.join(path, file), "r") as result:  # open the JSON file
            templates = json.load(result)
        hostname_str = file.split(".")
        site_code_str = file[:5]
        # NOTE: datetime_str2 (the raw date string) comes from code not shown in this post
        datetime_str3 = mdy_to_ymd(datetime_str2)  # convert the date to comparable data
        duplicate_dict[hostname_str[0]] = datetime_str3
        # ?? I am creating a dictionary with the hostname as key and the date as value,
        # but it does not work: when the same hostname shows up again, the assignment
        # simply overwrites the existing key. So there are no duplicates, but nothing
        # guarantees that the stored date is the newest one.
        fresult.append(site_code_str)
        fresult.append(hostname_str[0])
        fresult.append(templates["execution_status"])
        fresult.append(date_converter(datetime_str2))
        fresult.append(templates["protocol_name"])
        fresult.append(templates["protocol_version"])
        ffresult.append(fresult)  # I append the values I need into 2 lists
    return ffresult
with open("jsondicts.csv", "w", newline="") as dst:
    writetoit = csv.writer(dst)
    writetoit.writerows(csv_generator(directory))
# This is how I write to the CSV, so right now the CSV contains duplicate rows.
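I want only unique values based on hostname, and for each hostname only the newest entry based on date, together with the rest of the parsed data (protocol name, site code, etc.).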
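For what it's worth, keeping the newest row per hostname can also be sketched with the standard library alone. The snippet below is only an illustration under assumptions: `keep_newest` is a hypothetical helper name, the sample rows are made up, and the column order (hostname at index 1, dd/mm/YYYY date at index 3) simply mirrors the order in which `fresult` is filled above.

```python
from datetime import datetime

def keep_newest(rows, host_idx=1, date_idx=3, date_fmt="%d/%m/%Y"):
    """Return one row per hostname, keeping the row with the latest date."""
    newest = {}  # hostname -> (parsed date, full row)
    for row in rows:
        host = row[host_idx]
        when = datetime.strptime(row[date_idx], date_fmt)
        # Compare before storing: overwrite only if this row is strictly newer.
        if host not in newest or when > newest[host][0]:
            newest[host] = (when, row)
    return [row for _, row in newest.values()]

# Hypothetical sample data in the same column order as fresult.
rows = [
    ["SITE1", "host-a", "ok", "05/01/2021", "proto", "1.0"],
    ["SITE1", "host-a", "ok", "17/03/2021", "proto", "1.1"],
    ["SITE2", "host-b", "failed", "02/02/2021", "proto", "1.0"],
]
unique_rows = keep_newest(rows)
```

The difference from the plain `duplicate_dict[hostname] = date` assignment is the comparison before storing: an entry is replaced only when the incoming date is later, and the whole row is stored next to the date so the other columns stay with it. The resulting list could then be passed to `csv.writer(...).writerows(...)` as before.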
In the end I had to use the pandas library, which solved the problem:
result_pan_xls = result_pan.sort_values(by="Execution_Date").drop_duplicates(subset="HOSTNAME", keep="last")
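A small self-contained sketch of that pandas approach, in case it helps someone. The sample data and the `Protocol_Version` column are made up for illustration; only `HOSTNAME` and `Execution_Date` come from the line above. One thing to watch, assuming the dates are stored as dd/mm/YYYY strings as in the CSV: sorting such strings orders them by day first, so it seems safer to convert the column to real datetimes before sorting.

```python
import pandas as pd

# Hypothetical sample data; host-a appears twice with different dates.
result_pan = pd.DataFrame({
    "HOSTNAME": ["host-a", "host-a", "host-b"],
    "Execution_Date": ["17/03/2021", "05/04/2021", "02/02/2021"],
    "Protocol_Version": ["1.0", "1.1", "1.0"],
})

# Parse the dd/mm/YYYY strings so that sorting is chronological, not alphabetical.
result_pan["Execution_Date"] = pd.to_datetime(
    result_pan["Execution_Date"], format="%d/%m/%Y"
)

# Sort oldest to newest, then keep the last (newest) row for each hostname.
result_pan_xls = (
    result_pan.sort_values(by="Execution_Date")
    .drop_duplicates(subset="HOSTNAME", keep="last")
)
```

With the string dates left unparsed, "05/04/2021" would sort before "17/03/2021" and the older row for host-a would be the one kept. The deduplicated frame can be written out with `result_pan_xls.to_csv("jsondicts.csv", index=False)`.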