python dictionary check for duplicates based on date

So I am looping through a directory, reading some JSON files; from each file I parse out four keys and then create a CSV file with all of the parsed data.

It turns out I have duplicate entries, so I want to remove the duplicates based on the date (keeping the newer one) and then rewrite the CSV. I am not sure how to implement it.

例如:

import csv
import json
import os
import time
from datetime import datetime


def mdy_to_ymd(d):
    # convert the date string into a comparable time.struct_time
    cor_date = datetime.strptime(d, '%b %d %Y').strftime('%d/%m/%Y')
    return time.strptime(cor_date, "%d/%m/%Y")


def date_converter(date):  # convert the date to readable string for csv
    return datetime.strptime(date, '%b %d %Y').strftime('%d/%m/%Y')


def csv_generator(path):  # build the rows for the CSV
    ffresult = []
    duplicate_dict = {}
    for file in os.listdir(path):  # iterating through the directory with the files
        fresult = []
        with open(f"{path}/{file}", "r") as result:  # opening the json file
            templates = json.load(result)
            hostname_str = file.split(".")
            site_code_str = file[:5]
            # datetime_str2 (the date parsed from the JSON) is defined elsewhere in my code
            datetime_str3 = mdy_to_ymd(datetime_str2)  # converting the date to comparable data
            duplicate_dict[hostname_str[0]] = datetime_str3
            # Here I am building a dictionary keyed by hostname with the date as the value,
            # but it doesn't work: when the same hostname appears again it simply overwrites
            # the existing key, so there are no duplicates, but nothing guarantees that only
            # the newest entry (by date) is the one that is kept.
            fresult.append(site_code_str)
            fresult.append(hostname_str[0])
            fresult.append(templates["execution_status"])
            fresult.append(date_converter(datetime_str2))
            fresult.append(templates["protocol_name"])
            fresult.append(templates["protocol_version"])
            ffresult.append(fresult)
    return ffresult


# I append the values I need into nested lists and write them out:
with open("jsondicts.csv", "w", newline="") as dst:
    writetoit = csv.writer(dst)
    writetoit.writerows(csv_generator(directory))
# This is how I write the CSV, so right now I have duplicate values in it.

I only want unique values based on the hostname, and among those only the newest one based on the date, together of course with the other parsed data (protocol name, site code, etc.).
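The dictionary idea from the question can be made to work in pure Python by keeping, for each hostname, only the row whose date is newest. A minimal sketch, assuming rows shaped like the ones `csv_generator` builds (the column layout and sample data are illustrative):

```python
from datetime import datetime


def dedupe_newest(rows):
    """Keep only the newest row per hostname.

    Each row is (site_code, hostname, status, date_str, proto_name, proto_version),
    with date_str formatted as '%d/%m/%Y' (illustrative layout, not the exact one
    from the question).
    """
    newest = {}  # hostname -> (parsed_date, row)
    for row in rows:
        hostname = row[1]
        row_date = datetime.strptime(row[3], "%d/%m/%Y")
        # Overwrite only when this row is strictly newer than the stored one.
        if hostname not in newest or row_date > newest[hostname][0]:
            newest[hostname] = (row_date, row)
    return [row for _, row in newest.values()]


rows = [
    ("SITE1", "hostA", "ok", "01/01/2023", "p", "1"),
    ("SITE1", "hostA", "ok", "15/03/2023", "p", "2"),
    ("SITE2", "hostB", "fail", "02/02/2023", "q", "1"),
]
print(dedupe_newest(rows))
```

The key difference from the question's `duplicate_dict` is the comparison before overwriting: the dict is still keyed by hostname, but an entry is replaced only when the incoming date is newer, so the order in which the files are read no longer matters.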

This solved the problem; I had to use the pandas library:

result_pan_xls = result_pan.sort_values(by="Execution_Date").drop_duplicates(subset="HOSTNAME", keep="last")
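For context, a self-contained sketch of how that one-liner behaves; `result_pan` in the answer is presumably a DataFrame built from the parsed rows, and the sample data and extra column names here are illustrative:

```python
import pandas as pd

# Illustrative data; "HOSTNAME" and "Execution_Date" match the answer's one-liner,
# the remaining column names are assumptions.
result_pan = pd.DataFrame(
    [
        ["SITE1", "hostA", "ok", "2023-01-01", "p", "1"],
        ["SITE1", "hostA", "ok", "2023-03-15", "p", "2"],
        ["SITE2", "hostB", "fail", "2023-02-02", "q", "1"],
    ],
    columns=["Site_Code", "HOSTNAME", "Status", "Execution_Date", "Protocol", "Version"],
)

# Sort by date so the newest row per hostname comes last, then drop the
# older duplicates, keeping the last (newest) occurrence of each hostname.
result_pan_xls = result_pan.sort_values(by="Execution_Date").drop_duplicates(
    subset="HOSTNAME", keep="last"
)
# result_pan_xls.to_csv("jsondicts.csv", index=False)
print(result_pan_xls)
```

One caveat: this works because ISO-style `YYYY-MM-DD` strings sort chronologically. With `DD/MM/YYYY` strings like the ones produced by `date_converter`, the lexicographic sort would be wrong, so convert the column first with `pd.to_datetime(result_pan["Execution_Date"], format="%d/%m/%Y")` before sorting.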