How to extract csv data based on user input (date range)

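I'm a Python beginner. We have to read in .csv data and then extract the rows for a date range entered by the user. An example of the expected output is below. How do I iterate over the reader and extract the rows that fall within the date range (from user input)?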

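I think I have to use datetime.strptime to convert both the entered dates and the date column in the .csv file to date objects, but I'm not sure how to handle the dates in the .csv file. I then have to display the number of new infections for the period, the total number of infections as of the end date, the percentage of the population infected, and the region name. The Unknown region can be ignored and excluded from the output. The .csv file contains about 3 months of data.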
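A minimal sketch of the date handling described above, assuming the dates are in day/month/year form as in the sample rows. The helper names `parse_date` and `in_range` are made up for illustration, and the three short rows are a cut-down stand-in for the real file:

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # assumed day/month/year, as in the sample rows


def parse_date(text):
    """Convert a string such as '03/01/2001' into a datetime object."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


def in_range(row, start, end):
    """Return True if the row's first column (the date) lies in [start, end]."""
    return start <= parse_date(row[0]) <= end


# Shortened stand-in rows; the real rows have twelve columns.
rows = [
    ["01/01/2001", " East", "E"],
    ["03/01/2001", " East", "E"],
    ["04/01/2001", " East", "E"],
]

start = parse_date("02/01/2001")
end = parse_date("03/01/2001")

selected = [row for row in rows if in_range(row, start, end)]
print(selected)
```

Because both the user input and the CSV column go through the same `strptime` format, the comparison is between datetime objects rather than strings, so it should stay correct across month boundaries.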

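I was thinking I could append the rows that fall within the user's dates to an empty list and then write that list to a csv file? I'm only supposed to use base Python, so no Pandas solutions please.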
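For the idea of collecting rows in a list and writing them back out, here is a rough sketch using only the standard library's `csv.writer`. The function name `write_rows`, the output file name `selected.csv`, and the shortened header are assumptions for illustration only:

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write the header and the selected rows to a new CSV file."""
    # newline="" is what the csv module expects for files it writes to
    with open(path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(header)
        writer.writerows(rows)


# Shortened stand-in data; in practice these would be the rows that
# passed the date-range check.
header = ["date", "region", "total_infections"]
selected_rows = [
    ["03/01/2001", "East", "5450"],
    ["03/01/2001", "North", "3825"],
]

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "selected.csv")
    write_rows(out_path, header, selected_rows)
    with open(out_path, newline="") as check_file:
        written = list(csv.reader(check_file))

print(written)
```

Reading the file back at the end is only there to show that the rows survive the round trip; a real script would just call `write_rows` with its own path.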

My current code:

import csv
from datetime import datetime
#Ask user to input the name of the file they wish to read
file_name = input("Enter the name of the CSV file:\n")
regional_data = open(file_name)
data_reader = csv.reader(regional_data)
cumulativeregional_data = list(data_reader)
#Print 1st and last date for the user before asking for a date range as input
print(f"The first record is for the {cumulativeregional_data[1][0]}\nThe last record is for the {cumulativeregional_data[-1][0]}")
start_date = input("Enter the start date:\n")
startdate_object = datetime.strptime(start_date, "%d/%m/%Y")
end_date = input("Enter the end date:\n")
enddate_object = datetime.strptime(end_date, "%d/%m/%Y")

The CSV we are reading the data from (a sample):

date,region,region_id,total_infections, adjusted_total_infections, total_deaths, total_recoveries, current_infections, population, day_no, daily_infections, daily_deaths
01/01/2001, Unknown, U,0,0,0,0,0,0,1,0,0
01/01/2001, East,E,5000,0,20,3800,1180,150000,1,100,7
01/01/2001, North,N,3550,0,25,3150,375,180000,1,80,0
01/01/2001, Central,C,4250,0,38,3200,264,175000,1,120,0
01/01/2001, South,S,5525,0,10,5120,395,185000,1,110,0
01/01/2001, West,W,4150,0,45,3850,255,155000,1,80,0
02/02/2001, Unknown, U,0,0,0,0,0,0,2,0,0
02/02/2001, East,E,5300,0,27,3950,1323,150000,2,300,0
02/02/2001, North,N,3750,0,25,3350,375,180000,2,200,5
02/02/2001, Central,C,4350,0,38,3310,1002,175000,2,100,7
02/02/2001, South,S,5550,0,10,5220,320,185000,2,25,1
02/02/2001, West,W,4500,0,45,4000,455,155000,2,350,0
03/01/2001, Unknown, U,0,0,0,0,0,0,3,0,0
03/01/2001, East,E,5450,0,27,4000,1423,150000,3,150,10
03/01/2001, North,N,3825,0,30,3330,465,180000,3,75,3
03/01/2001, Central,C,4475,0,45,3435,995,175000,3,125,10
03/01/2001, South,S,5705,0,11,5300,394,185000,3,155,0
03/01/2001, West,W,4700,0,45,4200,455,155000,3,200,10
04/01/2001, Unknown, U,0,0,0,0,0,0,4,0,0
04/01/2001, East,E,5520,0,37,4200,1283,150000,4,70,0
04/01/2001, North,N,3910,0,33,3510,367,180000,4,85,0
04/01/2001, Central,C,4710,0,55,3550,1105,175000,4,235,0
04/01/2001, South,S,5710,0,11,5500,199,185000,4,5,0
04/01/2001, West,W,4750,0,55,4350,345,155000,4,50,0

My expected output:

[Image: Expected output]

You can do what you want with something like this:

import csv
from datetime import datetime

#Ask user to input the name of the file they wish to read
file_name = input("Enter the name of the CSV file:\n")
with open(file_name) as csvfile: # recommended when dealing with files to properly open and close files (context manager)
    data_in = list(csv.reader(csvfile))

# tell user appropriate date range (first & last date)
print(f"The first record is for the {data_in[1][0]}\nThe last record is for the {data_in[-1][0]}")
# Ask for a date range as input
start_date = datetime.strptime(input("Enter the start date:\n"), "%d/%m/%Y")
end_date = datetime.strptime(input("Enter the end date:\n"), "%d/%m/%Y")

# filter dates

# returns True if date is in between start and end
def filter_func(date):
    dt_date = datetime.strptime(date, "%d/%m/%Y")
    return (start_date <= dt_date) and (dt_date <= end_date)

# filter list to include dates, removing the headers from data_in
filtered_list = [item for item in data_in[1:] if filter_func(item[0])]

# print out data
total_period_infections = 0
print("New infections\tTotal infections\tPopulation\tPercentage\tRegion") # table headers
for item in filtered_list:
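    if item[1].strip() == 'Unknown': # filter out the unknown region (values in the file have a leading space)
        continue

    # note: column 7 is current_infections, used here as a stand-in for new infections
    total_period_infections += int(item[7]) # to use for the last print statement
    print(f"{item[7]}\t{item[3]}\t{item[8]}\t{round(int(item[3]) / int(item[8]) * 100, 3)}\t{item[1].strip()}")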

print(f"Total new infections for the period: {total_period_infections}")

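The program filters out the 'Unknown' region. However, for new infections, I'm not sure how those numbers should be calculated from the data provided. The table will still need formatting to match the exact layout you want, but the data has been filtered down to the dates the user entered and is then printed out accordingly.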
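One possible way to get a single line per region is sketched below. It rests on assumptions the question does not confirm: that "new infections for the period" is the sum of the daily_infections column over the selected dates, that "total infections" is the total_infections value on the latest selected date, and that the percentage is that total divided by population. The function name `summarise` and the embedded `SAMPLE` text (a few rows copied from the question) are illustrative only. `skipinitialspace=True` is used so the spaces after the commas do not end up in the column names or region names.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"

# A few rows copied from the sample in the question.
SAMPLE = """date,region,region_id,total_infections, adjusted_total_infections, total_deaths, total_recoveries, current_infections, population, day_no, daily_infections, daily_deaths
01/01/2001, East,E,5000,0,20,3800,1180,150000,1,100,7
01/01/2001, North,N,3550,0,25,3150,375,180000,1,80,0
03/01/2001, Unknown, U,0,0,0,0,0,0,3,0,0
03/01/2001, East,E,5450,0,27,4000,1423,150000,3,150,10
03/01/2001, North,N,3825,0,30,3330,465,180000,3,75,3
04/01/2001, East,E,5520,0,37,4200,1283,150000,4,70,0
04/01/2001, North,N,3910,0,33,3510,367,180000,4,85,0
"""


def summarise(lines, start, end):
    """Aggregate rows per region for dates in [start, end] (assumed rules)."""
    reader = csv.DictReader(lines, skipinitialspace=True)
    summary = {}
    for row in reader:
        if row["region"] == "Unknown":
            continue  # the question says this region can be ignored
        day = datetime.strptime(row["date"], DATE_FORMAT)
        if not (start <= day <= end):
            continue
        entry = summary.setdefault(
            row["region"],
            {"new": 0, "last_day": None, "total": 0, "population": 0},
        )
        # Assumption: new infections = sum of daily_infections in the period.
        entry["new"] += int(row["daily_infections"])
        # Keep the totals from the latest date seen so far.
        if entry["last_day"] is None or day >= entry["last_day"]:
            entry["last_day"] = day
            entry["total"] = int(row["total_infections"])
            entry["population"] = int(row["population"])
    return summary


start = datetime.strptime("03/01/2001", DATE_FORMAT)
end = datetime.strptime("04/01/2001", DATE_FORMAT)
summary = summarise(io.StringIO(SAMPLE), start, end)

print("New infections\tTotal infections\tPercentage\tRegion")
for region, entry in summary.items():
    percent = entry["total"] / entry["population"] * 100 if entry["population"] else 0.0
    print(f"{entry['new']}\t{entry['total']}\t{percent:.2f}%\t{region}")
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by the open file object, since `csv.DictReader` accepts either. If the assignment defines new infections differently (for example as the difference between total_infections at the end and start dates), only the line that updates `entry["new"]` would need to change.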