Python delete line from CSV based on date

Question

I'm collecting temperature data with Python, but I only want to store the most recent 24 hours of it.

I'm currently generating my .csv file with this:

import datetime

# mcp is the temperature sensor object (set up elsewhere, not shown)
while True:
    tempC = mcp.temperature
    tempF = tempC * 9 / 5 + 32
    timestamp = datetime.datetime.now().strftime("%y-%m-%d %H:%M   ")

    f = open("24hr.csv", "a")
    f.write(timestamp)
    f.write(',{}'.format(tempF))
    f.write("\n")
    f.close()

The output .csv looks like this:

18-12-13 10:58   ,44.7125
18-12-13 11:03   ,44.6
18-12-13 11:08   ,44.6
18-12-13 11:13   ,44.4875
18-12-13 11:18   ,44.6
18-12-13 11:23   ,44.4875
18-12-13 11:28   ,44.7125

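I don't want the file to grow forever; I only want to keep the last 24 hours of data. Since I sample every 5 minutes, after 24 hours my CSV should have 288 rows (24 × 60 / 5). With readlines() I can tell how many lines I have, but how do I get rid of any lines older than 24 hours? This is what I came up with, which obviously doesn't work. Suggestions?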

f = open("24hr.csv","r")
lines = f.readlines()
f.close()

if lines => 144:
   f = open("24hr.csv","w")
   for line in lines:
       if line <= "timestamp"+","+"tempF"+\n":
           f.write(line)
           f.close()
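For reference, the number of rows a 24-hour window holds at one sample every 5 minutes can be checked with timedelta arithmetic. This is a small sketch; the constant names are made up for illustration.

```python
import datetime

SAMPLE_INTERVAL = datetime.timedelta(minutes=5)  # sampling period described in the question
WINDOW = datetime.timedelta(hours=24)            # how much history to keep

# timedelta // timedelta returns an int: how many whole samples fit in the window
rows_per_window = WINDOW // SAMPLE_INTERVAL
print(rows_per_window)  # 288
```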

Answer 1

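You've already done most of the work. I have a few suggestions.

Use a with statement. That way the file is closed properly even if an error partway through the program raises an exception.
Parse the timestamp in each line and compare it with the current time.
Use len() to check the length of the list.

Here is the modified program: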

import datetime

with open("24hr.csv","r") as f:
    lines = f.readlines()  # read out the contents of the file

if len(lines) >= 288:  # 24 h of 5-minute samples
   yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
   with open("24hr.csv","w") as f:
       for line in lines:
           line_time_string = line.split(",")[0]
           line_time = datetime.datetime.strptime(line_time_string, "%y-%m-%d %H:%M   ")

           if line_time > yesterday:  # if the line's time is after yesterday
               f.write(line)  # write it back into the file

This code isn't very clean (it doesn't follow PEP 8), but you can see the general process.
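A tidier sketch of the same approach could look like this. The names keep_recent, prune_file and TIMESTAMP_FORMAT are hypothetical, and it assumes the exact timestamp format (including the trailing spaces) that the question's logger writes. Splitting the filtering out into keep_recent, with the current time passed in, makes it testable without touching the file.

```python
import datetime

# Format written by the question's logger, including the three trailing spaces
TIMESTAMP_FORMAT = "%y-%m-%d %H:%M   "


def keep_recent(lines, now, window=datetime.timedelta(hours=24)):
    """Return only the lines whose timestamp is newer than now - window."""
    cutoff = now - window
    recent = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of failing to parse them
        time_string = line.split(",")[0]
        line_time = datetime.datetime.strptime(time_string, TIMESTAMP_FORMAT)
        if line_time > cutoff:
            recent.append(line)
    return recent


def prune_file(path, now=None):
    """Rewrite the CSV at path so that it only holds the last 24 hours."""
    if now is None:
        now = datetime.datetime.now()
    with open(path, "r") as f:
        lines = f.readlines()
    with open(path, "w") as f:
        f.writelines(keep_recent(lines, now))
```

Calling prune_file("24hr.csv") after each write would replace the line-count check entirely; it rewrites the whole file each time, which is cheap at a few hundred rows.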

Answer 2

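Are you on Linux? If you only need the last 144 lines, you could try the following (it prints them to standard output, so redirect that into a new file if you want to keep only those lines):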

tail -n 144 file.csv

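You can also get tail on Windows; I found one that comes with Cmder. If you have to use Python and the file is small enough to fit in RAM, load it into a list with readlines(), keep only the last lines (lst = lst[-144:]) and rewrite the file. If you don't know how many lines you have, parse the file with https://docs.python.org/3.7/library/csv.html, convert the times into Python datetime objects (the reverse of the strftime call you used when writing them) and write back only the lines that meet your condition.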
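A sketch of the csv-module route, assuming the question's timestamp format. filter_rows and rows_to_csv are hypothetical helper names, and the example works on a string via io.StringIO so it doesn't depend on a file on disk.

```python
import csv
import datetime
import io

TIMESTAMP_FORMAT = "%y-%m-%d %H:%M   "  # matches the question's strftime call


def filter_rows(csv_text, now, window=datetime.timedelta(hours=24)):
    """Parse CSV text and return the rows newer than now - window."""
    cutoff = now - window
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        stamp = datetime.datetime.strptime(row[0], TIMESTAMP_FORMAT)
        if stamp > cutoff:
            kept.append(row)
    return kept


def rows_to_csv(rows):
    """Serialise rows back to CSV text with newline line endings."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

To apply it to the real file, read 24hr.csv into a string, then write rows_to_csv(filter_rows(text, datetime.datetime.now())) back with open("24hr.csv", "w").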

Answer 3

Given that 288 lines won't take much memory, I think it's fine to just read the lines, truncate the file and write back the ones you want:

import datetime

# Unless you are working on a system with limited memory,
# reading 288 lines isn't much
def remove_old_entries(file_):
    file_.seek(0)  # Just in case, go to the start
    lines = file_.readlines()[-288:]  # Read the last 288 lines
    file_.seek(0)  # Rewind first: truncate() does not move the file position
    file_.truncate(0)  # Empty the file
    file_.writelines(lines)  # Put back just the desired lines

    return file_

# mcp is the temperature sensor object (set up elsewhere, not shown).
# 24hr.csv must already exist, because "r+" mode does not create it.
while True:
    tempC = mcp.temperature
    tempF = tempC * 9 / 5 + 32
    timestamp = datetime.datetime.now().strftime("%y-%m-%d %H:%M   ")

    with open("24hr.csv", "r+") as file_:
        file_ = remove_old_entries(file_)  # The function returns the same file object
        file_.write('{},{}\n'.format(timestamp, tempF))

    # I hope mcp.temperature blocks, or that you sleep out the 5 minutes;
    # otherwise reading this file in an infinite loop will get out of hand
    # time.sleep(300)  # Call me maybe (needs import time)
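If memory ever becomes a concern, a variant of the same keep-the-last-N idea (a swapped-in technique, not what this answer uses) reads the file through collections.deque(maxlen=n), so at most n lines are held at once. last_lines and trim_to_last are hypothetical names; like the answer, trim_to_last expects a file opened in "r+" mode, and it rewinds before truncating so the kept lines are written from the start of the file.

```python
import collections
import io


def last_lines(file_obj, n):
    """Return the last n lines of an open text file as a list.

    A deque with maxlen=n drops its oldest item on every append once full,
    so no more than n lines are held in memory while reading.
    """
    return list(collections.deque(file_obj, maxlen=n))


def trim_to_last(file_obj, n):
    """Keep only the last n lines of a file opened in "r+" mode."""
    file_obj.seek(0)            # read from the beginning
    lines = last_lines(file_obj, n)
    file_obj.seek(0)            # truncate() does not move the position, so rewind first
    file_obj.truncate(0)        # empty the file
    file_obj.writelines(lines)  # write back only the kept lines
    return file_obj


# Usage with an in-memory stand-in for 24hr.csv
demo = io.StringIO("a\nb\nc\nd\n")
trim_to_last(demo, 2)
print(demo.getvalue(), end="")  # prints c and d, one per line
```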

Answer 4

If you are on Linux or similar, the proper way to do this is log rotation (for example with logrotate).
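Log rotation can also be done from Python itself with the standard library's logging.handlers.TimedRotatingFileHandler. The sketch below assumes you are happy to write each row through logging; make_temperature_logger is a hypothetical name. Note that with backupCount=1 you keep the current file plus one rotated file, so you end up with between 24 and 48 hours of data across two files rather than exactly the last 24 hours.

```python
import logging
import logging.handlers
import os
import tempfile


def make_temperature_logger(path="24hr.csv", name="temperature"):
    """Return a logger that appends one CSV row per call to path.

    TimedRotatingFileHandler renames the current file (adding a time-based
    suffix) on the first write after the interval has elapsed;
    backupCount=1 keeps one rotated file and deletes anything older.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:  # don't open the file twice on repeated calls
        handler = logging.handlers.TimedRotatingFileHandler(
            path, when="h", interval=24, backupCount=1
        )
        handler.setFormatter(logging.Formatter("%(message)s"))  # bare CSV row
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep rows out of the root logger's output
    return logger


# Usage sketch: write one row into a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "24hr.csv")
demo_logger = make_temperature_logger(demo_path)
demo_logger.info("%s,%s", "18-12-13 10:58   ", 44.7125)
```

In the question's loop, the f.open/write/close lines would become a single logger.info("%s,%s", timestamp, tempF) call.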
