如何创建不包含早于 5 分钟的信息的文件？

Question

我正在使用 Twitter Streaming api 来获取实时推文数据。

我可以将该数据打印到控制台。但我想要的是将数据保存到一个文件中，并且该数据不应超过 5 分钟。

如何像对日志文件那样连续滚动包含过去 5 分钟数据的文件。

同时文件应该可以读取。

在 Python 中有什么方法可以做到这一点吗？

我还没有遇到过这样的事情，我们可以提到文件可以保存特定数据的持续时间。

Answer 1

将数据保存在一个文件中，包含实际时间并检查实际时间是否相差 5 分钟。使用时间。或者使用睡眠功能，每 5 分钟清除一次旧数据。

Answer 2

这应该可以帮助您入门。只要它的年龄超过 300 秒，它就会从列表中删除推文。

import time;
TIME_LIMIT = 300; # 5 mins in seconds


# Get the tweet text and the time stamp
def GetTweet():
    #...
    # Wait until there is a new tweet from Donald :)
    text = 'your function that gets the text goes here';

    # Get the time stamp
    time_ = time.time();
    return text, time_


# Kill all the old data that is older than X seconds
def KillOldTweet(tweets):
    time_now = time.time();  # Capture the current time

    # For every tweet stored remove the old one
    for i, tweet in enumerate(tweets):
        time_diff = time_now - tweet[1]; # get the time difference
        if time_diff > TIME_LIMIT: # if older than X secods, kill that tweet from the list
            tweets.pop(i);
            pass;      
        pass;

    return tweets; # return the list

# Updates the file with the list of text of tweets
def UpdateFile(tweets):
    with open('output.txt', 'w') as file:  # open the file and close it after writing
        texts = [ i for i, j in tweets ]; # unzip;
        out_text = str(texts); # convert to string
        print(out_text); #
        file.write(out_text); # overwrite to file
        pass;
    pass;


tweets = [];
while(1):
    # Get the new tweet and append to the list
    text, time_ = GetTweet(); # Wait until a new tweet arrived
    tweet = (text, time_); # zip it into a tuple
    tweets.append(tweet); # append it to list


    # Kill the old tweets from the list
    tweets = KillOldTweet(tweets)

    # Update the file with the fresh tweets
    UpdateFile(tweets);

    # Sleep for 1 second.
    time.sleep(1);

    pass;

但我建议您使用套接字模块而不是写入文本文件。或者使用 pickle 模块从文件中轻松解压内容

Answer 3

搜索 "python logrotate" 发现：https://docs.python.org/3.7/library/logging.handlers.html#logging.handlers.TimedRotatingFileHandler。可能值得一试。

如果您更愿意自己编写代码，管理内存中的推文（例如使用 collections.deque，这样很容易弹出旧推文并添加新推文），然后将其刷新到文件中一次while...,（或使用套接字、泡菜、简单的变量名将此数据传递给您的分析函数，正如其他答案中已经提到的那样）。

如何创建不包含早于 5 分钟的信息的文件？

How do I create a file that doesn't contain information older than 5 minutes?

python

file

text-files

tweepy

twitter-streaming-api