Proper Python data structure for real-time analysis?

Question

Community,

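Objective: I'm running a Pi project (hence Python) that communicates with an Arduino to get readings from a load cell once per second. What data structure should I use in Python to log this data and analyze it in real time?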

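I'd like to be able to do things like:

Slice the data to get the value of the last logged data point.
Slice the data to get the mean of the data points from the last n seconds.
Perform a regression on the last n data points to get g/s.
Remove data points older than n seconds from the log.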

Current attempts:

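Dictionary: I've been adding each reading to a dictionary under a new key holding the rounded time (see below), but this makes slicing and analysis difficult.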

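import time

log = {}

def log_data():
    # Key each reading by its timestamp rounded to 4 decimal places;
    # read_data() is the function that reads the load cell via the Arduino.
    log[round(time.time(), 4)] = read_data()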
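The trouble with the dictionary shows up as soon as you query it. In Python 2.7 a plain dict keeps no insertion order. Even in Python 3.7+, where it does, a dict cannot be indexed or sliced by position, so every query has to scan or sort all the keys. Below is a minimal sketch with made-up sample readings. `last_value` and `last_n_seconds` are hypothetical helpers, not part of the original code.

```python
# Hypothetical sample log: {rounded timestamp: reading in grams}
log = {100.0: 5.0, 101.0: 7.0, 102.0: 9.0, 103.0: 11.0}

def last_value(log):
    # No positional indexing on a dict, so find the newest key explicitly.
    return log[max(log)]

def last_n_seconds(log, now, n):
    # Every query scans and sorts all keys: O(k log k) for k entries.
    return [log[t] for t in sorted(log) if now - t < n]

print(last_value(log))                      # 11.0
print(last_n_seconds(log, now=103.0, n=2))  # [9.0, 11.0]
```

This cost grows with the size of the log, which is why a sequence ordered by time is easier to work with.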

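Pandas DataFrame: This is what I'd like to use, because it makes time-series slicing and analysis easy, but this question (How to handle incoming real time data with python pandas) seems to say it's a bad idea. I can't follow their solution (i.e., store the readings in a dictionary and df.append() them in batches every few seconds), because I want my rate calculation (the regression) to be real-time.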

This question (ECG Data Analysis on a real-time signal in Python) seems to describe the same problem I'm having, but it has no real solution.

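Goal:

So what is the proper way to handle and analyze real-time time-series data in Python? This seems like something everyone would need to do, so I'd imagine there must be prebuilt functionality for it?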

Thanks,

Michael

Answer 1

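First, I'd question two assumptions:

You mention in your post that data comes in once per second. If you can rely on that, you don't need timestamps at all: finding the last N data points is exactly the same as finding the data points from the last N seconds.
You have a constraint that your summary data needs to be absolutely 100% real-time. That may make life considerably more complicated; can it be relaxed at all?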
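If the once-per-second rate really can be relied on, a fixed-length buffer is enough, because the last N entries are the last N seconds. Below is a minimal sketch using the standard library's `collections.deque` with `maxlen`, which discards the oldest item automatically once the deque is full. `WINDOW` and the sample readings are assumptions for illustration.

```python
from collections import deque

WINDOW = 5  # assumed window: 5 readings == 5 seconds at a steady 1 Hz

# A bounded deque drops the oldest reading automatically once it is full.
readings = deque(maxlen=WINDOW)

for grams in [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]:
    readings.append(grams)  # no timestamp needed when the rate is fixed

last = readings[-1]                   # most recent reading
mean = sum(readings) / len(readings)  # mean over the last WINDOW seconds
print(last, mean)                     # 22.0 18.0
```

Pruning old data is then free, but the equivalence between "last N points" and "last N seconds" breaks as soon as a reading is missed or delayed.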

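Either way, here is a very naive approach using a list. It covers everything you asked for, but performance may become an issue, depending on how many past data points you need to keep.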
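If scanning the whole list on every query becomes too slow, you can take advantage of readings arriving in time order. Keep the timestamps in their own list, which then stays sorted, and use the standard library's `bisect` module to find the start of a time window in O(log n) rather than O(n). Below is a minimal sketch assuming timestamps are appended in non-decreasing order. The class name `TimeLog` and its methods are hypothetical.

```python
import bisect

class TimeLog(object):
    """Append-only log of (timestamp, value) readings kept in time order."""

    def __init__(self):
        self.times = []   # stays sorted because readings arrive in time order
        self.values = []

    def append(self, t, value):
        self.times.append(t)
        self.values.append(value)

    def window(self, now, n):
        # Index of the first timestamp strictly newer than now - n,
        # i.e. the readings with now - t < n.
        start = bisect.bisect_right(self.times, now - n)
        return self.values[start:]

    def mean(self, now, n):
        recent = self.window(now, n)
        return sum(recent) / float(len(recent)) if recent else None

log = TimeLog()
for t, grams in [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]:
    log.append(t, grams)

print(log.window(now=3.0, n=2))  # [3.0, 4.0]
print(log.mean(now=3.0, n=2))    # 3.5
```

Slicing `self.values[start:]` still copies the window, but locating the window no longer touches older entries.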

Also, something you may not have considered: do you need a complete record of past data, or can you simply discard old points?
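If old readings can simply be dropped, a `collections.deque` makes the pruning step cheap. Because entries arrive in time order, stale ones always sit at the left end, and `popleft()` removes each in O(1) instead of rebuilding the whole list. Below is a sketch assuming (timestamp, value) tuples. The function name `prune` and the sample data are hypothetical.

```python
from collections import deque

def prune(log, now, n):
    """Drop entries at least n seconds old from the left of a time-ordered deque."""
    while log and now - log[0][0] >= n:
        log.popleft()

log = deque()
for t, grams in [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]:
    log.append((t, grams))

prune(log, now=3.0, n=2)
print(list(log))  # [(2.0, 3.0), (3.0, 4.0)]
```

This keeps the same rule as the answer's filter (an entry survives while `now - timestamp < n`). Each pruning pass only does work proportional to the number of entries it removes.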

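import time

data = []

# Each observation is a (timestamp, value) pair; read_data() is the
# asker's load-cell reader, and n is the window size you choose.
new_observation = (time.time(), read_data())

# new data comes in
data.append(new_observation)

current_time = time.time()

# Slice the data to get the value of the last logged datapoint.
data[-1][1]

# Slice the data to get the mean of the datapoints for the last n seconds.
recent = [value for (timestamp, value) in data if current_time - timestamp < n]
sum(recent) / float(len(recent))

# Perform a regression on the last n data points to get g/s.
# regression_function is a placeholder for whatever fitting routine you use.
regression_function(data[-n:])

# Remove from the log data points older than n seconds.
# (A list comprehension keeps data a list; in Python 3, filter() returns an iterator.)
data = [o for o in data if current_time - o[0] < n]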
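`regression_function` in the answer is a placeholder. For a rate in g/s, one simple choice is an ordinary least-squares fit of weight against time, whose slope is the rate. Below is a minimal pure-Python sketch assuming `data` holds (timestamp in seconds, weight in grams) tuples. `slope` is a hypothetical helper name. If NumPy is available, `numpy.polyfit(x, y, 1)` returns the same slope as its first coefficient.

```python
def slope(points):
    """Least-squares slope of value vs. time for (t, value) pairs: the rate in g/s."""
    k = len(points)
    if k < 2:
        raise ValueError("need at least two points for a regression")
    mean_t = sum(t for t, _ in points) / float(k)
    mean_v = sum(v for _, v in points) / float(k)
    # slope = sum((t - mean_t) * (v - mean_v)) / sum((t - mean_t) ** 2)
    num = sum((t - mean_t) * (v - mean_v) for t, v in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den

data = [(0.0, 100.0), (1.0, 102.0), (2.0, 104.0), (3.0, 106.0)]
n = 3
print(slope(data[-n:]))  # 2.0 grams per second over the last n points
```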

Tags: dictionary, real-time, time-series, python-2.7, pandas