Aggregation of time-series data on multiple columns

Question

                     rand_val  new_val           copy_time
2020-10-15 00:00:00         7       26 2020-10-15 00:00:00
2020-10-15 00:00:10         8       29 2020-10-15 00:00:10
2020-10-15 00:00:20         1       53 2020-10-15 00:00:20
2020-10-15 00:03:50         6       69 2020-10-15 00:03:50
2020-10-15 00:04:00         3       19 2020-10-15 00:04:00

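I am downsampling a time series with the resample method. I found that I cannot refer to a specific column by name when applying a function to the resampled data.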

Suppose I want to perform an operation that refers to a column by name:

df.resample("1min").apply(lambda x: sum(x.rand_val) if len(x)>1 else 0)

I get an error:

AttributeError: 'Series' object has no attribute 'rand_val'

This would work if I had grouped by some other variable with groupby. I guess resample behaves differently. Any ideas?

Answer 1

Using on='copy_time', I got the output shown below.

a = df.resample('1min',on='copy_time').apply(lambda x: sum(x.rand_val) if len(x)>1 else 0)
print(a)

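resample needs an object with a datetime-like index (DatetimeIndex, TimedeltaIndex, or PeriodIndex). I don't see one in your example. Passing copy_time through on= tells resample to bin on that column instead, so the data is treated as a time series. Here is the frame I used, followed by the result: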

             org_time  rand_val  new_val           copy_time
0 2020-10-15 00:00:00         7       26 2020-10-15 00:00:00
1 2020-10-15 00:00:10         8       29 2020-10-15 00:00:10
2 2020-10-15 00:00:20         1       53 2020-10-15 00:00:20
3 2020-10-15 00:03:50         6       69 2020-10-15 00:03:50
4 2020-10-15 00:04:00         3       19 2020-10-15 00:04:00


copy_time
2020-10-15 00:00:00    16
2020-10-15 00:01:00     0
2020-10-15 00:02:00     0
2020-10-15 00:03:00     0
2020-10-15 00:04:00     0
Freq: T, dtype: int64
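
For anyone who wants to reproduce this, here is a minimal sketch under a few assumptions of mine: the frame is rebuilt by hand from the printout above with a default integer index, the helper name sum_if_multiple is made up for illustration, and the column is selected before apply so the function always receives the rand_val Series. Whether the unselected lambda version works may depend on your pandas version.

```python
import pandas as pd

times = pd.to_datetime([
    "2020-10-15 00:00:00",
    "2020-10-15 00:00:10",
    "2020-10-15 00:00:20",
    "2020-10-15 00:03:50",
    "2020-10-15 00:04:00",
])

# Frame rebuilt from the printout: default integer index, timestamps in columns.
df = pd.DataFrame({
    "org_time": times,
    "rand_val": [7, 8, 1, 6, 3],
    "new_val": [26, 29, 53, 69, 19],
    "copy_time": times,
})


def sum_if_multiple(values):
    """Sum a bin only when it holds more than one row, otherwise return 0."""
    return values.sum() if len(values) > 1 else 0


# on="copy_time" bins on that column instead of the (non-datetime) index.
per_minute = df.resample("1min", on="copy_time")["rand_val"].apply(sum_if_multiple)
print(per_minute)
```

The first one-minute bin holds three rows (7 + 8 + 1), so it is the only bin that passes the len > 1 check.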

Answer 2

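Good question! When we group by some column with groupby, each chunk is handed to the function as a pandas DataFrame, so we can access columns as usual. With resample, however, the function receives a Series (one column at a time), which is why x.rand_val fails.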
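
A quick way to check this yourself is to record the type of whatever the function is handed. This is only a sketch: the frame is rebuilt from the question's sample data, count_rows and seen_types are names I made up, and the exact dispatch may differ between pandas versions.

```python
import pandas as pd

index = pd.to_datetime([
    "2020-10-15 00:00:00",
    "2020-10-15 00:00:10",
    "2020-10-15 00:00:20",
    "2020-10-15 00:03:50",
    "2020-10-15 00:04:00",
])
df = pd.DataFrame(
    {"rand_val": [7, 8, 1, 6, 3], "new_val": [26, 29, 53, 69, 19]},
    index=index,
)

seen_types = set()


def count_rows(chunk):
    # Record what kind of object resample passes in for each bin.
    seen_types.add(type(chunk).__name__)
    return len(chunk)


counts = df.resample("1min").apply(count_rows)
print(seen_types)
print(counts)
```

If "Series" shows up in seen_types, the function is being called per column, and attribute access such as x.rand_val has nothing to look up.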

One way to aggregate only rand_val is to select that column before applying the function:

df.resample("1min")['rand_val'].apply(lambda x: sum(x) if len(x)>1 else 0)
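
Since the title asks about multiple columns, here is a hedged sketch of how that could look: select one column for a single result, or pass a dict to agg to give each column its own aggregation. The helper name sum_if_multiple and the choice of "mean" for new_val are my own assumptions, not something from the question.

```python
import pandas as pd

index = pd.to_datetime([
    "2020-10-15 00:00:00",
    "2020-10-15 00:00:10",
    "2020-10-15 00:00:20",
    "2020-10-15 00:03:50",
    "2020-10-15 00:04:00",
])
df = pd.DataFrame(
    {"rand_val": [7, 8, 1, 6, 3], "new_val": [26, 29, 53, 69, 19]},
    index=index,
)


def sum_if_multiple(values):
    """Sum a bin only when it holds more than one row, otherwise return 0."""
    return values.sum() if len(values) > 1 else 0


# Single column: the function receives the rand_val values of each bin.
rand_only = df.resample("1min")["rand_val"].apply(sum_if_multiple)

# Several columns: map each column name to its own aggregation.
per_column = df.resample("1min").agg(
    {"rand_val": sum_if_multiple, "new_val": "mean"}
)
print(per_column)
```

Each function in the dict still sees one column at a time, so this works when every column can be aggregated independently of the others.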

I am assuming your index is already a datetime index. If it is not, convert it with pd.to_datetime:

df.index=pd.to_datetime(df.index)
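
To illustrate why the conversion matters, here is a small sketch with a hypothetical frame whose index holds the timestamps as plain strings. Before the conversion resample refuses the index; afterwards it works.

```python
import pandas as pd

# Hypothetical frame: timestamps stored as strings in the index.
df = pd.DataFrame(
    {"rand_val": [7, 8, 1]},
    index=["2020-10-15 00:00:00", "2020-10-15 00:00:10", "2020-10-15 00:00:20"],
)

try:
    df.resample("1min")
    failed = False
except TypeError:
    # resample only accepts a DatetimeIndex, TimedeltaIndex, or PeriodIndex.
    failed = True

# Parse the strings into a DatetimeIndex, then resample as usual.
df.index = pd.to_datetime(df.index)
is_datetime = isinstance(df.index, pd.DatetimeIndex)
total = df.resample("1min")["rand_val"].sum()
print(failed, is_datetime)
print(total)
```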
