Python: apply a function to a DataFrame indexed by ID and timestamp, on each feature column

Question

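Hi everyone, I have a DataFrame with 5 columns:

ID (int) | TIME (int) | humidity | temperature | pressure

ID = room
TIME = Unix timestamp in seconds
humidity/temperature/pressure = sensor values

What I need:

I want to run a filter (signal.lfilter) on humidity/temperature/pressure separately for each ID, for example: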

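ID = 1
run lfilter on the humidity values sorted by TIME asc
run lfilter on the temperature values sorted by TIME asc
run lfilter on the pressure values sorted by TIME asc

ID = 2
run lfilter on the humidity values sorted by TIME asc
run lfilter on the temperature values sorted by TIME asc
run lfilter on the pressure values sorted by TIME asc

...

For ID = n
run lfilter on the humidity values sorted by TIME asc
run lfilter on the temperature values sorted by TIME asc
run lfilter on the pressure values sorted by TIME asc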

How can I do this quickly? Right now I use 2 for loops:

for i in df.id.unique():
    for column in ['humidity','temperature','pressure']:
        # select the rows of room i with ==, and assign through .loc so the write reaches df
        df.loc[df.id == i, column] = ... lfilter ...

But this is too slow. Any help?

Answer 1

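It's not very clean, but try the following. Is this what you are trying to do with the signal.lfilter function?

Edit: oops, I forgot the time requirement. Running df = df.sort_values(['ID', 'TIME'], ascending=True) before the operations below should take care of it. Note that sort_values returns a new DataFrame, so the result has to be assigned back, and that the sample frame below has no TIME column, so this step applies to your real data.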

import pandas as pd
from scipy import signal
import numpy as np

np.random.seed(1618)

df = pd.DataFrame({'ID': [1,1,1,2,2,2], 
                   'humidity': np.random.random(6), 
                   'temperature': np.random.random(6), 
                   'pressure': np.random.random(6)})

#  >>> df
#     ID  humidity  pressure  temperature
#  0   1  0.605160  0.194984     0.450019
#  1   1  0.301108  0.077726     0.691227
#  2   1  0.197976  0.144978     0.155231
#  3   2  0.733884  0.458959     0.785704
#  4   2  0.457377  0.647681     0.092045
#  5   2  0.021497  0.417326     0.551941

tmp = df.groupby('ID').apply(lambda x: signal.lfilter(x['humidity'], x['pressure'], x['temperature']))
# this produces a vector for each ID.
# we have to unstack the vectors and append them to the original df

df['filtered']  = tmp.apply(lambda x: pd.Series(x)).stack().reset_index()[0]

# >>> df
#    ID  humidity  pressure  temperature  filtered
# 0   1  0.605160  0.194984     0.450019  1.396696
# 1   1  0.301108  0.077726     0.691227  2.283506
# 2   1  0.197976  0.144978     0.155231  0.057383
# 3   2  0.733884  0.458959     0.785704  1.256354
# 4   2  0.457377  0.647681     0.092045 -0.842783
# 5   2  0.021497  0.417326     0.551941  1.058038


Tags: python, filter, apply, multidimensional-array, pandas