将使用 integrate.trapz 的列与日期时间索引集成

Integrate a column using integrate.trapz with datetime index

早上好,

我有以下代码使用梯形法确定列的积分:

import pandas as pd
from scipy import integrate

df = pd.DataFrame()
df['Date'] = ['29/07/2021', '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '29/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021',   '30/07/2021']
df['Time'] = ['06:48:37',   '06:59:37', '07:14:37', '07:27:44', '07:42:44', '07:57:44', '08:04:32', '08:19:32', '09:01:20', '09:19:06', '09:49:06', '10:09:01', '10:23:31', '10:39:18', '10:54:17', '11:09:17', '11:20:01', '11:35:01', '11:50:00', '11:54:14', '12:09:14', '12:22:01', '12:30:15', '12:45:15', '13:00:15', '13:40:15', '13:55:15', '14:10:15', '14:27:15', '14:42:15', '14:57:15', '15:12:15', '15:27:15', '15:42:15', '15:57:15', '16:12:15', '08:12:50', '08:42:50', '08:57:50', '09:12:50', '09:42:50', '09:57:50', '10:12:50', '10:27:50', '10:42:50', '10:57:50', '11:12:50', '11:27:50', '11:42:50', '11:57:50', '12:12:50', '12:27:50', '12:42:50', '12:57:50', '13:12:50', '13:31:48', '13:43:25', '15:15:20', '15:24:44', '15:34:44', '15:39:03', '15:45:28', '15:55:28', '16:05:28', '16:15:28', '16:25:28', '16:35:28', '16:45:28', '16:55:28', '17:05:28', '17:15:28', '17:25:28', '17:35:28', '17:45:28', '17:55:28', '18:05:28', '18:15:28', '18:25:28']
df['Column1'] = [0.01153489116, 0.01345839865,  0.01779293663,  0.0188075811,   0.02593143441,  0.02516351682,  0.02656128256,  0.02774365902,  0.01068687582,  0.0492178287,   0.03830963094,  0.03982806424,  0.01197452205,  0.0452324925,   0.056356989,    0.057672,   0.06444093731,  0.01257135768,  0.0293379174,   0.01347513612,  0.03167956869,  0.03127426809,  0.0561366325,   0.04949798985,  0.0480188952,   0.0357266179,   0.01970254124,  0.01941959216,  0.01782295605,  0.01299120592,  0.0269445306,   0.01212425752,  0.01330537192,  0.00983425672,  0.0101417148,   0.02101192236,  0.01781862992,  0.00758453253,  0.0076804071,   0.00922775574,  0.0073747856,   0.00853069657,  0.03282369543,  0.02961645624,  0.03013929116,  0.010247364,    0.03243998824,  0.01806667814,  0.0325989132,   0.03179977488,  0.03362982444,  0.0094431753,   0.0082718999,   0.0109086495,   0.04043482872,  0.01571583463,  0.0573673107,   0.03165296424,  0.02008226187,  0.01864084944,  0.02020784928,  0.00982873458,  0.00791156214,  0.0123223301,   0.0067242825,   0.00775056588,  0.004625349911, 0.003382658468, 0.0075472771,   0.006104127873, 0.01520061243,  0.00891038148,  0.0069686624,   0.006432309,    0.00254625114,  0.003212563191, 0.00237200964,  0.001625559964]
df['DateTime'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'])
dp = df.set_index('DateTime')
dp['Column2'] = dp['Column1'].rolling('1D').apply(integrate.trapz)
print(dp['Column2'].head(1000))

它是有效的,但问题是梯形法对于x轴[a,b]上的范围是这样的:

(b - a) * (f(b) + f(a) / 2)

如果我们考虑 [a, b] 范围内的许多点,它将是 (f(b) + f(a) / 2)[= 的 'sum' 32=] 之后我们将其与 (b - a).

相乘

代码正在求和,但他没有乘以索引中的时间差,即日期时间。

你能告诉我为什么它不这样做吗?谢谢。

PS :以前两点为例:(0.01153489116 + 0.01345839865)/2 = 0,012496644905 这正是您将看到的第二个值在打印中,因此它不会乘以索引

中的日期时间差异

您需要将索引作为第二个参数传递给 integrate.trapz 以便它能够计算积分。

dp['Column1'].rolling('1D').apply(lambda x : integrate.trapz(x,x.index))

结果是

|DateTime            |Column1       |
|--------------------|--------------|
|2021-07-29 06:48:37 |  0.000000e+00|
|2021-07-29 06:59:37 |  8.247786e+09|
|2021-07-29 07:14:37 |  2.231089e+10|
|     ...            |      ...     |
|2021-07-30 18:25:28 |  7.380168e+11|

但是我看不到在相隔一天的点之间使用这种积分。如果您对累积积分感兴趣,我建议您使用

df['integral'] = ((df.DateTime-df.DateTime.shift(1))*\
                 ((df.Column1+df.Column1.shift(1))/2))[1:].cumsum()

一开始会产生相同的数据帧,但会不断整合

或者,如果您使用 rolling('1D') 寻找的是分别计算每一天的积分,那么您可以使用

dp['Column1'].resample('d').apply(lambda x : integrate.trapz(x,x.index))

结果是

DateTime
2021-07-29   0 days 00:16:19.051562019
2021-07-30   0 days 00:12:18.016804347
Freq: D, Name: Column1, dtype: timedelta64[ns]