Integrate a column using integrate.trapz with datetime index
Good morning,
I have the following code, which uses the trapezoidal rule to compute the integral of a column:
import pandas as pd
from scipy import integrate
df = pd.DataFrame()
df['Date'] = ['29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '29/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021', '30/07/2021']
df['Time'] = ['06:48:37', '06:59:37', '07:14:37', '07:27:44', '07:42:44', '07:57:44', '08:04:32', '08:19:32', '09:01:20', '09:19:06', '09:49:06', '10:09:01', '10:23:31', '10:39:18', '10:54:17', '11:09:17', '11:20:01', '11:35:01', '11:50:00', '11:54:14', '12:09:14', '12:22:01', '12:30:15', '12:45:15', '13:00:15', '13:40:15', '13:55:15', '14:10:15', '14:27:15', '14:42:15', '14:57:15', '15:12:15', '15:27:15', '15:42:15', '15:57:15', '16:12:15', '08:12:50', '08:42:50', '08:57:50', '09:12:50', '09:42:50', '09:57:50', '10:12:50', '10:27:50', '10:42:50', '10:57:50', '11:12:50', '11:27:50', '11:42:50', '11:57:50', '12:12:50', '12:27:50', '12:42:50', '12:57:50', '13:12:50', '13:31:48', '13:43:25', '15:15:20', '15:24:44', '15:34:44', '15:39:03', '15:45:28', '15:55:28', '16:05:28', '16:15:28', '16:25:28', '16:35:28', '16:45:28', '16:55:28', '17:05:28', '17:15:28', '17:25:28', '17:35:28', '17:45:28', '17:55:28', '18:05:28', '18:15:28', '18:25:28']
df['Column1'] = [0.01153489116, 0.01345839865, 0.01779293663, 0.0188075811, 0.02593143441, 0.02516351682, 0.02656128256, 0.02774365902, 0.01068687582, 0.0492178287, 0.03830963094, 0.03982806424, 0.01197452205, 0.0452324925, 0.056356989, 0.057672, 0.06444093731, 0.01257135768, 0.0293379174, 0.01347513612, 0.03167956869, 0.03127426809, 0.0561366325, 0.04949798985, 0.0480188952, 0.0357266179, 0.01970254124, 0.01941959216, 0.01782295605, 0.01299120592, 0.0269445306, 0.01212425752, 0.01330537192, 0.00983425672, 0.0101417148, 0.02101192236, 0.01781862992, 0.00758453253, 0.0076804071, 0.00922775574, 0.0073747856, 0.00853069657, 0.03282369543, 0.02961645624, 0.03013929116, 0.010247364, 0.03243998824, 0.01806667814, 0.0325989132, 0.03179977488, 0.03362982444, 0.0094431753, 0.0082718999, 0.0109086495, 0.04043482872, 0.01571583463, 0.0573673107, 0.03165296424, 0.02008226187, 0.01864084944, 0.02020784928, 0.00982873458, 0.00791156214, 0.0123223301, 0.0067242825, 0.00775056588, 0.004625349911, 0.003382658468, 0.0075472771, 0.006104127873, 0.01520061243, 0.00891038148, 0.0069686624, 0.006432309, 0.00254625114, 0.003212563191, 0.00237200964, 0.001625559964]
df['DateTime'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'])
dp = df.set_index('DateTime')
dp['Column2'] = dp['Column1'].rolling('1D').apply(integrate.trapz)
print(dp['Column2'].head(1000))
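It works, but the problem is that the trapezoidal rule over a range [a, b] on the x-axis is:
(b - a) * (f(a) + f(b)) / 2
If we consider many points in the range [a, b], it is the sum of the (f(a) + f(b)) / 2 terms for consecutive points, each of which is then multiplied by its (b - a).
The code does the sum, but it does not multiply by the time difference in the index, which is a datetime.
Can you tell me why it doesn't? Thank you.
PS: Take the first two points as an example: (0.01153489116 + 0.01345839865) / 2 = 0.012496644905. This is exactly the second value you will see in the printout, so it is not being multiplied by the datetime difference in the index.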
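You need to pass the index as the second argument to integrate.trapz so that it can take the spacing between points into account when computing the integral.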
dp['Column1'].rolling('1D').apply(lambda x : integrate.trapz(x,x.index))
The result is
|DateTime |Column1 |
|--------------------|--------------|
|2021-07-29 06:48:37 | 0.000000e+00|
|2021-07-29 06:59:37 | 8.247786e+09|
|2021-07-29 07:14:37 | 2.231089e+10|
| ... | ... |
|2021-07-30 18:25:28 | 7.380168e+11|
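The size of these numbers comes down to units. Without an x argument, trapz assumes a spacing of 1 between samples, which is why the original call only summed the averages. With the datetime index passed in, the values in the table are consistent with the time differences being measured in nanoseconds. A minimal sketch using only the standard library and the first three rows of the question's data reproduces the figures by hand; the helper name `trapezoid_seconds` is hypothetical and not part of SciPy or pandas.

```python
from datetime import datetime

# First three rows of the question's data.
times = [
    datetime(2021, 7, 29, 6, 48, 37),
    datetime(2021, 7, 29, 6, 59, 37),
    datetime(2021, 7, 29, 7, 14, 37),
]
values = [0.01153489116, 0.01345839865, 0.01779293663]


def trapezoid_seconds(times, values):
    """Trapezoidal rule with the x-axis in seconds (hypothetical helper)."""
    total = 0.0
    for i in range(1, len(values)):
        dt = (times[i] - times[i - 1]).total_seconds()  # b - a
        total += dt * (values[i] + values[i - 1]) / 2   # (b - a) * (f(a) + f(b)) / 2
    return total


# What a unit spacing (dx = 1) gives for the first two points.
unit_spacing = (values[0] + values[1]) / 2
# The same two points with the real spacing of 660 seconds.
in_seconds = trapezoid_seconds(times[:2], values[:2])

print(unit_spacing)      # about 0.0125, the value reported in the question
print(in_seconds)        # about 8.25
print(in_seconds * 1e9)  # about 8.25e9, in line with the second row of the table
```

Dividing the table's values by 1e9 therefore gives the integral in units of Column1 times seconds.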
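However, I don't see the use of this kind of integration between points that are a day apart. If you are interested in the cumulative integral, I suggest using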
df['integral'] = ((df.DateTime-df.DateTime.shift(1))*\
((df.Column1+df.Column1.shift(1))/2))[1:].cumsum()
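which produces the same values at first, but keeps on integrating.
Or, if what you were after with rolling('1D') was the integral of each day computed separately, you can use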
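The expression above multiplies a timedelta by a float, so the resulting column has a timedelta dtype. If plain numbers are preferred, a sketch along the following lines converts the time differences to seconds first. The choice of seconds as the unit and the column name `integral_s` are assumptions made for illustration, and only the first three rows of the data are used.

```python
import pandas as pd

# First three rows of the question's data, already combined into DateTime.
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2021-07-29 06:48:37',
                                '2021-07-29 06:59:37',
                                '2021-07-29 07:14:37']),
    'Column1': [0.01153489116, 0.01345839865, 0.01779293663],
})

# Width of each trapezoid in seconds (NaN for the first row).
dt = df['DateTime'].diff().dt.total_seconds()
# Mean height of each trapezoid (NaN for the first row).
mean_height = (df['Column1'] + df['Column1'].shift(1)) / 2

# Running total of the trapezoid areas; the first row starts at 0.
df['integral_s'] = (dt * mean_height).fillna(0.0).cumsum()
print(df)
```

Here the first row is kept with an integral of 0 instead of being dropped with `[1:]`, so the new column lines up with every row of the frame.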
dp['Column1'].resample('d').apply(lambda x : integrate.trapz(x,x.index))
The result is
DateTime
2021-07-29 0 days 00:16:19.051562019
2021-07-30 0 days 00:12:18.016804347
Freq: D, Name: Column1, dtype: timedelta64[ns]
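The per-day results come out as timedeltas because the differences between index entries are timedeltas. As a sketch, assuming the integral is wanted as a plain number in units of Column1 times seconds, each day's group can be integrated against elapsed seconds instead. The helper name `integrate_seconds` and the shortened data set are illustrative and not from the original. The rule is written out with NumPy so that the example does not depend on whether the installed SciPy exposes `trapz` or the newer name `trapezoid`.

```python
import numpy as np
import pandas as pd

# A few rows from each of the two days in the question's data.
index = pd.to_datetime(['2021-07-29 06:48:37', '2021-07-29 06:59:37',
                        '2021-07-29 07:14:37', '2021-07-30 08:12:50',
                        '2021-07-30 08:42:50'])
s = pd.Series([0.01153489116, 0.01345839865, 0.01779293663,
               0.01781862992, 0.00758453253], index=index, name='Column1')


def integrate_seconds(group):
    """Trapezoidal integral of one group, x-axis in seconds (hypothetical helper)."""
    if len(group) < 2:
        return 0.0  # nothing to integrate with fewer than two points
    # Seconds elapsed since the first sample of the group.
    x = np.asarray((group.index - group.index[0]).total_seconds())
    y = group.to_numpy()
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))


# One integral per calendar day.
daily = s.resample('D').apply(integrate_seconds)
print(daily)
```

Returning 0.0 for groups with fewer than two points is a design choice for this sketch; returning NaN instead would make days without enough data easier to spot.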