How to calculate the difference of a value for multiple dates?
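My dataset contains many gas storage values. I want to compare each of them with the value on the exact same date one year earlier, across multiple years. This is what my data looks like: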
facility | gasDayStartedOn | gasInStorage | full | injection |
---|---|---|---|---|
UGS Haidach | 2022-01-09 | 4.3041 | 37 | 0.00 |
UGS Haidach | 2022-01-08 | 4.3263 | 38 | 0.00 |
UGS Haidach | 2021-01-09 | 5.5678 | 43 | 0.00 |
How can I calculate/compare gasInStorage for each year on the same gasDayStartedOn for the same facility, and store the result in a new column of the same DataFrame?
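One vectorized way to line each row up with the same calendar day one year earlier is a self-merge: shift a copy of the dates forward by one year with `pd.DateOffset(years=1)` and left-join it back on facility and date. This is a sketch of a swapped-in technique, not the method used in the answer; the column names `gasInStorage_prev_year` and `diff_1y` are made up for illustration, and the data is just the three sample rows from the table. Rows with no exact match one year earlier get NaN.

```python
import pandas as pd

# The three sample rows from the question's table.
df = pd.DataFrame({
    "facility": ["UGS Haidach"] * 3,
    "gasDayStartedOn": pd.to_datetime(["2022-01-09", "2022-01-08", "2021-01-09"]),
    "gasInStorage": [4.3041, 4.3263, 5.5678],
    "full": [37, 38, 43],
    "injection": [0.00, 0.00, 0.00],
})

# Copy each row's value and move its date forward one calendar year, so it
# lines up with the row dated exactly one year later.
prev = df[["facility", "gasDayStartedOn", "gasInStorage"]].copy()
prev["gasDayStartedOn"] = prev["gasDayStartedOn"] + pd.DateOffset(years=1)
prev = prev.rename(columns={"gasInStorage": "gasInStorage_prev_year"})

# Left join on facility + date: rows without a match one year earlier get NaN.
out = df.merge(prev, on=["facility", "gasDayStartedOn"], how="left")
out["diff_1y"] = out["gasInStorage"] - out["gasInStorage_prev_year"]
print(out)
```

If the real data also has a `company` column, it would go into the `on=[...]` keys too. `DateOffset(years=1)` maps 29 February to 28 February of the following year, so a leap-day row can add a second match for 28 February; handling leap days first avoids duplicated rows.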
I wrote this code:
import pandas as pd
from tqdm import tqdm

def det_dates(df, a_date):
    # look up the row dated 365 days before a_date
    b_df = df[df.gasDayStartedOn == a_date - pd.Timedelta(days=365)]
    if b_df.shape[0] != 0:
        return b_df.full.values[0]  # note: returns 'full', not 'gasInStorage'
    return None

def get_dif(df):
    for i, r in df.iterrows():
        a_date = r.gasDayStartedOn
        a_gasInStorage = r.gasInStorage
        b_gasInStorage = det_dates(df, a_date)
        if b_gasInStorage:
            # note: `gasInStorage` is never defined; `b_gasInStorage` seems intended
            dif_gasInStorage = a_gasInStorage - gasInStorage
        else:
            dif_gasInStorage = None
        df.loc[i, 'difdif'] = dif_gasInStorage

dfs = []
for com_fac, group in tqdm(data_1.groupby(['company', 'facility'])):
    g = group.copy()
    g.sort_values('gasDayStartedOn', inplace=True, ascending=False)
    get_dif(g)
    dfs.append(g)
But it doesn't work! Please help! This is the error I get:
from datetime import datetime, timedelta
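A minimally corrected version of the loop in the question might look like the sketch below. It assumes `data_1` has `company`, `facility`, `gasDayStartedOn` and `gasInStorage` columns (as the question's `groupby` suggests); the wrapper `add_year_over_year_diff` is a hypothetical name, and `tqdm` is left out only to keep the sketch dependency-free. The changes: return `gasInStorage` rather than `full`, subtract `b_gasInStorage` instead of the undefined `gasInStorage`, test for `None` explicitly so a stored value of 0.0 is not skipped, step back with `pd.DateOffset(years=1)` instead of 365 days, write NaN instead of None, and concatenate the groups at the end.

```python
import pandas as pd


def det_dates(df, a_date):
    """Return gasInStorage on the same calendar day one year before a_date, or None."""
    # DateOffset(years=1) steps back one calendar year; Timedelta(days=365)
    # is one day off whenever the interval contains 29 February.
    b_df = df[df.gasDayStartedOn == a_date - pd.DateOffset(years=1)]
    if b_df.shape[0] != 0:
        return b_df.gasInStorage.values[0]  # the question returned the 'full' column
    return None


def get_dif(df):
    """Write the year-over-year difference of gasInStorage into df['difdif'] in place."""
    for i, r in df.iterrows():
        b_gasInStorage = det_dates(df, r.gasDayStartedOn)
        # Compare with None explicitly: a stored value of 0.0 is falsy.
        if b_gasInStorage is not None:
            # Subtract last year's value (the question used an undefined name).
            df.loc[i, "difdif"] = r.gasInStorage - b_gasInStorage
        else:
            df.loc[i, "difdif"] = float("nan")  # NaN keeps the column numeric


def add_year_over_year_diff(data_1):
    """Hypothetical wrapper: run get_dif per (company, facility) and recombine."""
    data_1 = data_1.copy()
    data_1["gasDayStartedOn"] = pd.to_datetime(data_1["gasDayStartedOn"])
    dfs = []
    for _, group in data_1.groupby(["company", "facility"]):
        g = group.sort_values("gasDayStartedOn", ascending=False).copy()
        get_dif(g)
        dfs.append(g)
    # Stitch the per-group frames back into a single DataFrame.
    return pd.concat(dfs)
```

The 365-day offset matters here: `pd.Timestamp("2021-01-09") - pd.Timedelta(days=365)` is 2020-01-10 because 2020 is a leap year, whereas subtracting `pd.DateOffset(years=1)` gives 2020-01-09. The loop is still row by row, so on a large dataset a vectorized join is likely to be much faster than `iterrows`.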
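You will get better answers if you can provide the expected output. But a simple way to check the difference between one year and the next on the same day is to use groupby and diff.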
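import pandas as pd

df = pd.read_clipboard()  # the sample table from the question, copied to the clipboard
df['gasDayStartedOn'] = pd.to_datetime(df.gasDayStartedOn)
df = df.sort_values(by='gasDayStartedOn', ascending=True)
# one group per (day, month, facility): the same calendar day across years
group = df.groupby([df.gasDayStartedOn.dt.day, df.gasDayStartedOn.dt.month, 'facility'])
# difference from the previous row in each group, i.e. the nearest earlier year present
df['diff'] = group['gasInStorage'].diff()
df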
Out[1]:
facility gasDayStartedOn gasInStorage full injection diff
2 UGS Haidach 2021-01-09 5.5678 43 0.0 NaN
1 UGS Haidach 2022-01-08 4.3263 38 0.0 NaN
0 UGS Haidach 2022-01-09 4.3041 37 0.0 -1.2637
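One caveat with groupby/diff: `diff()` subtracts the previous row of each (day, month, facility) group, which is the nearest earlier year present, not necessarily the year immediately before. The sketch below illustrates this with hypothetical data (the values 6.0 and 5.0 are made up, and 2021-01-09 is deliberately missing), using the same grouping as the answer with the key Series renamed to `day` and `month` for clarity. It then keeps a difference only when the previous row is dated exactly one year earlier; the column name `diff_1y` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical rows: 6.0 and 5.0 are made-up values, and 2021-01-09 is missing.
df = pd.DataFrame({
    "facility": ["UGS Haidach"] * 4,
    "gasDayStartedOn": pd.to_datetime(["2020-01-09", "2022-01-09", "2022-01-08", "2021-01-08"]),
    "gasInStorage": [6.0, 4.3041, 4.3263, 5.0],
})

df = df.sort_values(by="gasDayStartedOn", ascending=True)
day = df["gasDayStartedOn"].dt.day.rename("day")
month = df["gasDayStartedOn"].dt.month.rename("month")
group = df.groupby([day, month, "facility"])

# Same technique as the answer: difference from the previous row of the group.
# For 2022-01-09 that previous row is 2020-01-09, two years earlier.
df["diff"] = group["gasInStorage"].diff()

# Keep the difference only if that previous row is exactly one year earlier.
prev_date = group["gasDayStartedOn"].shift()
one_year_back = df["gasDayStartedOn"] - pd.DateOffset(years=1)
df["diff_1y"] = df["diff"].where(prev_date.eq(one_year_back))
print(df)
```

With complete daily data for every year the two columns agree; the guard only changes rows where a year is missing from the series.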