How to compare two dataframes with timestamps and create a dictionary of dataframes? Python Pandas
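I want to retrieve every row of dataframe 2 whose timestamp falls inside one of the timestamp pairs of dataframe 1. The timestamps in dataframe 1 come in pairs (11 is the start, 12 is the end).
The goal is to create a dictionary of dataframes so that each curve can be plotted.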
df1
Timestamp Num
2021-01-01 08:00:00 11
2021-01-01 09:00:00 12
2021-01-01 10:00:00 11
2021-01-01 11:00:00 12
2021-01-01 12:00:00 11
2021-01-01 13:00:00 12
and
df2 Value
2021-01-01 07:30:00 66
2021-01-01 08:30:00 67
2021-01-01 08:45:00 67
2021-01-01 09:15:00 64
2021-01-01 10:30:00 65
2021-01-01 10:30:00 61
2021-01-01 10:45:00 68
2021-01-01 11:15:00 60
2021-01-01 12:30:00 66
2021-01-01 12:30:00 67
2021-01-01 12:45:00 67
I wanted to build a mask, but that only works for a single pair of timestamps from dataframe 1. I have this idea, but I can't write it in Python:
start = df1.iloc[::2, :]["Timestamp"]   # every row marked 11
end = df1.iloc[1::2, :]["Timestamp"]    # every row marked 12
for each (start, end) in df2:
    create a df
    dict = dict.append(df)
The final result must be:
Dict:
final_df1 Value
2021-01-01 08:30:00 67
2021-01-01 08:45:00 67
final_df2 Value
2021-01-01 10:30:00 65
2021-01-01 10:30:00 61
2021-01-01 10:45:00 68
final_df3 Value
2021-01-01 12:30:00 66
2021-01-01 12:30:00 67
2021-01-01 12:45:00 67
I tried:
df_12 = df_1.iloc[1::2, :]["Timestamp"]
df_11 = df_1.iloc[::2, :]["Timestamp"]
df_12 = pd.DataFrame(df_12)
df_11 = pd.DataFrame(df_11)
for row in df_11.iterrows():
    for row in df_12.iterrows():
        mask_Filtered = (df_2['Timestamp']>= df_11) & (df_2['Timestamp'<=df_12) <-----
        df_2 = df_2.loc[mask_Filtered]
        dict.append(df_2)
MemoryError: Unable to allocate 46.0 GiB for an array with shape (1090629, 5657) and data type float64
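The loop idea from the pseudocode can be sketched as one mask per (start, end) pair, comparing the `Timestamp` column against two scalar timestamps instead of against whole dataframes. This is only a sketch under assumptions: `Timestamp` is taken to be a regular datetime column in both frames, the sample data is retyped from the question, both interval ends are treated as inclusive, and `frames` is a name chosen here for illustration.

```python
import pandas as pd

# Sample data retyped from the question (assumed layout: Timestamp is a column).
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 08:00:00", "2021-01-01 09:00:00",
        "2021-01-01 10:00:00", "2021-01-01 11:00:00",
        "2021-01-01 12:00:00", "2021-01-01 13:00:00",
    ]),
    "Num": [11, 12, 11, 12, 11, 12],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 07:30:00", "2021-01-01 08:30:00", "2021-01-01 08:45:00",
        "2021-01-01 09:15:00", "2021-01-01 10:30:00", "2021-01-01 10:30:00",
        "2021-01-01 10:45:00", "2021-01-01 11:15:00", "2021-01-01 12:30:00",
        "2021-01-01 12:30:00", "2021-01-01 12:45:00",
    ]),
    "Value": [66, 67, 67, 64, 65, 61, 68, 60, 66, 67, 67],
})

# Start rows are marked 11 and end rows 12; they are assumed to alternate.
starts = df1.loc[df1["Num"] == 11, "Timestamp"]
ends = df1.loc[df1["Num"] == 12, "Timestamp"]

frames = {}
for i, (start, end) in enumerate(zip(starts, ends), 1):
    # start and end are scalar Timestamps, so the mask has one entry per df2 row.
    mask = df2["Timestamp"].between(start, end)
    frames[f"final_df{i}"] = df2.loc[mask]
```

Each mask is a boolean Series the same length as `df2`, so memory use grows with the number of rows in `df2` rather than with rows times pairs.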
Is this what you are looking for?
intervals = list(zip(df1[::2]['Timestamp'], df1[1::2]['Timestamp']))
bins = pd.IntervalIndex.from_tuples(intervals)
groups = df2.groupby(pd.cut(df2['Timestamp'], bins=bins))
dfs = {}
for idx, (_, df) in enumerate(groups, 1):
    dfs[f"final_df{idx}"] = df
    # or process individually here to avoid MemoryError
Output:
>>> dfs
{'final_df1':
Timestamp Num
1 2021-01-01 08:30:00 67
2 2021-01-01 08:45:00 67,
'final_df2':
Timestamp Num
4 2021-01-01 10:30:00 65
5 2021-01-01 10:30:00 61
6 2021-01-01 10:45:00 68,
'final_df3':
Timestamp Num
8 2021-01-01 12:30:00 66
9 2021-01-01 12:30:00 67
10 2021-01-01 12:45:00 67}
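For reference, the `pd.cut` approach can be run end to end on the sample data from the question. This is a sketch under assumptions: `Timestamp` is a datetime column in both frames, the second column of `df2` is named `Value` as in the question, and `observed=True` is passed here so that only intervals that actually contain rows are iterated. `closed="right"` is the default of `IntervalIndex.from_tuples` and is spelled out only to make the boundary behaviour visible.

```python
import pandas as pd

# Sample data retyped from the question (assumed layout: Timestamp is a column).
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 08:00:00", "2021-01-01 09:00:00",
        "2021-01-01 10:00:00", "2021-01-01 11:00:00",
        "2021-01-01 12:00:00", "2021-01-01 13:00:00",
    ]),
    "Num": [11, 12, 11, 12, 11, 12],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 07:30:00", "2021-01-01 08:30:00", "2021-01-01 08:45:00",
        "2021-01-01 09:15:00", "2021-01-01 10:30:00", "2021-01-01 10:30:00",
        "2021-01-01 10:45:00", "2021-01-01 11:15:00", "2021-01-01 12:30:00",
        "2021-01-01 12:30:00", "2021-01-01 12:45:00",
    ]),
    "Value": [66, 67, 67, 64, 65, 61, 68, 60, 66, 67, 67],
})

# Pair every even-positioned row (start) with the following odd-positioned row (end).
intervals = list(zip(df1["Timestamp"].iloc[::2], df1["Timestamp"].iloc[1::2]))
bins = pd.IntervalIndex.from_tuples(intervals, closed="right")

# Rows of df2 outside every interval get NaN from pd.cut and are dropped by groupby.
groups = df2.groupby(pd.cut(df2["Timestamp"], bins=bins), observed=True)

dfs = {}
for idx, (_, group) in enumerate(groups, 1):
    dfs[f"final_df{idx}"] = group
```

With right-closed intervals, a row stamped exactly at a start time is left out, while a row stamped exactly at an end time is kept. If both boundaries should count, the intervals would need a different `closed` setting or a mask-based comparison.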