Merging a small dataframe into a larger one while looping through the small dataframes
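I can print the small dataframe and see that it is generated correctly; I wrote it using the code below. However, my final result contains only the result of the last merge, rather than carrying each result forward and merging them all.
MIK_Quantiles is the first, larger dataframe, and df2_t is the smaller dataframe generated inside the while loop. The dataframes are all generated correctly and the merge works, but I am left with only the result of the last merge. I want it to merge the current df2_t with the already-merged result from the previous loop iteration (df_merged). I hope this makes sense!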
i = 0
while i < df_length - 1:
    cur_bound = MIK_Quantiles['bound'].iloc[i]
    cur_percentile = MIK_Quantiles['percentile'].iloc[i]
    cur_bin_low = MIK_Quantiles['auppm'].iloc[i]
    cur_bin_high = MIK_Quantiles['auppm'].iloc[i+1]
    ### Grades/Counts within bin, along with min and max
    df2 = df_orig['auppm'].loc[(df_orig['bound'] == cur_bound) & (df_orig['auppm'] >= cur_bin_low) & (df_orig['auppm'] < cur_bin_high)].describe()
    ### Add fields of interest to the output of describe for later merging together
    df2['bound'] = cur_bound
    df2['percentile'] = cur_percentile
    df2['bin_name'] = 'bin name'
    df2['bin_lower'] = cur_bin_low
    df2['bin_upper'] = cur_bin_high
    df2['temp_merger'] = str(int(df2['bound'])) + '_' + str(df2['percentile'])
    # Write results of describe to a CSV file and transpose columns to rows
    df2.to_csv('df2.csv')
    df2_t = pd.read_csv('df2.csv').T
    df2_t.columns = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max', 'bound', 'percentile', 'bin_name', 'bin_lower', 'bin_upper', 'temp_merger']
    # Merge the results of the describe on the selected data with the table of quantile values to produce a final output
    df_merged = MIK_Quantiles.merge(df2_t, how = 'inner', on = ['temp_merger'])
    pd.merge(df_merged, df2_t)
    print(df_merged)
    i = i + 1
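Apart from incrementing i, your loop does nothing meaningful.
You merge 2 (static) dfs (MIK_Quantiles and df2_t), and you do that df_length times. Every time you do this (on the first, the i-th, and the last iteration of the loop), you overwrite the output variable df_merged.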
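To keep in the output whatever was created in previous loop iterations, you need to concatenate all of the df2_t frames you create:
df2 = pd.concat([df2, df2_t])
This 'appends' the newly created df2_t to the output dataframe df2 on every iteration of the loop, so that at the end all of the data is contained in df2.
Then, after the loop, merge that one into MIK_Quantiles:
pd.merge(MIK_Quantiles, df2)
(not df2_t (!)) to combine it with the earlier output.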
df2 = pd.DataFrame([])  # initialize your output
for i in range(0, df_length):
    df2_t = ...  # read your .csv files
    df2 = pd.concat([df2, df2_t])
df2 = ...  # do vector operations on df2 (process all of the df2_t at once)
out = pd.merge(MIK_Quantiles, df2)
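For reference, here is a minimal runnable sketch of the same idea. The sample values in df_orig and MIK_Quantiles, the make_key helper, and the names rows and all_bins are assumptions made up for illustration, not part of the question. It also swaps the CSV round-trip for describe().to_frame().T and collects the one-row frames in a list so that pd.concat is called only once, which is one possible variation rather than the only way to do it.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataframes.
df_orig = pd.DataFrame({
    "bound": [1, 1, 1, 1, 1],
    "auppm": [0.1, 0.2, 0.4, 0.6, 0.9],
})
MIK_Quantiles = pd.DataFrame({
    "bound": [1, 1, 1],
    "percentile": [0, 50, 100],
    "auppm": [0.0, 0.5, 1.0],
})


def make_key(bound, percentile):
    # Hypothetical helper: builds the merge key the same way on both sides.
    return f"{int(bound)}_{percentile}"


MIK_Quantiles["temp_merger"] = [
    make_key(b, p)
    for b, p in zip(MIK_Quantiles["bound"], MIK_Quantiles["percentile"])
]

rows = []  # one single-row DataFrame per bin
for i in range(len(MIK_Quantiles) - 1):
    cur_bound = MIK_Quantiles["bound"].iloc[i]
    cur_percentile = MIK_Quantiles["percentile"].iloc[i]
    low = MIK_Quantiles["auppm"].iloc[i]
    high = MIK_Quantiles["auppm"].iloc[i + 1]

    # Select the values that fall inside the current bin [low, high).
    in_bin = (
        (df_orig["bound"] == cur_bound)
        & (df_orig["auppm"] >= low)
        & (df_orig["auppm"] < high)
    )
    stats = df_orig.loc[in_bin, "auppm"].describe()

    # Turn the describe() Series into a one-row DataFrame (no CSV needed).
    row = stats.to_frame().T
    row["bin_lower"] = low
    row["bin_upper"] = high
    row["temp_merger"] = make_key(cur_bound, cur_percentile)
    rows.append(row)

# Concatenate once after the loop, then merge once.
all_bins = pd.concat(rows, ignore_index=True)
out = MIK_Quantiles.merge(all_bins, how="inner", on="temp_merger")
print(out)
```

With this sample data, out has one row per bin instead of only the last one, because nothing is overwritten inside the loop and the merge happens a single time at the end.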