I'm trying to merge small dataframes into a larger one while looping through the small dataframes

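I can print the small dataframe and see that it is generated correctly; I wrote it with the code below. However, my final result contains only the outcome of the last merge, instead of carrying each result forward and merging them all.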

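MIK_Quantiles is the first, larger dataframe, and df2_t is the smaller dataframe generated inside the while loop. The dataframes are all generated correctly and the merge works, but I am left with only the result of the last merge. I want the current df2_t to be merged with the already-merged result of the previous iteration (df_merged). I hope that makes sense!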

i = 0
while i < df_length - 1:   


    cur_bound = MIK_Quantiles['bound'].iloc[i]
    cur_percentile = MIK_Quantiles['percentile'].iloc[i]
    cur_bin_low = MIK_Quantiles['auppm'].iloc[i]
    cur_bin_high = MIK_Quantiles['auppm'].iloc[i+1]

    ### Grades/Counts within bin, along with min and max
    df2 = df_orig['auppm'].loc[(df_orig['bound'] == cur_bound) & (df_orig['auppm'] >= cur_bin_low) & (df_orig['auppm'] < cur_bin_high)].describe()

    ### Add fields of interest to the output of describe for later merging together
    df2['bound'] = cur_bound
    df2['percentile'] = cur_percentile
    df2['bin_name'] = 'bin name'
    df2['bin_lower'] = cur_bin_low
    df2['bin_upper'] = cur_bin_high
    df2['temp_merger'] =  str(int(df2['bound'])) + '_' + str(df2['percentile'])

    # Write results of describe to a CSV file and transpose columns to rows
    df2.to_csv('df2.csv')
    df2_t = pd.read_csv('df2.csv').T
    df2_t.columns = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max', 'bound', 'percentile', 'bin_name', 'bin_lower', 'bin_upper', 'temp_merger']

    # Merge the results of the describe on the selected data with the table of quantile values to produce a final output    
    df_merged = MIK_Quantiles.merge(df2_t, how = 'inner', on = ['temp_merger'])
    pd.merge(df_merged, df2_t)
    print(df_merged)


    i = i + 1

Other than incrementing i, your loop does nothing meaningful.

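You merge two (static) dataframes (MIK_Quantiles and df2_t), and you do that df_length times. Every time you do so (in the first, the i-th, and the last iteration of the loop), you overwrite the output variable df_merged, so only the merge from the final iteration survives.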

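To keep whatever was created in earlier loop iterations in your output, you need to concatenate all of the df2_t dataframes you create:

  1. df2 = pd.concat([df2, df2_t]) to 'append' the newly created data df2_t to the output dataframe df2 during each iteration of the loop, so that in the end all of the data is contained in df2

Then, after the loop, merge that one dataframe into MIK_Quantiles: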

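  1. pd.merge(MIK_Quantiles, df2) (not df2_t (!)) to merge the accumulated output

import pandas as pd

df2 = pd.DataFrame([])  # initialize your output
for i in range(df_length - 1):  # same bounds as the while loop, since the body reads iloc[i+1]
    df2_t = ...  # read your .csv files
    df2 = pd.concat([df2, df2_t])  # keep the rows from every iteration
df2 = ...  # do vector operations on df2 (process all of the df2_t at once)
out = pd.merge(MIK_Quantiles, df2)  # one merge, after the loop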
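
For a version you can actually run, here is a minimal sketch of the same accumulate-then-merge idea. The sample values in MIK_Quantiles and df_orig are made up for illustration and are not your data. It also makes two substitutions that are assumptions on my part: it collects the one-row dataframes in a list and concatenates once after the loop (instead of calling pd.concat in every iteration), and it merges on bound and percentile directly instead of building a temp_merger string. The CSV round trip is replaced by to_frame().T, which transposes the describe() output in memory.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data (values are made up).
MIK_Quantiles = pd.DataFrame({
    "bound": [1, 1, 1],
    "percentile": [0.0, 0.5, 1.0],
    "auppm": [0.0, 2.0, 4.0],
})
df_orig = pd.DataFrame({
    "bound": [1, 1, 1, 1],
    "auppm": [0.5, 1.5, 2.5, 3.5],
})

rows = []  # one single-row dataframe per bin
for i in range(len(MIK_Quantiles) - 1):
    cur_bound = MIK_Quantiles["bound"].iloc[i]
    cur_percentile = MIK_Quantiles["percentile"].iloc[i]
    cur_bin_low = MIK_Quantiles["auppm"].iloc[i]
    cur_bin_high = MIK_Quantiles["auppm"].iloc[i + 1]

    # Select the grades that fall inside the current bin.
    in_bin = (
        (df_orig["bound"] == cur_bound)
        & (df_orig["auppm"] >= cur_bin_low)
        & (df_orig["auppm"] < cur_bin_high)
    )
    stats = df_orig.loc[in_bin, "auppm"].describe()

    # Turn the describe() Series into a one-row dataframe in memory.
    df2_t = stats.to_frame().T
    df2_t["bound"] = cur_bound
    df2_t["percentile"] = cur_percentile
    df2_t["bin_lower"] = cur_bin_low
    df2_t["bin_upper"] = cur_bin_high
    rows.append(df2_t)

# Concatenate once after the loop, then merge once.
df2 = pd.concat(rows, ignore_index=True)
out = MIK_Quantiles.merge(df2, how="inner", on=["bound", "percentile"])
print(out)
```

With this sample data, out has one row per bin, because the inner merge drops the last quantile row, which has no bin of its own. If you need the temp_merger key for other reasons, you could build it as a column on both dataframes and pass it to on= instead.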