Reducing execution time for a SUMIFS equivalent

I am trying to reproduce the Excel function SUMIFS, which is roughly: accumulation1 = SUMIFS(value; $fin$1:$fin$5; ini$1)

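What the formula does: for a given row, it searches the fin list for entries equal to that row's ini and sums the corresponding values.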

Example for id 3 and accumulation1: find and sum the values whose end point (fin) equals ini = 11, i.e. id 1 and id 5: (3+5) = 8.
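The worked example above can be reproduced for a single row in pandas. This is only a minimal sketch using the sample rows from the question's table; the DataFrame and the variable names `ini_3` and `acc1_3` are assumptions made for illustration.

```python
import pandas as pd

# Sample rows from the question (id, ini, fin, value)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "ini": [10, 9, 11, 12, 5],
    "fin": [11, 10, 12, 13, 11],
    "value": [5, 0, 2, 1, 3],
})

# ini of the row with id 3 (hypothetical helper variable)
ini_3 = df.loc[df["id"] == 3, "ini"].iloc[0]

# SUMIFS equivalent: sum value over the rows whose fin equals that ini
acc1_3 = df.loc[df["fin"] == ini_3, "value"].sum()
print(acc1_3)  # 8 (ids 1 and 5: 5 + 3)
```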

Then a new accumulation column is created and the same computation starts over on it (I have to do this 1004 times..)

id  ini  fin  value  accumulation1  accumulation2  sumOfAccumulation
1   10   11   5      0              0              5
2   9    10   0      0              0              0
3   11   12   2      8              0              10
4   12   13   1      2              8              11
5   05   11   3      0              0              3

My current accumulation code looks like this:

    import time

    import pandas as pd
    import psycopg2
    import psycopg2.extras

    start_time = time.time()

    connection = psycopg2.connect(dbname=DB_NAME, user=DB_USER, password=DB_PWD, host=DB_HOST, port=DB_PORT)
    cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)

    data = pd.read_sql_query("select id_bdcarth, id_nd_ini::int ini, id_nd_fin::int fin, v from tempturbi.tmp_somme_v19", connection)

    Endtest = 1

    # loop until Endtest == 0:
    #     create a new accumulation column
    for i in data.ini:
        # sum the values of the upstream segments (rows whose fin equals this ini)
        acc = data.v.loc[data.fin == i].sum()
        # save acc in the accumulation column (not done yet)

    # Endtest = sum of the new accumulation column

    print("--- %s seconds ---" % (time.time() - start_time))

Even without saving the results, the script takes 129 seconds to run, which is much slower than Excel. Is there any way to improve the script and make it faster?

What I am trying to do is walk along a river network and compute the values.

So I made a few changes:

z = 0
prev = 'v'  # column that is accumulated in the current pass
loop = [0, 1, 2]
# while total != 0:
for total in loop:
    z = z + 1
    acc = 'acc' + str(z)

    # for each i in ini
    vals = []
    for i in data.ini:
        # sum the previous column over the upstream segments
        vals.append(data.loc[data.fin == i, prev].sum())

    # create the new column and store the values
    data[acc] = vals
    prev = acc

    print(data[acc].sum())

print(data)
print("--- %s seconds ---" % (time.time() - start_time))

(This does not change the execution time.)

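Thanks again for clarifying your question. I think I understand it now, and this approach matches the output you showed. Let me know if I have misunderstood, and let me know whether this is faster than your approach. I don't know whether it will be.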

import pandas as pd

#Create the test data
df = pd.DataFrame({
    'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    'ini': {0: 10, 1: 9, 2: 11, 3: 12, 4: 5},
    'fin': {0: 11, 1: 10, 2: 12, 3: 13, 4: 11},
    'value': {0: 5, 1: 0, 2: 2, 3: 1, 4: 3},
})

#Setup initial values
curr_value_col = 'value'
i = 0
all_value_cols = []

#The groupings stay the same throughout the loops
#so we can just group once and reuse it for speed benefit
gb = df.groupby('fin')

#Loop forever until we break
while True:
    #update the loop number and add to the value col list
    i += 1
    all_value_cols.append(curr_value_col)
    
    #group by fin and sum the value_col values
    fin_cumsum = gb[curr_value_col].sum()
    
    #map the sums to the new column
    next_val_col = 'accumulation{}'.format(i)
    df[next_val_col] = df['ini'].map(fin_cumsum).fillna(0).astype(int)
    
    #If the new column we added sums to 0, then quit
    #(I think this is what you were saying you wanted, but I'm not sure)
    curr_value_col = next_val_col
    if df[curr_value_col].sum() == 0:
        break
        
    
#Get the cumulative sum from the list of columns we've been saving
df['sumOfAccumulation'] = df[all_value_cols].sum(axis=1)
print(df)
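One way to check whether the groupby/map step is actually faster is to time a single accumulation pass both ways on synthetic data. The sketch below is only an assumption-laden illustration: the random data, the size `n`, and the function names `one_pass_loop` and `one_pass_groupby` are hypothetical, and the timings will vary by machine and will not match the 129 seconds measured on the real table.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data: n segments with random node ids
rng = np.random.default_rng(0)
n = 2_000
data = pd.DataFrame({
    "ini": rng.integers(0, n, size=n),
    "fin": rng.integers(0, n, size=n),
    "v": rng.integers(0, 10, size=n),
})


def one_pass_loop(data):
    # One boolean-mask scan of the whole frame per row, as in the question
    return [data.v.loc[data.fin == i].sum() for i in data.ini]


def one_pass_groupby(data):
    # Sum v once per fin, then look each ini up in that table, as in the answer
    sums = data.groupby("fin")["v"].sum()
    return data["ini"].map(sums).fillna(0).astype(int)


t0 = time.perf_counter()
slow = one_pass_loop(data)
t1 = time.perf_counter()
fast = one_pass_groupby(data)
t2 = time.perf_counter()

# Both approaches should produce the same accumulation column
assert list(fast) == [int(x) for x in slow]
print(f"loop: {t1 - t0:.3f} s, groupby/map: {t2 - t1:.3f} s")
```

The loop scans the whole `fin` column once per row, so its cost grows roughly with the square of the number of rows, while the groupby/map version aggregates once and then does a lookup per row.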