Calculating BLEU and ROUGE scores as fast as possible

I have around 200 candidate sentences, and for each candidate I want to measure the BLEU score by comparing it against thousands of reference sentences. The references are the same for every candidate. Here is how I am doing it right now:

from nltk.translate.bleu_score import corpus_bleu

# the same reference list is repeated once per candidate
ref_for_all = [reference] * len(sents)
score = corpus_bleu(ref_for_all, [i.split() for i in sents], weights=(0, 1, 0, 0))

Here reference contains the entire corpus I want to compare each sentence against, and sents holds my candidate sentences. Unfortunately this takes far too long, and given the experimental nature of my code I cannot wait that long for results. Is there another way (for example using regular expressions) to get these scores faster? I have the same problem with ROUGE, so any suggestions are greatly appreciated!

After searching, experimenting with different packages, and measuring how long each one took to compute the scores, I found NLTK's corpus_bleu and PyRouge to be the most efficient. Keep in mind that each record has multiple hypotheses, which is why I compute a mean per record. This is what I did for BLEU:

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

cc = SmoothingFunction()

# `ref` is the list of raw reference sentences; tokenize it once, up front
reference = [[i.split() for i in ref]]

def find_my_bleu(text, w):
    # score a single hypothesis against the shared, pre-tokenized references
    candidates_ = [text.split()]
    return corpus_bleu(reference, candidates_, weights=w,
                       smoothing_function=cc.method4)
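
For a quick sanity check, the helper can be called directly; the hypothesis string below is made up:

score_1 = find_my_bleu('a generated verse to score', (1, 0, 0, 0))   # unigram BLEU
print(score_1)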

import numpy as np

def get_final_bleu(output_df):
    print('Started calculating the bleu scores...')
    # one BLEU score per hypothesis: unigram, bigram and trigram weights
    output_df.loc[:, 'bleu_1'] = output_df.loc[:, 'final_predicted_verses'].apply(
        lambda x: [find_my_bleu(t, (1, 0, 0, 0)) for t in x])
    output_df.loc[:, 'bleu_2'] = output_df.loc[:, 'final_predicted_verses'].apply(
        lambda x: [find_my_bleu(t, (0, 1, 0, 0)) for t in x])
    output_df.loc[:, 'bleu_3'] = output_df.loc[:, 'final_predicted_verses'].apply(
        lambda x: [find_my_bleu(t, (0, 0, 1, 0)) for t in x])

    print('Now the average score...')
    # average over the hypotheses of each record
    output_df.loc[:, 'bleu_1_mean'] = output_df.loc[:, 'bleu_1'].apply(np.mean)
    output_df.loc[:, 'bleu_2_mean'] = output_df.loc[:, 'bleu_2'].apply(np.mean)
    output_df.loc[:, 'bleu_3_mean'] = output_df.loc[:, 'bleu_3'].apply(np.mean)

    print('mean bleu_1 score: ', np.mean(output_df.loc[:, 'bleu_1_mean']))
    print('mean bleu_2 score: ', np.mean(output_df.loc[:, 'bleu_2_mean']))
    print('mean bleu_3 score: ', np.mean(output_df.loc[:, 'bleu_3_mean']))
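
A minimal way to drive this, assuming a toy DataFrame whose final_predicted_verses column holds the list of generated strings for each record (the column name matches the snippet above; the data is made up):

import pandas as pd

ref = ['the first reference sentence', 'another reference sentence']
reference = [[i.split() for i in ref]]

df = pd.DataFrame({'final_predicted_verses': [
    ['the first generated verse', 'a second attempt'],
    ['another generated verse'],
]})
get_final_bleu(df)  # adds the bleu_* and bleu_*_mean columns to df in place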

And for ROUGE:

from rouge_metric import PyRouge

rouge = PyRouge(rouge_n=(1, 2), rouge_l=True, rouge_w=False, rouge_s=False, rouge_su=False)

def find_my_rouge(text):
    # one hypothesis made of a single tokenized sentence
    hypotheses = [[text.split()]]
    # `reference_rouge` holds the tokenized reference sentences, shared by all hypotheses
    score = rouge.evaluate_tokenized(hypotheses, [[reference_rouge]])
    return score
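
This helper returns one score dict per hypothesis, but the rouge column consumed further down is never built in the original snippet; presumably it is created per record the same way as the BLEU columns. A sketch, assuming the same final_predicted_verses column as above:

def get_final_rouge(output_df):
    # one ROUGE score dict per hypothesis in each record
    output_df.loc[:, 'rouge'] = output_df.loc[:, 'final_predicted_verses'].apply(
        lambda x: [find_my_rouge(t) for t in x])
    return output_df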

Then take the mean of all of them:

def get_short_rouge(list_dicts):
    """Get the mean of all generated texts for each record."""
    totals = {metric: {'r': 0.0, 'p': 0.0, 'f': 0.0}
              for metric in ('rouge-1', 'rouge-2', 'rouge-l')}

    # sum each recall/precision/f-score over all hypotheses of the record
    for d in list_dicts:
        for metric, sums in totals.items():
            for key in sums:
                sums[key] += d[metric][key]

    length = len(list_dicts)
    return {metric: {key: value / length for key, value in sums.items()}
            for metric, sums in totals.items()}
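
For example, with two hypothetical per-hypothesis score dicts (all values made up):

scores = [
    {'rouge-1': {'r': 0.50, 'p': 0.40, 'f': 0.44},
     'rouge-2': {'r': 0.20, 'p': 0.15, 'f': 0.17},
     'rouge-l': {'r': 0.45, 'p': 0.35, 'f': 0.39}},
    {'rouge-1': {'r': 0.60, 'p': 0.50, 'f': 0.55},
     'rouge-2': {'r': 0.30, 'p': 0.25, 'f': 0.27},
     'rouge-l': {'r': 0.55, 'p': 0.45, 'f': 0.49}},
]
print(get_short_rouge(scores)['rouge-1'])   # roughly {'r': 0.55, 'p': 0.45, 'f': 0.495}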

def get_overal_rouge_mean(output_df):
    print('Started getting the overall rouge of each record...')
    output_df.loc[:, 'rouge_mean'] = output_df.loc[:, 'rouge'].apply(get_short_rouge)

    print('Started getting the overall rouge of all records...')
    # the per-record means can be averaged with the same helper
    print('overall rouge scores: ')
    print(get_short_rouge(list(output_df.loc[:, 'rouge_mean'])))
    return output_df
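
Putting it all together, using the hypothetical get_final_rouge helper sketched above:

get_final_bleu(output_df)
output_df = get_final_rouge(output_df)
output_df = get_overal_rouge_mean(output_df)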

Hope this helps anyone who runs into the same problem.