pyrouge: tuple index out of range
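I am trying to use pyrouge to compute the similarity between an automated summary and the gold standards. ROUGE works fine while it processes both sets of summaries, but when it writes out the results it complains "tuple index out of range". Does anyone know what causes this and how I can fix it?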
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Set ROUGE home directory to D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Writing summaries.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing summaries. Saving system files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system and model files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\automated.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing automated.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\manual.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing BookRags.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSaver.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSummary.txt.
2017-09-13 23:54:57,557 [MainThread ] [INFO ] Processing Wikipedia.txt.
2017-09-13 23:54:57,562 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
Traceback (most recent call last):
File "<ipython-input-8-bc227b272111>", line 1, in <module>
runfile('D:/ComputerScience/Research/automate_summary.py', wdir='D:/ComputerScience/Research')
File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 707, in runfile
execfile(filename, namespace)
File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/ComputerScience/Research/automate_summary.py", line 53, in <module>
output = r.convert_and_evaluate()
File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 361, in convert_and_evaluate
rouge_output = self.evaluate(system_id, rouge_args)
File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 331, in evaluate
self.write_config(system_id=system_id)
File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 315, in write_config
self._config_file, system_id)
File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 264, in write_config_static
system_filename_pattern = re.compile(system_filename_pattern)
File "C:\Users\zhuan\Anaconda3\lib\re.py", line 233, in compile
return _compile(pattern, flags)
File "C:\Users\zhuan\Anaconda3\lib\re.py", line 301, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Users\zhuan\Anaconda3\lib\sre_compile.py", line 562, in compile
p = sre_parse.parse(p, flags)
File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 855, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub
not nested and not items))
File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 616, in _parse
source.tell() - here + len(this))
error: nothing to repeat
The gold standards are BookRags.txt, GradeSaver.txt, GradeSummary.txt, and Wikipedia.txt.
The summary to be compared is automated.txt.
Shouldn't *.txt or [a-z0-9A-Z]+ work? The former gives me a "nothing to repeat" error, and the latter a "tuple index out of range" error.
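from pyrouge import Rouge155

# Raw strings keep the backslashes in Windows paths from being read as escapes (e.g. \a).
r = Rouge155(r"D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5")
r.system_dir = r'D:\ComputerScience\Research\summary\Grendel\automated'
r.model_dir = r'D:\ComputerScience\Research\summary\Grendel\manual'
# These are the patterns that trigger "tuple index out of range".
r.system_filename_pattern = '[a-z0-9A-Z]+.txt'
r.model_filename_pattern = '[a-z0-9A-Z]+.txt'
output = r.convert_and_evaluate()
print(output)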
I am setting both directories manually. It looks like the ROUGE package can process the .txt files in them.
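The problem is that the pyrouge library never considers the case in which your regular expression yields no group to extract. The line id = match.groups(0)[0] in the pyrouge source is the culprit. If you look it up in the documentation, groups() is described as: "Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern..." Your pattern matches the filename but contains no capturing group, so an empty tuple is returned, and the code then tries to take the first item of that empty tuple, which raises the error. The "nothing to repeat" error from *.txt is a separate issue: *.txt is a shell glob, not a regular expression, and a leading * has nothing before it to repeat.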
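Both errors from the question can be reproduced with the standard re module alone, without pyrouge. The sketch below is only an illustration: the helper name extract_id is hypothetical and mirrors the match.groups(0)[0] line quoted above rather than reproducing pyrouge's code.

```python
import re

def extract_id(pattern, filename):
    """Mimic the quoted line: take the first captured group as the ID."""
    match = re.match(pattern, filename)
    if match is None:
        return None
    return match.groups(0)[0]

# '*.txt' is a shell glob, not a regex: '*' has nothing before it to repeat.
try:
    re.compile('*.txt')
except re.error as exc:
    glob_error = str(exc)

# This pattern matches 'automated.txt' but has no capturing group,
# so groups() is an empty tuple and indexing it fails.
try:
    extract_id(r'[a-z0-9A-Z]+.txt', 'automated.txt')
except IndexError as exc:
    no_group_error = str(exc)

# Adding a capturing group gives the code something to index.
file_id = extract_id(r'SystemSummary.(\d+).txt', 'SystemSummary.1.txt')

print(glob_error)
print(no_group_error)
print(file_id)
```

The second case is the one behind "tuple index out of range": the match succeeds, but there is no group to read the ID from.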
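I had the same problem with the pyrouge package. It occurs because the source code tries to match the filenames we supply against a specific pattern and ends up with an empty tuple when that fails. For more detail, have a look at the Rouge155.py file, in particular the function __get_model_filenames_for_id().
I solved it by following the exact filename conventions described on the official page, as follows: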
r.system_filename_pattern = r'some_name.(\d+).txt'
r.model_filename_pattern = 'some_name.[A-Z].#ID#.txt'
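So, my suggestions are:
- Create two separate directories for the system_summaries (system-generated) and the model_summaries (human-written, the gold standard)
- Provide the exact file paths to these directories
- If you are comparing one system_summary (e.g. SystemSummary.1.txt) against a set of model_summaries (e.g. ModelSummary.A.1.txt, ModelSummary.B.1.txt, ModelSummary.C.1.txt), provide the following patterns: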
r.system_filename_pattern = r'SystemSummary.(\d+).txt'
r.model_filename_pattern = 'ModelSummary.[A-Z].#ID#.txt'
You can extend this setup according to the number of summaries you want to evaluate.
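As a rough illustration of how the two patterns work together: the group captured by the system pattern becomes the ID, and that ID replaces #ID# in the model pattern before the model filenames are matched. The function pair_summaries below is a hypothetical stand-in written for this answer, not pyrouge's own implementation, and the filenames are the examples from the list above.

```python
import re

def pair_summaries(system_files, model_files, system_pattern, model_pattern):
    """Map each system summary to the model summaries sharing its ID."""
    system_re = re.compile(system_pattern)
    pairs = {}
    for system_file in sorted(system_files):
        match = system_re.match(system_file)
        if match is None or not match.groups():
            # No match, or no capturing group: there is no ID to extract.
            continue
        file_id = match.group(1)
        # Substitute the captured ID for the #ID# placeholder.
        model_re = re.compile(model_pattern.replace('#ID#', re.escape(file_id)))
        pairs[system_file] = sorted(f for f in model_files if model_re.match(f))
    return pairs

system_files = ['SystemSummary.1.txt']
model_files = ['ModelSummary.A.1.txt', 'ModelSummary.B.1.txt', 'ModelSummary.C.1.txt']

pairs = pair_summaries(
    system_files,
    model_files,
    r'SystemSummary.(\d+).txt',
    r'ModelSummary.[A-Z].#ID#.txt',
)
print(pairs)
```

With the filenames from the question (automated.txt, BookRags.txt, and so on) and a pattern that has no group, this sketch returns an empty mapping, which suggests the files need to be renamed to a numbered scheme like the one above.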
Hope this helps! Good luck!