Data extraction by tagging using Java

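I have a requirement to collect text files (unstructured data) based on user input (tags) and search all of those files for the tag word. If the word is found, I need to return the paragraph in which the search term occurs.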
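For the plain "return the paragraph" case, a minimal sketch with only the JDK might look like the following. It assumes paragraphs are separated by one or more blank lines, that the files to search are the `*.txt` files directly inside one directory, and that a tag matches as a whole word, ignoring case. The class and method names (`TagParagraphSearch`, `matchingParagraphs`, `searchDirectory`) are hypothetical, not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TagParagraphSearch {

    /** Splits text into paragraphs, assuming paragraphs are separated by blank lines. */
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        for (String p : text.split("\\R\\s*\\R")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Returns the paragraphs that contain the tag as a whole word, ignoring case. */
    static List<String> matchingParagraphs(String text, String tag) {
        Pattern word = Pattern.compile("\\b" + Pattern.quote(tag) + "\\b", Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String p : paragraphs(text)) {
            if (word.matcher(p).find()) {
                hits.add(p);
            }
        }
        return hits;
    }

    /** Searches every *.txt file directly inside dir (not recursive), prefixing hits with the file name. */
    static List<String> searchDirectory(Path dir, String tag) throws IOException {
        List<Path> txtFiles;
        try (Stream<Path> files = Files.list(dir)) {
            txtFiles = files.filter(p -> p.toString().endsWith(".txt")).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path f : txtFiles) {
            String text = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
            for (String p : matchingParagraphs(text, tag)) {
                hits.add(f.getFileName() + ": " + p);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java TagParagraphSearch <directory> <tag>
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String tag = args.length > 1 ? args[1] : "price";
        for (String hit : searchDirectory(dir, tag)) {
            System.out.println(hit);
            System.out.println();
        }
    }
}
```

This reads each file fully into memory, which is fine for a handful of spec sheets; for a large corpus an index (Lucene/Solr) is the better fit.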

For example, the file spec.txt has the following content:

The ABX earphones with Bluetooth support have been rolled into the Indian market for a price of Rs 5490. They’re available in two color choices of black and red, and come with a rechargeable battery which can be juiced up via the supplied micro-USB cable.

The ABX is said to be capable of rendering up to 10.5 hours of playback once fully charged. It also features an integrated microphone that lets you attend to voice calls. The earphones come with digital noise cancellation technology and a Bluetooth receiver/connector.

In the two paragraphs above, if the user enters the tag "price", it should return "price = Rs 5490", or else it should return the paragraph in which it found the term "price".
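The first option ("price = Rs 5490") is really value extraction rather than search. A crude rule-based sketch is a regular expression per tag; the rule below is an assumption that only fits this example: the word "price" followed, within the same sentence, by an amount written as "Rs" and digits. `PriceExtractor` and `extractPrice` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceExtractor {

    // Hypothetical rule: "price", then any non-period characters (same sentence),
    // then an amount such as "Rs 5490", "Rs. 5490" or "Rs 5,490".
    private static final Pattern PRICE = Pattern.compile(
            "\\bprice\\b[^.]*?\\b(Rs\\.?\\s*\\d+(?:,\\d+)*)", Pattern.CASE_INSENSITIVE);

    /** Returns "price = <amount>" for the first match, or empty if the rule does not fire. */
    static Optional<String> extractPrice(String text) {
        Matcher m = PRICE.matcher(text);
        return m.find() ? Optional.of("price = " + m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "The ABX earphones with Bluetooth support have been rolled into the Indian market "
                + "for a price of Rs 5490. They are available in two color choices of black and red.";
        System.out.println(extractPrice(text).orElse("no price found")); // prints "price = Rs 5490"
    }
}
```

Every new tag needs its own hand-written rule here, so this only scales to a few well-known attributes; for open-ended tags, returning the whole paragraph is the more robust behaviour.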

I have looked at UIMA and Lucene but can't figure out how to do this. Can anyone help?

Thanks in advance.

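Thanks for your replies. Yes, I found a solution: I'm using the Solr highlighter. By adjusting the fragment size of the snippets returned in the Solr response, we can get the paragraph that contains the search term.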
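For reference, the highlighter setup described above comes down to a few request parameters: `hl=true`, `hl.fl` (the field to highlight) and `hl.fragsize` (approximate snippet size in characters, default 100; `0` means no fragmenting, so the whole field value is returned). The sketch below only builds the request URL with the JDK (Java 10+ for `URLEncoder.encode(String, Charset)`). The Solr base URL, the core name `specs` and the field name `content` are assumptions, and it further assumes each paragraph was indexed as its own document, so a whole-field snippet is exactly one paragraph. The tag is not escaped for Solr query syntax here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SolrHighlightQuery {

    /** Builds a /select URL asking the highlighter for snippets of `field` that contain `tag`. */
    static String buildHighlightUrl(String solrBase, String core, String field, String tag, int fragSize) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", field + ":" + tag);
        params.put("hl", "true");
        params.put("hl.fl", field);
        params.put("hl.fragsize", Integer.toString(fragSize)); // 0 = return the whole field value
        params.put("wt", "json");
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return solrBase + "/" + core + "/select?" + query;
    }

    public static void main(String[] args) {
        // "specs" and "content" are hypothetical core and field names.
        System.out.println(buildHighlightUrl("http://localhost:8983/solr", "specs", "content", "price", 0));
    }
}
```

The URL can be fetched with any HTTP client; the snippets come back in the `highlighting` section of the response, keyed by document id and field name.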