ai-form-recognizer vs. cognitiveservices-computervision

I'm currently using @azure/ai-form-recognizer 3.2.0 to perform OCR on images and PDFs, for example:

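// MsClient is presumably a FormRecognizerClient; stream is the image or PDF to analyze.
const poller = await MsClient.beginRecognizeInvoices(stream, {
    // Optional callback invoked as the long-running operation progresses.
    onProgress: (state) => {}
});
// Wait for completion; the result is an array of recognized forms (here, the first one).
const [ocrResult] = await poller.pollUntilDone();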

What is the difference from, or relationship to, @azure/cognitiveservices-computervision? I'm only interested in OCR.

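There are a few major differences between the two. The primary goal of Form Recognizer is to structure data from forms and other digitized documents for further processing. The key point is that Form Recognizer offers features that help contextualize the information read from those documents, rather than just performing stand-alone optical character recognition. From the Form Recognizer documentation (emphasis mine):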

Azure Form Recognizer is a cloud-based Azure Applied AI Service that uses machine-learning models to extract and analyze form fields, text, and tables from your documents. Form Recognizer analyzes your forms and documents, extracts text and data, maps field relationships as key-value pairs, and returns a structured JSON output. You quickly get accurate results that are tailored to your specific content without excessive manual intervention or extensive data science expertise. Use Form Recognizer to automate your data processing in applications and workflows, enhance data-driven strategies, and enrich document search capabilities.

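Azure Computer Vision, on the other hand, offers three distinct capabilities. While the OCR description below reads similarly to what Form Recognizer does, its use is more general-purpose, because it does not provide the robust key/value contextualization that Form Recognizer does. The service also offers higher-level AI features for processing images and video to identify people/celebrities, landmarks, and common objects within them (among others). From the Computer Vision documentation: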

Service: Description
Optical Character Recognition (OCR): The Optical Character Recognition (OCR) service extracts text from images. You can use the new Read API to extract printed and handwritten text from photos and documents. It uses deep-learning-based models and works with text on a variety of surfaces and backgrounds. These include business documents, invoices, receipts, posters, business cards, letters, and whiteboards. The OCR APIs support extracting printed text in several languages...
Image Analysis: The Image Analysis service extracts many visual features from images, such as objects, faces, adult content, and auto-generated text descriptions. Follow the Image Analysis quickstart to get started.
Spatial Analysis: The Spatial Analysis service analyzes the presence and movement of people on a video feed and produces events that other systems can respond to. Install the Spatial Analysis container to get started.
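
Since the question is only about OCR, the Read API mentioned in the OCR entry above is the relevant piece. The following is a minimal sketch of the submit-then-poll pattern, not a definitive implementation. It assumes a client object exposing `read(url)` and `getReadResult(operationId)`, which is how `ComputerVisionClient` from @azure/cognitiveservices-computervision appears to work; the response shape used here is likewise an assumption to verify against the SDK reference. `readTextFromUrl`, `sleep`, and the polling interval are hypothetical names and values chosen for illustration.

```javascript
// Hypothetical helper: wait for the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Submit an image or PDF URL for reading, poll until the operation finishes,
// and return the recognized lines as an array of strings.
// ASSUMPTION: `client` exposes read(url) and getReadResult(operationId).
async function readTextFromUrl(client, url, pollIntervalMs = 1000) {
  const submitted = await client.read(url);
  // ASSUMPTION: the operation ID is the last path segment of operationLocation.
  const operationId = submitted.operationLocation.split("/").pop();

  let result = await client.getReadResult(operationId);
  while (result.status === "notStarted" || result.status === "running") {
    await sleep(pollIntervalMs);
    result = await client.getReadResult(operationId);
  }
  if (result.status !== "succeeded") {
    throw new Error(`Read operation ended with status: ${result.status}`);
  }

  // Flatten pages -> lines -> text. No field names or key/value pairs are
  // produced here; the output is just the text that was found.
  return result.analyzeResult.readResults.flatMap((page) =>
    page.lines.map((line) => line.text)
  );
}
```

Because the client is passed in as a parameter, a fake object with the same two methods is enough to try the polling logic locally before wiring in a real endpoint and key.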

At first glance there is some overlap between the two, but on closer inspection the primary use cases of each can be clearly delineated.
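
To make the distinction concrete, here is a rough sketch contrasting the two kinds of result. The object literals are hand-written, simplified stand-ins rather than real service output, and `linesToText` and `fieldsToRecord` are hypothetical helper names. The only assumptions about the SDKs are that OCR-style results expose lines with a `text` property and that a recognized form exposes a `fields` map whose entries carry a `value` and a `confidence`.

```javascript
// Simplified, hand-written stand-ins for the two kinds of result
// (ASSUMPTION: illustrative shapes only, not real service output).

// OCR-style result: an ordered list of text lines, with no notion of
// what each line means.
const ocrLines = [
  { text: "Contoso Ltd." },
  { text: "Invoice total: 110.00" },
];

// Form-style result: named fields, each with an extracted value and a
// confidence score.
const recognizedForm = {
  fields: {
    VendorName: { value: "Contoso Ltd.", confidence: 0.97 },
    InvoiceTotal: { value: 110.0, confidence: 0.93 },
  },
};

// Hypothetical helper: join OCR lines into plain text.
function linesToText(lines) {
  return lines.map((line) => line.text).join("\n");
}

// Hypothetical helper: reduce a fields map to { name: value },
// dropping entries below the confidence threshold.
function fieldsToRecord(fields, minConfidence = 0.5) {
  const record = {};
  for (const [name, field] of Object.entries(fields)) {
    if (field.confidence >= minConfidence) {
      record[name] = field.value;
    }
  }
  return record;
}

console.log(linesToText(ocrLines));
console.log(fieldsToRecord(recognizedForm.fields));
```

With plain OCR, finding the invoice total means parsing the text yourself; with the form-style result, it is already available under a named field. If raw text is all you need, the general-purpose OCR path is likely sufficient.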