Is there a way to use OCR to extract specific data from a CAD technical drawing?

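I am trying to use OCR to extract only the base dimensions of a CAD model, but the drawing also contains associated dimensions that I don't need (angles, lengths from the baseline to a hole, and so on). Here is an example of a technical drawing. (The numbers in red circles are the base dimensions; the ones highlighted in purple are the ones to ignore.) How can I tell my program to extract only the base dimensions (the height, length, and width of the block before it goes through the CNC)?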

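The problem is that the drawings I receive don't follow a specific format, so I can't tell the OCR where the dimensions are. It has to work that out on its own from context.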

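Should I train the program with machine learning, running several iterations and correcting it each time? If so, what methods are there? The only thing I can think of is OpenCV cascade classifiers. Or is there another way to solve this problem? Sorry for the long post. Thanks.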

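I feel you... This is a very tough problem, and we have spent the past 3 years working on a solution. Forgive me for mentioning my own solution, but it will definitely solve your problem: pip install werk24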


from werk24 import Hook, W24AskVariantMeasures
from werk24.models.techread import W24TechreadMessage
from werk24.utils import w24_read_sync


def get_drawing_bytes() -> bytes:
    # define your own; "drawing.png" is only a placeholder path
    with open("drawing.png", "rb") as drawing_file:
        return drawing_file.read()


def recv_measures(message: W24TechreadMessage) -> None:
    # called once the measures are available; print each one
    for cur_measure in message.payload_dict.get('measures', []):
        print(cur_measure)


if __name__ == "__main__":
    # define what information you want to receive from the API
    # and what shall be done when the info is available.
    hooks = [Hook(ask=W24AskVariantMeasures(), function=recv_measures)]

    # submit the request to the Werk24 API
    w24_read_sync(get_drawing_bytes(), hooks)

For your example, it would return measures such as the following:

    {
        "position": <STRIPPED>,
        "label": {
            "blurb": "ø30 H7 +0.0210/0",
            "quantity": 1,
            "size": {
                "blurb": "30",
                "size_type": "DIAMETER",
                "nominal_size": "30.0"
            },
            "unit": "MILLIMETER",
            "size_tolerance": {
                "toleration_type": "FIT_SIZE_ISO",
                "blurb": "H7",
                "deviation_lower": "0.0",
                "deviation_upper": "0.0210",
                "fundamental_deviation": "H",
                "tolerance_grade": {
                    "grade": 7,
                    "warnings": []
                }
            },
            "thread": null,
            "chamfer": null,
            "depth": null,
            "test_dimension": null
        },
        "warnings": [],
        "confidence": 0.98810
    }
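
To get from this output to the base dimensions the question asks about, the returned measures still need to be filtered. Below is a minimal sketch of one possible heuristic, assuming each measure is a plain dict of the shape shown above. The function names, the confidence threshold, and the "keep the three largest non-diameter sizes" rule are illustrative assumptions, not part of the Werk24 API, and the result is only a list of candidates: a drawing may not dimension the stock size at all.

```python
from decimal import Decimal
from typing import Any, Dict, List, Optional


def nominal_size(measure: Dict[str, Any]) -> Optional[Decimal]:
    """Return the nominal size of a measure dict, or None if it is missing."""
    size = (measure.get("label") or {}).get("size") or {}
    raw = size.get("nominal_size")
    return Decimal(str(raw)) if raw is not None else None


def candidate_base_dimensions(
    measures: List[Dict[str, Any]],
    min_confidence: float = 0.9,  # hypothetical threshold, tune for your data
    count: int = 3,               # height, length, and width
) -> List[Decimal]:
    """Heuristic: drop diameters, threads, chamfers, and low-confidence
    readings, then keep the `count` largest remaining nominal sizes."""
    candidates = []
    for measure in measures:
        label = measure.get("label") or {}
        size = label.get("size") or {}
        if size.get("size_type") == "DIAMETER":
            continue  # hole or shaft diameters are not block dimensions
        if label.get("thread") is not None or label.get("chamfer") is not None:
            continue  # thread and chamfer callouts are feature dimensions
        if measure.get("confidence", 0.0) < min_confidence:
            continue  # skip uncertain readings
        value = nominal_size(measure)
        if value is not None:
            candidates.append(value)
    return sorted(candidates, reverse=True)[:count]
```

Collecting the dicts received in `recv_measures` into a list and passing that list to `candidate_base_dimensions` would give the candidates. Angles and other units are not handled here; a check on the `unit` field could be added in the same way.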

For GD&T frames, it would return something like:

{
    "position": <STRIPPED>,
    "frame": {
        "blurb": "[⟂|0.05|A]",
        "characteristic": "⟂",
        "zone_shape": null,
        "zone_value": {
            "blurb": "0.05",
            "width_min": 0.05,
            "width_max": null,
            "extend_quantity": null,
            "extend_shape": null,
            "extend": null,
            "extend_angle": null
        },
        "zone_combinations": [],
        "zone_offset": null,
        "zone_constraint": null,
        "feature_filter": null,
        "feature_associated": null,
        "feature_derived": null,
        "reference_association": null,
        "reference_parameter": null,
        "material_condition": null,
        "state": null,
        "data": [
            {
                "blurb": "A"
            }
        ]
    }
}
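
Since GD&T frames are not dimensions, it can help to keep them apart from the measures and reduce each one to the few fields of interest. The following is a minimal sketch, assuming the frame arrives as a plain dict of the shape shown above; `summarize_gdt` is a hypothetical helper, not a Werk24 function.

```python
from typing import Any, Dict


def summarize_gdt(gdt: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a GD&T dict to its characteristic, zone width, and datum letters."""
    frame = gdt.get("frame") or {}
    zone_value = frame.get("zone_value") or {}
    # "data" holds the referenced datums, each with its own "blurb"
    datums = [datum.get("blurb") for datum in frame.get("data") or []]
    return {
        "characteristic": frame.get("characteristic"),
        "tolerance": zone_value.get("width_min"),
        "datums": datums,
    }
```

Missing or null fields simply come back as `None` or an empty list, so the helper can be applied to every frame without extra checks.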

See the Werk24 documentation for details.