在自定义数据上训练 EAST 文本检测器

Question

如何在我的自定义数据上训练 EAST text detector。没有任何在线博客显示执行相同操作的分步过程。我目前拥有的。

我有一个文件夹，其中包含所有图像和每个图像对应的 xml 文件，这些文件告诉我们文本所在的位置。

示例：

<annotation>
    <folder>Dataset</folder>
    <filename>FFDDAPMDD1.png</filename>
    <path>C:\Users\HPO2KOR\Desktop\Work\venv\Patent\Dataset\Dataset\FFDDAPMDD1.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>839</width>
        <height>1000</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>522</xmin>
            <ymin>29</ymin>
            <xmax>536</xmax>
            <ymax>52</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>510</xmin>
            <ymin>258</ymin>
            <xmax>521</xmax>
            <ymax>281</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>546</xmin>
            <ymin>528</ymin>
            <xmax>581</xmax>
            <ymax>555</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>523</xmin>
            <ymin>646</ymin>
            <xmax>555</xmax>
            <ymax>674</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>410</xmin>
            <ymin>748</ymin>
            <xmax>447</xmax>
            <ymax>776</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>536</xmin>
            <ymin>826</ymin>
            <xmax>567</xmax>
            <ymax>851</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>792</xmin>
            <ymin>918</ymin>
            <xmax>838</xmax>
            <ymax>945</ymax>
        </bndbox>
    </object>
</annotation>

我还为我的每一张图像提供了用于训练 YOLO 模型的格式的已解析 xml 文件。

示例

C:\Users\HPO2KOR\...\text\FFDDAPMDD1.png 522,29,536,52,0 510,258,521,281,0 546,528,581,555,0 523,646,555,674,0 410,748,447,776,0 536,826,567,851,0 792,918,838,945,0 660,918,706,943,0 63,1,108,24,0 65,51,110,77,0 65,101,109,126,0 63,151,110,175,0 63,202,109,228,0 63,252,110,276,0 63,303,110,330,0 62,353,110,381,0 65,405,109,434,0 90,457,110,482,0 59,505,101,534,0 64,565,107,590,0 61,616,107,644,0 62,670,103,694,0 62,725,104,753,0 63,778,104,804,0 62,831,100,857,0 87,887,106,912,0 98,919,144,943,0 240,916,284,943,0 378,915,420,943,0 520,918,565,942,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD2.png 91,145,109,171,0 68,192,106,218,0 92,239,111,265,0 69,286,108,311,0 92,333,107,357,0 66,379,110,405,0 90,424,111,451,0 69,472,107,497,0 91,518,109,545,0 66,564,109,590,0 90,613,110,637,0 121,644,140,670,0 279,643,322,671,0 446,645,490,668,0 615,642,661,669,0 786,643,831,667,0 954,643,997,672,0 820,22,866,50,0 823,73,866,103,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD3.png 648,1,698,30,0 68,64,129,91,0 55,144,128,168,0 70,218,129,247,0 56,300,127,326,0 71,377,125,404,0 58,459,127,482,0 109,535,130,560,0 140,568,160,594,0 344,568,382,594,0 563,566,581,591,0 760,568,800,593,0 982,569,1000,591,0

为我的自定义数据集训练此 EAST 文本检测器的过程是什么。我在 windows.

Answer 1

根据自述文件中的文档，自定义训练 EAST 的 keras 实现需要一个图像文件夹，每个图像都有一个名为 gt_IMAGENAME.txt 的附带文本文件。（将 IMAGENAME 替换为它映射到的图像的名称。）

在每个文本文件中，"the ground truth is given as separate text files (one per image) where each line specifies the coordinates of one word's bounding box and its transcription in a comma separated format." 此引文来自 https://rrc.cvc.uab.es/?ch=4&com=tasks, which is linked in the read me to the tensorflow implementation of EAST at https://github.com/argman/EAST。边界框表示为四个角的坐标。

您似乎拥有以正确格式构建训练数据所需的所有信息。可能有一个工具可以转换所有内容，但是一个快速的 python 脚本也可以正常工作。像...

循环所有 xml 个文件
为每个 xml 文件，创建一个根据文档要求命名的文本文件
使用BeautifulSoup解析xml
使用find_all获取所有object个标签
使用xmin、xmax、ymin和ymax值来表示所有角的x,y坐标。（左上角是 xmin、ymax；右上角是 xmax、ymax；等等）基于 https://github.com/argman/EAST/blob/master/training_samples/img_1.txt 的顺序似乎是 lower-left, lower-right, upper-right, upper-left
对于每个对象标记，按照以下格式在文本文件中写入一个新行： x1, y1, x2, y2, x3, y3, x4, y4, transcription 或 x1, y1, x2, y2, x3, y3, x4, y4, ###（后跟 \n 换行符）
运行 python train.py 使用所有命令行参数设置 "execution example" 的设置方式，但将 --training_data_path= 之后的值更改为您的路径

在自定义数据上训练 EAST 文本检测器

Train EAST text detector on custom data

python

machine-learning

image-processing

deep-learning

east-text-detector