Get x and y coordinates from a specifically formatted text file into an ordered dictionary in Python

Question

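I am trying to read a specifically formatted text file, extract the coordinates from it, and store them in an ordered dictionary. Each set in the text file consists of a title line followed by lines of x and y coordinates. A coordinate line always starts with . followed by \t (tab). One text file contains multiple such sets. My idea is to collect the x and y values of each set into lists and append those to an ordered dictionary. In the end it would be a list of lists, with the number of inner lists equal to the number of sets in the file.

Example of what the text file looks like: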

Freehand    green   2   2   0,0 289618  .   
.   104326.2,38323.8    104309.6,38307.2    104286.3,38287.3    104269.6,38270.6    104256.3,38254.0
.   104239.7,38237.4    104223.0,38220.7    104209.7,38204.1    104193.1,38194.1    104176.4,38187.5

Freehand    green   2   3   0,0 63980   .   
.   99803.4,37296.2 99826.7,37306.2 99843.3,37312.8 99860.0,37316.2 99876.6,37322.8

My code:

from collections import OrderedDict
import re

dict_roi = OrderedDict([
                ("title", []),
                ("X", []),
                ("Y", []) ])

with open(elements_file,"r") as f:

    try:
        # pattern to match to get coordinates
        pattern = re.compile(".\t\d+.*")

        # loop through lines and find title line and line with coordinates

        for i, line in enumerate(f):
            # get title line
            if line.startswith('Freehand'):
                dict_roi['title'].append(line) 

                # initiate empty list per set
                XX = []  
                YY = []

            # line with coordinates starts with .\t
            # if pattern matches and line starts with .\t, get the coordinates
            for match in re.finditer(pattern, line):
                if line.startswith('.\t'):
                    nln = "{}".format(line[2:].strip())
                    val = nln.split('{:6.1f}')

                    # data-massaging to get to the coordinates
                    for v in val:
                        coordinates_list = v.split("\t") 
                        for c in coordinates_list:
                            x, y = c.split(',')
                            print(x, y)
                            XX.append(float(x))
                            YY.append(float(y))

                        # this should append one list per set
                        dict_roi['X'].append(XX)
                        dict_roi['Y'].append(YY)


    except ValueError:
        print("Exiting")

    print(dict_roi)

Ideally, I want an ordered dictionary that gives me something like this:

('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4], 
[99803.4, 99826.7, 99843.3, 99860.0, 99876.6]])

('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5], 
[37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])

But my output looks like this:

('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4], 
[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4], 
[99803.4, 99826.7, 99843.3, 99860.0, 99876.6]])

('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5], 
[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5], 
[37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])

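I get multiple copies of the list from each set. For example, the X and Y lists of the first set are duplicated here. It may have to do with clearing the lists after appending, or with where the empty lists XX and YY are created. But I have tried several variations and end up with either the output above or one list per line, rather than one list per set in the ordered dictionary.

Does anyone know how to change this code so that I get the output shown in the ideal case?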

Answer 1

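I simplified it a bit by not using regular expressions.

Instead, for each line, the coordinates are stored in a flat list called coords.
Each x has an even index and each y an odd one, so slicing this list gives you the x and y values (Xs and Ys in the code).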
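As a small illustration of the slicing idea on its own, here is a sketch using the first three coordinate pairs from the sample data, already converted to floats (the names xs and ys are just for this example):

```python
# Flat list in the order the numbers appear on a coordinate line: x0, y0, x1, y1, ...
coords = [104326.2, 38323.8, 104309.6, 38307.2, 104286.3, 38287.3]

xs = coords[0::2]  # start at index 0, step 2 -> even indices (x values)
ys = coords[1::2]  # start at index 1, step 2 -> odd indices (y values)

print(xs)  # [104326.2, 104309.6, 104286.3]
print(ys)  # [38323.8, 38307.2, 38287.3]
```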

from collections import OrderedDict

input_text = '''Freehand    green   2   2   0,0 289618  .   
.   104326.2,38323.8    104309.6,38307.2    104286.3,38287.3    104269.6,38270.6    104256.3,38254.0
.   104239.7,38237.4    104223.0,38220.7    104209.7,38204.1    104193.1,38194.1    104176.4,38187.5

Freehand    green   2   3   0,0 63980   .   
.   99803.4,37296.2 99826.7,37306.2 99843.3,37312.8 99860.0,37316.2 99876.6,37322.8'''


dict_roi = OrderedDict([('title', []),
                        ('X', []),
                        ('Y', [])])

lines = input_text.split('\n')

Xs = []
Ys = []

for i, line in enumerate(lines):

    # When a line contains a title
    if line.startswith('Freehand'):
        dict_roi['title'].append(line)

        if Xs and Ys:
            dict_roi['X'].append(Xs)
            dict_roi['Y'].append(Ys)
            Xs = []
            Ys = []

    # When a line is empty
    elif not line:
        continue

    # When a line contains coordinates
    else:
        line = line.replace('\n', '')
        line = line.replace('\t', ',')
        line = line.replace(' ', ',')
        coords = line.split(',')
        coords = [e for e in coords if e != '.' and e]
        coords = [float(c) for c in coords]

        # Xs are even, Ys are odd
        Xs += coords[0:: 2]
        Ys += coords[1:: 2]

dict_roi['X'].append(Xs)
dict_roi['Y'].append(Ys)

print(dict_roi)

Output:

[('title', ['Freehand    green   2   2   0,0 289618  .   ', 'Freehand    green   2   3   0,0 63980   .   ']),

 ('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4], [99803.4, 99826.7, 99843.3, 99860.0, 99876.6]]), 

('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5], [37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])
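
As for why the code in the question produced duplicates: it appends XX and YY to the dictionary once per coordinate line, so a set that spans two lines has the same list object appended twice. One possible variant, sketched here under the assumption that every set starts with a line beginning with Freehand, is to open a new inner list as soon as a title is seen and then always fill the last one. The function name parse_rois and the shortened sample lines are made up for this example.

```python
from collections import OrderedDict


def parse_rois(lines):
    """Group coordinates by set; `lines` is any iterable of strings."""
    rois = OrderedDict([("title", []), ("X", []), ("Y", [])])
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith("Freehand"):
            # A title opens a new set, so create its (empty) lists right away
            rois["title"].append(line)
            rois["X"].append([])
            rois["Y"].append([])
        elif line.startswith(".") and rois["X"]:
            # split() without arguments handles tabs and runs of spaces alike;
            # [1:] drops the leading "." marker
            for pair in line.split()[1:]:
                x, y = pair.split(",")
                rois["X"][-1].append(float(x))
                rois["Y"][-1].append(float(y))
    return rois


# Shortened, tab-separated sample in the same layout as the question's file
sample = [
    "Freehand\tgreen\t2\t2\t0,0\t289618\t.\t\n",
    ".\t104326.2,38323.8\t104309.6,38307.2\n",
    ".\t104239.7,38237.4\n",
    "\n",
    "Freehand\tgreen\t2\t3\t0,0\t63980\t.\t\n",
    ".\t99803.4,37296.2\t99826.7,37306.2\n",
]
result = parse_rois(sample)
print(result["X"])  # [[104326.2, 104309.6, 104239.7], [99803.4, 99826.7]]
print(result["Y"])  # [[38323.8, 38307.2, 38237.4], [37296.2, 37306.2]]
```

Because a file object is itself an iterable of lines, the same function should also work as parse_rois(f) inside a with open(elements_file) as f: block. Note that this sketch strips surrounding whitespace from the stored titles, unlike the code above.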

