Get x and y coordinates from a specifically formatted text file into an OrderedDict in Python
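I am trying to read a specifically formatted text file, extract the coordinates from it, and store them in an ordered dictionary. A set in the text file consists of a title line followed by lines of x and y coordinates. Each coordinate line always starts with a . followed by \t (a tab). One text file contains several such sets. My idea is to extract the x and y values of each set into a list and append that list to the ordered dictionary. In the end, each of X and Y should hold a list of lists, with the number of inner lists equal to the number of sets in the file.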
An illustration of what the text file looks like:
Freehand green 2 2 0,0 289618 .
. 104326.2,38323.8 104309.6,38307.2 104286.3,38287.3 104269.6,38270.6 104256.3,38254.0
. 104239.7,38237.4 104223.0,38220.7 104209.7,38204.1 104193.1,38194.1 104176.4,38187.5
Freehand green 2 3 0,0 63980 .
. 99803.4,37296.2 99826.7,37306.2 99843.3,37312.8 99860.0,37316.2 99876.6,37322.8
My code:
from collections import OrderedDict
import re

dict_roi = OrderedDict([
    ("title", []),
    ("X", []),
    ("Y", []) ])

with open(elements_file,"r") as f:
    try:
        # pattern to match to get coordinates
        pattern = re.compile(".\t\d+.*")
        # loop through lines and find title line and line with coordinates
        for i, line in enumerate(f):
            # get title line
            if line.startswith('Freehand'):
                dict_roi['title'].append(line)
                # initiate empty list per set
                XX = []
                YY = []
            # line with coordinates starts with .\t
            # if pattern matches and line starts with .\t, get the coordinates
            for match in re.finditer(pattern, line):
                if line.startswith('.\t'):
                    nln = "{}".format(line[2:].strip())
                    val = nln.split('{:6.1f}')
                    # data-massaging to get to the coordinates
                    for v in val:
                        coordinates_list = v.split("\t")
                        for c in coordinates_list:
                            x, y = c.split(',')
                            print(x, y)
                            XX.append(float(x))
                            YY.append(float(y))
                    # this should append one list per set
                    dict_roi['X'].append(XX)
                    dict_roi['Y'].append(YY)
    except ValueError:
        print("Exiting")
print(dict_roi)
Ideally, I want an ordered dictionary that gives me something like this:
('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4],
[99803.4, 99826.7, 99843.3, 99860.0, 99876.6]])
('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5],
[37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])
But my output is this:
('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4],
[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4],
[99803.4, 99826.7, 99843.3, 99860.0, 99876.6]])
('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5],
[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5],
[37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])
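I get multiple copies of the lists from each set. For example, the X and Y lists of the first set are duplicated here. It probably has to do with clearing the lists after appending them, or with where the empty lists XX and YY are created. I have tried several variations, but I always end up with either the output above or one list per line, rather than one list per set in the ordered dictionary.
Does anyone know how to restructure this code so that it produces the output shown in my ideal case?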
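The duplicated lists in the question's output are consistent with the same list object being appended once per coordinate line instead of once per set. The following is a minimal sketch of that effect, not the asker's code: the names rows, current and collected and the values are made up for illustration.

```python
# Hypothetical miniature of the loop: two coordinate lines belong to one set.
rows = [[1.0, 2.0], [3.0, 4.0]]

collected = []
current = []  # created once per set, like XX after a title line
for row in rows:
    current.extend(row)
    # Appending inside the per-line loop stores the same object again.
    collected.append(current)

print(collected)                     # [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
print(collected[0] is collected[1])  # True
```

Both entries show all four values because they are one list referenced twice, which matches the repeated first set in the output above.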
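I simplified it a bit by not using regular expressions.
Instead, the coordinates of each line are stored in a flat list called coords.
Every x value sits at an even index and every y value at an odd index.
Slicing this list therefore gives you the x values and the y values (Xs and Ys in the code).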
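To make the even/odd slicing concrete, here is a small sketch on a single coordinate line taken from the sample. It uses str.split() with no argument (which splits on any run of tabs or spaces) instead of the chained replace calls, so it is a variation on the idea rather than the exact code of this answer; the names tokens, xs and ys are illustrative.

```python
# One coordinate line from the sample data (shortened to three pairs).
line = ". 104326.2,38323.8 104309.6,38307.2 104286.3,38287.3"

# split() with no argument splits on any whitespace; [1:] drops the leading "."
tokens = line.split()[1:]

# Flatten "x,y" pairs into one list: x0, y0, x1, y1, ...
coords = [float(v) for pair in tokens for v in pair.split(",")]

xs = coords[0::2]  # even indices hold the x values
ys = coords[1::2]  # odd indices hold the y values

print(xs)  # [104326.2, 104309.6, 104286.3]
print(ys)  # [38323.8, 38307.2, 38287.3]
```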
from collections import OrderedDict
input_text = '''Freehand green 2 2 0,0 289618 .
. 104326.2,38323.8 104309.6,38307.2 104286.3,38287.3 104269.6,38270.6 104256.3,38254.0
. 104239.7,38237.4 104223.0,38220.7 104209.7,38204.1 104193.1,38194.1 104176.4,38187.5
Freehand green 2 3 0,0 63980 .
. 99803.4,37296.2 99826.7,37306.2 99843.3,37312.8 99860.0,37316.2 99876.6,37322.8'''
dict_roi = OrderedDict([('title', []),
                        ('X', []),
                        ('Y', [])])
lines = input_text.split('\n')
Xs = []
Ys = []
for i, line in enumerate(lines):
    # When a line contains a title
    if line.startswith('Freehand'):
        dict_roi['title'].append(line)
        # Store the previous set (if any) and start fresh lists
        if Xs and Ys:
            dict_roi['X'].append(Xs)
            dict_roi['Y'].append(Ys)
            Xs = []
            Ys = []
    # When a line is empty
    elif not line:
        continue
    # When a line contains coordinates
    else:
        line = line.replace('\n', '')
        line = line.replace('\t', ',')
        line = line.replace(' ', ',')
        coords = line.split(',')
        coords = [e for e in coords if e != '.' and e]
        coords = [float(c) for c in coords]
        # Xs are even, Ys are odd
        Xs += coords[0:: 2]
        Ys += coords[1:: 2]
# Store the last set, which is not followed by another title line
dict_roi['X'].append(Xs)
dict_roi['Y'].append(Ys)
print(dict_roi)
Output:
OrderedDict([('title', ['Freehand green 2 2 0,0 289618 . ', 'Freehand green 2 3 0,0 63980 . ']),
('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4], [99803.4, 99826.7, 99843.3, 99860.0, 99876.6]]),
('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5], [37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])
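If the data comes from a file rather than a string, the same idea can be wrapped in a function. This is only a sketch under a few assumptions: parse_rois is a hypothetical name, the input is any iterable of lines, and a new pair of lists is started at every title line, so no extra append is needed after the loop. Unlike the code above, it strips trailing whitespace from titles and keeps empty lists for a title that has no coordinate lines.

```python
from collections import OrderedDict
import io


def parse_rois(lines):
    """Group x and y coordinates by the title line that precedes them."""
    rois = OrderedDict([("title", []), ("X", []), ("Y", [])])
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("Freehand"):
            # A title line opens a new set with its own fresh lists.
            rois["title"].append(line)
            rois["X"].append([])
            rois["Y"].append([])
        elif line.startswith(".") and rois["X"]:
            # Coordinate line: drop the leading "." and split "x,y" pairs.
            for pair in line.split()[1:]:
                x, y = pair.split(",")
                rois["X"][-1].append(float(x))
                rois["Y"][-1].append(float(y))
    return rois


# io.StringIO stands in for an open file in this example.
sample = io.StringIO(
    "Freehand green 2 2 0,0 289618 .\n"
    ". 104326.2,38323.8 104309.6,38307.2\n"
    ". 104239.7,38237.4\n"
    "Freehand green 2 3 0,0 63980 .\n"
    ". 99803.4,37296.2 99826.7,37306.2\n"
)
result = parse_rois(sample)
print(result["X"])  # [[104326.2, 104309.6, 104239.7], [99803.4, 99826.7]]
print(result["Y"])  # [[38323.8, 38307.2, 38237.4], [37296.2, 37306.2]]
```

With a real file, the open file object from the question (opened from elements_file) can be passed to parse_rois directly, since iterating over a file yields its lines.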