Tokenising integers in a string
I have a text file containing coordinates in the following format:
[-1.38795678, 54.90352965]
[-3.2115, 55.95530556]
[0.00315428, 51.50285246]
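I want to iterate over each coordinate and check which polygon it falls in (UK counties from a shapefile), but I am not sure how to tokenise the numbers so that I can write code along the lines of...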
for point in coordinates:
    for poly in polygons:
        if point in poly:
            print(poly)
            break
        if point not in poly:
            continue
At the moment they are strings, but I want each line to be made up of two points so that the program can try to locate them in the polygons.
You can use literal_eval to convert the string into a tuple of lists:
>>> from ast import literal_eval
>>> s = "[-1.38795678, 54.90352965], [-3.2115, 55.95530556], [0.00315428, 51.50285246]"
>>> seq = literal_eval(s)
>>> print(seq[0][1])
54.90352965
Edit: if the coordinates are on separate lines with no commas between them,
from ast import literal_eval
s = """[-1.38795678, 54.90352965]
[-3.2115, 55.95530556]
[0.00315428, 51.50285246]"""
seq = [literal_eval(line) for line in s.split("\n")]
# or
seq = literal_eval(s.replace("\n", ","))
print(seq[0][1])
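When the coordinates come from a file rather than a string, blank lines would make literal_eval raise a SyntaxError. A minimal sketch of one way to guard against that, assuming one pair per line; the helper name parse_coordinates is hypothetical, and io.StringIO stands in for the real file:

```python
from ast import literal_eval
from io import StringIO


def parse_coordinates(lines):
    """Parse lines such as '[-3.2115, 55.95530556]' into lists of floats."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, which literal_eval cannot parse
            continue
        points.append(literal_eval(line))
    return points


# StringIO stands in for open("in.txt"); any iterable of lines works.
sample = StringIO("[-1.38795678, 54.90352965]\n\n[-3.2115, 55.95530556]\n")
points = parse_coordinates(sample)
print(points[0][1])
```

Because the function accepts any iterable of lines, the same call should work on an open file object.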
You can also use a regular expression, which is much faster than ast:
import re
with open("in.txt") as f:
    # optional minus sign, digits, a decimal point, more digits
    r = re.compile(r"[-]?\d+\.\d+")
    data = [list(map(float, r.findall(line))) for line in f]
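One caveat with this pattern: it requires a decimal point, so a value written as a bare integer (55) is skipped, and an exponent (1.5e-3) is only partly matched. If the file might contain such values, a more tolerant pattern could look like the sketch below. The names NUMBER and parse_line are illustrative, not part of the answer above.

```python
import re

# Optional sign, digits with an optional fraction (or a bare fraction),
# then an optional exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_line(line):
    """Return every number found in the line as a float."""
    return [float(tok) for tok in NUMBER.findall(line)]


print(parse_line("[-3.2115, 55]"))
print(parse_line("[1.5e-3, .25]"))
```

For data that always has a decimal point, as in the question, the simpler pattern is enough.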
Some timings:
In [14]: %%timeit
with open("test.txt") as f:
    data = [literal_eval(line) for line in f]
   ....:
100 loops, best of 3: 2.01 ms per loop
In [15]: %%timeit
with open("test.txt") as f:
    r = re.compile(r"[-]?\d+\.\d+")
    data = [list(map(float, r.findall(line))) for line in f]
   ....:
1000 loops, best of 3: 403 µs per loop
with open("test.txt") as f:
    r = re.compile(r"[-]?\d+\.\d+")
    data = [list(map(float, r.findall(line))) for line in f]
   ....:
In [38]: with open("test.txt") as f:
    data2 = [literal_eval(line) for line in f]
   ....:
In [39]: data == data2
Out[39]: True
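The comparison above can be reproduced outside IPython with the standard timeit module. This is only a sketch: the in-memory lines list is made-up sample data standing in for test.txt, so the absolute numbers will differ from the ones shown.

```python
import re
import timeit
from ast import literal_eval

# In-memory sample lines stand in for test.txt (hypothetical data).
lines = ["[-1.38795678, 54.90352965]\n", "[-3.2115, 55.95530556]\n"] * 500
pattern = re.compile(r"[-]?\d+\.\d+")


def with_literal_eval():
    return [literal_eval(line) for line in lines]


def with_regex():
    return [list(map(float, pattern.findall(line))) for line in lines]


# Both approaches should produce identical data before timings are compared.
same = with_literal_eval() == with_regex()
print(same)

for func in (with_literal_eval, with_regex):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds / 10 * 1e3:.3f} ms per call")
```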
Stripping and splitting is faster again:
In [40]: %%timeit
   ....: with open("test.txt") as f:
   ....:     data = [list(map(float, line.strip("[]\n").split(","))) for line in f]
   ....:
1000 loops, best of 3: 249 µs per loop
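Once the lines are parsed into floats, the loop from the question could be sketched roughly as follows. This is an illustration under stated assumptions, not the asker's actual setup: point_in_polygon is a hand-written ray-casting test, the two rectangles are made-up stand-ins for county outlines, and each point is taken to be in the same axis order as the polygon vertices. For real shapefile geometry a dedicated geometry library such as Shapely is likely the better choice.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge
    are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangles standing in for county outlines.
polygons = {
    "north_box": [(-4.0, 54.0), (0.0, 54.0), (0.0, 57.0), (-4.0, 57.0)],
    "south_box": [(-1.0, 50.0), (2.0, 50.0), (2.0, 53.0), (-1.0, 53.0)],
}

lines = [
    "[-1.38795678, 54.90352965]\n",
    "[-3.2115, 55.95530556]\n",
    "[0.00315428, 51.50285246]\n",
]
data = [list(map(float, line.strip("[]\n").split(","))) for line in lines]

matches = []
for point in data:
    for name, poly in polygons.items():
        if point_in_polygon(point, poly):
            matches.append(name)
            break
    else:
        matches.append(None)  # no polygon contains this point
print(matches)
```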