How to read a txt file containing a numpy.ndarray in Python
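I would like to know the correct syntax for reading test.txt, which contains the following values:
(P.S. the data in test.txt is of type numpy.ndarray, i.e. the file holds the printed form of an array.)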
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0.51 0.47 0.45
0.42 0.42 0.4 0.385 0.375 0.41 0.415 0.375 0.355 0.36 0.41 0.4
0.39 0.38 0.375 0.375 0.375 0.38 0.39 0.395 0.385 0.38 0.375 0.375
0.37 0.365 0.36 0.355 0.35 0.35 0.345 0.345 0.35 0.36 0.355 0.355
0.35 0.35 0.355 0.355 0.35 0.35 0.35 0.345 0.34 0.335 0.325 0.325
0.325 0.33 0.345 0.325 0.32 0.315 0.315 0.315 0.31 0.31 0.31 0.305
0.305 0.3 0.3 0.29 0.29 0.3 0.295 0.29 0.29 0.29 0.29 0.29]
I tried to read the file with the following code:
data_test = np.genfromtxt('test.txt')
But I get this error message:
ValueError: Some errors were detected !
Line #43 (got 8 columns instead of 12)
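Any help on how to read this kind of space/column-delimited data would be greatly appreciated!
Since the file can be viewed as a bunch of floats embedded in non-decimal junk, a regular expression can pull them out. Just find all substrings made up of digits and periods.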
>>> import numpy as np
>>> import re
>>> with open('foo.txt') as fileobj:
...     arr = np.array([float(val) for val in re.findall(r"[\d\.]+",
...         fileobj.read(), flags=re.MULTILINE)])
...
>>> arr
array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.51 , 0.47 , 0.45 , 0.42 , 0.42 , 0.4 ,
0.385, 0.375, 0.41 , 0.415, 0.375, 0.355, 0.36 , 0.41 , 0.4 ,
0.39 , 0.38 , 0.375, 0.375, 0.375, 0.38 , 0.39 , 0.395, 0.385,
0.38 , 0.375, 0.375, 0.37 , 0.365, 0.36 , 0.355, 0.35 , 0.35 ,
0.345, 0.345, 0.35 , 0.36 , 0.355, 0.355, 0.35 , 0.35 , 0.355,
0.355, 0.35 , 0.35 , 0.35 , 0.345, 0.34 , 0.335, 0.325, 0.325,
0.325, 0.33 , 0.345, 0.325, 0.32 , 0.315, 0.315, 0.315, 0.31 ,
0.31 , 0.31 , 0.305, 0.305, 0.3 , 0.3 , 0.29 , 0.29 , 0.3 ,
0.295, 0.29 , 0.29 , 0.29 , 0.29 , 0.29 ])
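One caveat with the pattern above: `[\d\.]+` would not pick up a leading minus sign or scientific notation such as `1.5e-05`, which NumPy may print for very small values. The sketch below shows one possible stricter pattern. The names `FLOAT_RE` and `parse_printed_array` and the sample string are illustrative assumptions, not part of the original answer, and the pattern still does not handle `nan` or `inf`.

```python
import re

import numpy as np

# Hypothetical pattern: optional sign, digits with an optional decimal point
# (or a bare fractional part), then an optional exponent.
FLOAT_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_printed_array(text):
    """Return a 1-D float64 array of every number found in `text`."""
    return np.array([float(tok) for tok in FLOAT_RE.findall(text)])


# Made-up sample in the same bracketed, line-wrapped layout as test.txt.
sample = "[0. 0.51 -0.47\n 1.5e-05 0.29]"
arr = parse_printed_array(sample)
print(arr)
```

For the file in the question, the same function could be applied to the result of `fileobj.read()`.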
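import numpy as np

with open('test.txt') as file:
    # Join the wrapped lines with spaces and trim the trailing newline
    data = file.read().replace('\n', ' ').strip()
# Drop the enclosing brackets and parse the space-separated numbers
arr = np.fromstring(data[1:-1], sep=' ', dtype=np.float32)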
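As a variation on the bracket-stripping idea, here is a minimal sketch that uses only `str.split`, which splits on any run of whitespace (including newlines), so the line breaks need no special handling. The helper name `load_printed_array` and the sample string are assumptions made for illustration. It builds a float64 array; with `dtype=np.float32`, as in the snippet above, values such as 0.51 are stored at lower precision.

```python
import numpy as np


def load_printed_array(text):
    """Parse the printed form of a 1-D array, e.g. '[0. 0.51 0.47]'."""
    # Trim surrounding whitespace, drop the enclosing brackets,
    # then split on any whitespace (spaces and newlines alike).
    tokens = text.strip().strip("[]").split()
    return np.array([float(tok) for tok in tokens], dtype=np.float64)


# Made-up sample mimicking the wrapped layout of test.txt.
sample = "[0.   0.51 0.47\n 0.45 0.29]\n"
arr = load_printed_array(sample)
print(arr.shape, arr.dtype)
```

To use it on the file from the question, the text could be read with `open('test.txt').read()` and passed to the function.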