np.genfromtxt with multiple delimiters?
My file looks like this:
1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14
1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14
I want to import it into a NumPy array using np.genfromtxt. The main problem is that it uses both ';' and ',' as delimiters.
My attempt:
import numpy as np
import io
s = io.StringIO(open('2e70dfa1.csv').read().replace(';',','))
data = np.genfromtxt(s,dtype=int,delimiter=',')
I get the error:
TypeError: Can't convert 'bytes' object to str implicitly
How can I fix this? I am also open to completely new (better) ideas.
Per the docs for numpy.genfromtxt:
Note that generators must return byte strings in Python 3k.
So instead of creating a StringIO object, create a BytesIO:
import numpy as np
import io
s = io.BytesIO(open('2e70dfa1.csv', 'rb').read().replace(b';',b','))
data = np.genfromtxt(s,dtype=int,delimiter=',')
which yields
array([[1497484825, 34425, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14],
[1497484837, 34476, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14]])
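As a side note, NumPy 1.14 and later added an encoding argument and accept text input as well, so on those versions the original StringIO attempt should also work. Here is a minimal sketch under that version assumption. The two rows are the sample from the question held in memory, and the names sample_text, text_buffer and data_from_text are only illustrative:

```python
import io

import numpy as np

# Two sample rows copied from the question, held in memory as text.
sample_text = (
    "1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
    "1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
)

# Normalise ';' to ',' and hand genfromtxt a text buffer.
# Assumption: NumPy >= 1.14; older versions need bytes, as shown above.
text_buffer = io.StringIO(sample_text.replace(";", ","))
data_from_text = np.genfromtxt(text_buffer, dtype=int, delimiter=",")

print(data_from_text.shape)
```

On older NumPy versions the BytesIO variant above remains the safer choice.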
Note that if you have Pandas installed, you could use pd.read_table, which lets you specify a regex pattern as the delimiter:
import pandas as pd
df = pd.read_table('2e70dfa1.csv', sep=';|,', engine='python', header=None)
print(df)
which yields
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 1497484825 34425 -4 28 -14 -4 28 -14 -4 28 -14 -4 28 -14 -4 28 -14 -4 28 -14
1 1497484837 34476 -4 28 -14 -4 28 -14 -4 28 -14 -4 28 -14 -4 28 -14 -4 28 -14
pd.read_table returns a DataFrame. If you need a NumPy array, you can access it through its values attribute:
In [24]: df.values
Out[24]:
array([[1497484825, 34425, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14],
[1497484837, 34476, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14]])
According to the docs:
Parameters:
fname : file, str, pathlib.Path, list of str, generator
File, filename, list, or generator to read. If the filename extension
is gz or bz2, the file is first decompressed. Note that generators
must return byte strings in Python 3k. The strings in a list or
produced by a generator are treated as lines.
It might be easier and more efficient to give it a generator, just bearing in mind that it must yield byte strings:
>>> with open('2e70dfa1.csv', 'rb') as f:
... clean_lines = (line.replace(b';',b',') for line in f)
... data = np.genfromtxt(clean_lines, dtype=int, delimiter=',')
...
>>> data
array([[1497484825, 34425, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14],
[1497484837, 34476, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14]])
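The generator idea can be tried without the original file, and it extends to more than two delimiters. The following is a rough sketch rather than a definitive implementation: load_mixed_delimited is a hypothetical helper name, SAMPLE_BYTES is just the two rows from the question, and the regex character class is my own substitution for the chained replace call:

```python
import io
import re

import numpy as np

# The two sample rows from the question, as bytes (stand-in for the real file).
SAMPLE_BYTES = (
    b"1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
    b"1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
)


def load_mixed_delimited(lines, delimiters=b";,"):
    """Hypothetical helper: rewrite every delimiter byte to ',' and parse as int."""
    # Character class matching any single byte listed in `delimiters`.
    pattern = re.compile(b"[" + re.escape(delimiters) + b"]")
    # Lazy generator of byte strings, so the input is cleaned line by line.
    clean_lines = (pattern.sub(b",", line) for line in lines)
    return np.genfromtxt(clean_lines, dtype=int, delimiter=",")


# Any iterable of byte lines works: a BytesIO here, or a file opened with 'rb'.
data = load_mixed_delimited(io.BytesIO(SAMPLE_BYTES))
print(data.shape)
```

With a real file, the same helper would be called as load_mixed_delimited(f) inside a with open(path, 'rb') as f: block, so the file is read lazily and closed afterwards.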