Reading specific columns from a text file in Python
I have a text file that contains a table of numbers, for example:
5 10 6
6 20 1
7 30 4
8 40 3
9 23 1
4 13 6
If I want only the numbers in the second column, how do I extract that column into a list?
# `file` is assumed to hold the path to the text file
with open(file, "r") as f:
    result = []
    for line in f:
        # the second space-separated field is at index 1
        result.append(line.split(' ')[1])
You can do the same thing with a list comprehension:
print([x.split(' ')[1] for x in open(file).readlines()])
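Both versions above produce strings such as '10'. If numeric values are wanted, a minimal sketch follows. The helper name `read_column` and its `col` parameter are hypothetical, and the sketch assumes every non-blank line has at least `col + 1` whitespace-separated integer fields.

```python
def read_column(lines, col=1):
    """Return field `col` (0-based) of each non-blank line as an int."""
    values = []
    for line in lines:
        fields = line.split()      # split on any run of whitespace
        if not fields:             # skip blank lines
            continue
        values.append(int(fields[col]))
    return values


# Any iterable of lines works, including an open file object.
sample_lines = ["5 10 6\n", "6 20 1\n", "7 30 4\n"]
print(read_column(sample_lines))   # [10, 20, 30]
```

With a real file, the same helper could be called as `read_column(f)` inside a `with open(...) as f:` block.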
From the documentation on split():
string.split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed). If the second argument sep is present and not None, it specifies a string to be used as the word separator. The returned list will then have one more item than the number of non-overlapping occurrences of the separator in the string.
So you can omit the space I used and just call x.split(), but be aware that this also treats tabs and newlines as separators and removes them.
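A short illustration of that difference, using made-up input lines (the variable names exist only for this sketch):

```python
line = "4 13 6\n"
by_space = line.split(' ')        # ['4', '13', '6\n']  the newline stays on the last field
by_whitespace = line.split()      # ['4', '13', '6']

# Two spaces and a tab between fields
messy = "4  13\t6\n"
messy_by_space = messy.split(' ')     # ['4', '', '13\t6\n']
messy_by_whitespace = messy.split()   # ['4', '13', '6']

print(by_space, by_whitespace)
print(messy_by_space, messy_by_whitespace)
```

For the second column of a cleanly single-spaced file both calls give the same value. The difference matters for the last column, or when the spacing is irregular.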
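First we open the file as datafile, then call the .read() method to read its contents, and then split the data, which returns something like: ['5', '10', '6', '6', '20', '1', '7', '30', '4', '8', '40', '3', '9', '23', '1', '4', '13', '6']
We then slice this list, starting at the element at index 1 and taking every third element (a step of 3, because each row has three columns) until the end of the list.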
with open("sample.txt", "r") as datafile:
    print(datafile.read().split()[1::3])
Output:
['10', '20', '30', '40', '23', '13']
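The slice only lines up with a column if every row has the same number of fields. A hedged sketch of a generalised version follows; `column_by_stride`, `col`, and `ncols` are hypothetical names introduced here, not part of the answer above.

```python
def column_by_stride(text, col, ncols):
    """Return column `col` (0-based) from whitespace-separated text
    in which every row is expected to have exactly `ncols` fields."""
    tokens = text.split()
    if len(tokens) % ncols != 0:
        # Catches some ragged inputs (not all): the total token count
        # must at least be a whole number of rows.
        raise ValueError("token count is not a multiple of ncols")
    return tokens[col::ncols]


text = "5 10 6\n6 20 1\n7 30 4\n"
print(column_by_stride(text, 1, 3))   # ['10', '20', '30']
```

Because the text is flattened before slicing, a row with a missing value shifts every later value into the wrong column. A per-line split is safer when the input may be ragged.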
You can use the zip function with a list comprehension:
with open('ex.txt') as f:
    # list(...) is needed in Python 3, where zip returns an iterator
    print(list(zip(*[line.split() for line in f]))[1])
Result:
('10', '20', '30', '40', '23', '13')
You have a space-delimited file, so use the module designed for reading delimited-value files, csv.
import csv
with open('path/to/file.txt') as inf:
    reader = csv.reader(inf, delimiter=" ")
    second_col = list(zip(*reader))[1]
    # In Python2, you can omit the `list(...)` cast
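If only one column is needed, transposing the whole file is optional. The following sketch picks the field row by row instead. It uses io.StringIO with made-up data as a stand-in for the open file, purely so that it runs on its own.

```python
import csv
import io

# Stand-in for an open file; the contents mirror part of the sample table.
inf = io.StringIO("5 10 6\n6 20 1\n7 30 4\n")

reader = csv.reader(inf, delimiter=" ")
# Take field 1 from each non-empty row instead of building every column with zip.
second_col = [row[1] for row in reader if row]

print(second_col)   # ['10', '20', '30']
```

This yields a list rather than the tuple that the zip version returns, and it avoids holding all columns in memory at once.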
The zip(*iterable) pattern is useful for converting rows into columns or vice versa. If you're reading a file row-wise...
>>> testdata = [[1, 2, 3],
...             [4, 5, 6],
...             [7, 8, 9]]
>>> for line in testdata:
...     print(line)
...
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
...but need columns, you can pass each row to the zip function:
>>> testdata_columns = zip(*testdata)
>>> # this is equivalent to zip([1,2,3], [4,5,6], [7,8,9])
>>> for line in testdata_columns:
...     print(line)
...
(1, 4, 7)
(2, 5, 8)
(3, 6, 9)
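One caveat with this pattern: zip stops at the shortest row, so a short row silently drops the trailing columns. A small sketch with made-up data shows this, using itertools.zip_longest from the standard library as one possible way to keep them:

```python
from itertools import zip_longest

ragged = [[1, 2, 3],
          [4, 5],        # this row is missing its third value
          [7, 8, 9]]

# zip truncates to the shortest row, so the third column disappears.
truncated = list(zip(*ragged))
# [(1, 4, 7), (2, 5, 8)]

# zip_longest pads the missing entries with fillvalue instead.
padded = list(zip_longest(*ragged, fillvalue=None))
# [(1, 4, 7), (2, 5, 8), (3, None, 9)]

print(truncated)
print(padded)
```

Whether padding or an error is the better reaction to a short row depends on the data, so treat this as an illustration rather than a recommendation.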
I know this is an old question, but nobody has mentioned that numpy's loadtxt comes in handy when your data looks like an array:
>>> import numpy as np
>>> np.loadtxt("myfile.txt")[:, 1]
array([10., 20., 30., 40., 23., 13.])
This might help:
import csv
with open('csv_file', 'r') as f:
    # Printing a specific part of the CSV file
    lines = list(csv.reader(f, delimiter=' ', skipinitialspace=True))
    # Print the second column of the last line
    print(lines[-1][1])
    # Print the second column of every row except the last 10
    for i in range(len(lines) - 10):
        print(lines[i][1])