Extracting selected columns from a datafile using Python

Question

I have a data file like this:

0.000       1.185e-01  1.185e-01  3.660e-02  2.962e-02  0.000e+00  0.000e+00  0.000e+00  0.000e+00  0.000e+00
0.001       1.185e-01  1.185e-01  3.660e-02  2.962e-02  -1.534e-02  -1.534e-02  8.000e-31  8.000e-31  0.000e+00
0.002       1.185e-01  1.185e-01  3.659e-02  2.961e-02  -1.541e-02  -1.541e-02  -6.163e-01  -6.163e-01  -4.284e-05
0.003       1.186e-01  1.186e-01  3.657e-02  2.959e-02  -1.547e-02  -1.547e-02  -8.000e-31  -8.000e-31  0.000e+00
0.004       1.186e-01  1.186e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.005       1.186e-01  1.186e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.006       1.187e-01  1.186e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.007       1.187e-01  1.187e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.008       1.188e-01  1.187e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.009       1.188e-01  1.187e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00

I want to copy only selected columns from this file to another file. For example, if I copy columns 1, 2 and 6, the new file should look like this:

0.000       1.185e-01  0.000e+00
0.001       1.185e-01  -1.534e-02
0.002       1.185e-01  -1.541e-02
0.003       1.186e-01  -1.547e-02
0.004       1.186e-01  -2.005e-32
0.005       1.186e-01  -2.005e-32
0.006       1.187e-01  -2.005e-32
0.007       1.187e-01  -2.005e-32
0.008       1.188e-01  -2.005e-32
0.009       1.188e-01  -2.005e-32

It is a very large formatted text file that was originally written like this:

f=open('myMD.dat','w')
s='%8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e\t\t'%(xpos1[i],ypos1[i],xvel1[i],yvel1[i],xacc1[i],yacc1[i],xforc[i],yforc[i],potn[i])
f.write(s)
f.close()

I am programming in Python. How can I do this?

Answer 1

What kind of file is it? Comma-separated? Plain text? If it is a *.csv file, you could try this:

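import csv

col1, col2, col6 = [], [], []
# skipinitialspace=True collapses the runs of spaces between columns
with open('filepath', 'r', newline='') as openFile:
    dataIn = csv.reader(openFile, delimiter=' ', skipinitialspace=True)
    for row in dataIn:
        if not row:
            continue  # skip blank lines
        col1.append(row[0])
        col2.append(row[1])
        col6.append(row[5])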
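
The snippet above only collects the columns into the lists col1, col2 and col6. To get them into a second file, the lists still have to be written out row by row. Here is a minimal sketch of that step; the helper name write_columns and the sample values are my own assumptions, and io.StringIO only stands in for a real output file.

```python
import io

def write_columns(out_file, *columns, sep="  "):
    """Write equal-length lists of strings side by side, one row per line."""
    # zip() pairs the i-th entry of every column to rebuild each row
    for row in zip(*columns):
        out_file.write(sep.join(row) + "\n")

# Hypothetical sample values standing in for col1, col2 and col6
col1 = ["0.000", "0.001"]
col2 = ["1.185e-01", "1.185e-01"]
col6 = ["0.000e+00", "-1.534e-02"]

buffer = io.StringIO()  # stands in for open('output.txt', 'w')
write_columns(buffer, col1, col2, col6)
result = buffer.getvalue()
print(result, end="")
```

With a real file you would pass the object returned by open('output.txt', 'w') instead of the buffer. Because the entries stay strings, the number formatting of the input is kept as is.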

Answer 2

Column data

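This method works for any data file that meets these requirements:

Data items are separated by whitespace (i.e. space, tab, return)
Data items do not contain whitespace

The sample data given meets these requirements. This method uses Python 3 and regular expressions to extract specific columns from the data.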

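To use it simply:

Call the init(file) function once
- Pass in the path to the data file
Then call getColm(i) as many times as needed
- Pass in the index of the column you want
- It will return an array containing the entries of that column.

Here is the code. Make sure to import the regular expression library re.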

import re

matrixOfFile = []

# Prep the matrixOfFile variable
def init(filepath):
    global matrixOfFile
    # Read the file content
    with open(filepath,'r') as file:
        fileContent = file.read()       
    # Split the file into rows
    rows = fileContent.split("\n")

    # Split rows into entries and add them to matrixOfFile
    for row in rows: # For each row, find all of the entries in the row that
                     # are non-space characters and add those entries to the
                     # matrix; blank rows (e.g. after a trailing newline) are skipped
        entries = re.findall(r"\S+", row)
        if entries:
            matrixOfFile.append(entries)

# Returns the ith column of the matrixOfFile
# i should be an int with 0 <= i < len(matrixOfFile[0])
def getColm(i):
    global matrixOfFile
    if i<0 or i>=len(matrixOfFile[0]):
        raise ValueError('Column '+str(i)+' does not exist')
    colum = []
    for row in matrixOfFile: # For each row, add whatever is in the ith 
                  # column to colum
        colum.append(row[i])

    return colum

# Absolute filepath might be necessary ( eg "C:/Windows/Something/Users/Documents/data.dat" )
init("data.dat") 
# Gets the first, second and sixth columns of data
print(getColm(0))
print(getColm(1))
print(getColm(5))
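
If you would rather not keep the parsed file in a module-level global, the same regular-expression idea can be wrapped in a single function that reads line by line. This is only a sketch: read_columns is a hypothetical name, and io.StringIO stands in for an open data file.

```python
import io
import re

def read_columns(lines, indices):
    """Return one list per requested 0-based column index."""
    columns = [[] for _ in indices]
    for line in lines:
        entries = re.findall(r"\S+", line)
        if not entries:
            continue  # skip blank lines
        for column, i in zip(columns, indices):
            column.append(entries[i])
    return columns

# Hypothetical three-column sample, including a trailing blank line
sample = io.StringIO("0.000  1.185e-01  3.660e-02\n0.001  1.185e-01  3.659e-02\n\n")
first, third = read_columns(sample, [0, 2])
print(first)  # ['0.000', '0.001']
print(third)  # ['3.660e-02', '3.659e-02']
```

Any iterable of lines works, so an open file object can be passed directly and the file is never held in memory as one big string.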

Answer 3

This reads the given input file and selects columns using a given comma-separated list of column numbers:

import sys
input_name = sys.argv[1]
column_list = [(int(x) - 1) for x in sys.argv[2].split(',')]
with open(input_name) as input_file:
    for line in input_file:
        row = line.split()
        for col in column_list:
            print(row[col], end=' ')  # Python 3 form of "print row[col],"
        print()

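It reads and prints one line at a time, which means it should be able to handle arbitrarily large input files. Using your sample data as input.txt, I ran it like this: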

python selected_columns.py input.txt 1,2,6

It produced the following output (the ellipsis stands for lines removed for brevity):

0.000 1.185e-01 0.000e+00 
0.001 1.185e-01 -1.534e-02 
...
0.009 1.188e-01 -2.005e-32

You can save the output to a file using redirection:

python selected_columns.py input.txt 1,2,6 > output.txt
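
To make the 1-based column argument concrete, here is a small sketch of what happens to the string 1,2,6 and to a single line of input. The function names parse_columns and select_fields are hypothetical and are not part of the script above.

```python
def parse_columns(spec):
    """Turn a 1-based spec such as '1,2,6' into 0-based list indices."""
    return [int(x) - 1 for x in spec.split(',')]

def select_fields(line, column_list):
    """Return the chosen whitespace-separated fields of one line."""
    row = line.split()
    return ' '.join(row[col] for col in column_list)

column_list = parse_columns('1,2,6')
line = '0.001  1.185e-01  1.185e-01  3.660e-02  2.962e-02  -1.534e-02  -1.534e-02'
selected = select_fields(line, column_list)
print(column_list)  # [0, 1, 5]
print(selected)     # 0.001 1.185e-01 -1.534e-02
```

The subtraction of 1 is what lets you type column numbers the way the question counts them, while Python lists are indexed from zero.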

Answer 4

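Much simpler, yet versatile. Assuming none of the fields contain any whitespace, you can simply use the split method on each line to get a list of fields, then print the fields you want. Here is a script that lets you specify the columns to output and the separator string.

Note: we never convert between strings and floats. This preserves the original number formatting and, for a huge file, saves a lot of CPU!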

COLS=0,1,5  # the columns you want. The first is numbered zero.
            # NB it's a tuple: use COLS=0, for one column (the trailing comma is mandatory)

SEP = ', '  # the string you want to separate the columns in the output

INFILE='t.txt'      # file to read from
OUTFILE='out.txt'   # file to write to

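# "with" closes both files, even if an error occurs part-way through
with open( INFILE, 'r') as f, open( OUTFILE, 'w') as g:
    # Iterate over the file object so only one line is in memory at a time
    for line in f:
        x = line.split()
        if x != []:  # ignore blank lines

            y = [ x[i] for i in COLS ]
            outline = SEP.join( '{}'.format(q) for q in y )
            g.write( outline+'\n')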

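Just realized that '{}'.format(q) for q in y is overkill here. y is a list of the strings to be output, so SEP.join(y) is all you need. But showing the pattern for applying a format to a list of similar elements may still be useful.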
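
A quick sketch of that point, with sample values I made up: joining the strings directly gives the same result as the '{}' pattern, and the pattern only earns its keep once the format spec actually does something, such as padding each field to a fixed width (the width of 12 is an arbitrary assumption).

```python
SEP = ', '
y = ['0.001', '1.185e-01', '-1.534e-02']  # hypothetical fields from one line

plain = SEP.join(y)
via_format = SEP.join('{}'.format(q) for q in y)
assert plain == via_format  # '{}' leaves strings unchanged

# The same pattern with a real format spec: right-align in 12 characters.
# The fields stay strings, so no float conversion takes place.
padded = SEP.join('{:>12}'.format(q) for q in y)
print(plain)
print(padded)
```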
