Python: Split file by line match

Question

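I have a text file made up of several sections that I want to split into separate files. In the example below, the split points would be the "Step" lines.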

Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot 
x(mm),y(mm),z(mm),Bx(T),By(T),Bz(T),Bm(T)
5.505E+01,-1.124E-02,-2.000E+00, 3.443E-04,-1.523E-05, 3.913E-04
5.511E+01,-1.124E-02,-2.000E+00, 3.417E-04,-1.511E-05, 3.912E-04
5.516E+01,-1.124E-02,-2.000E+00, 3.390E-04,-1.499E-05, 3.910E-04
...

Step Number: 2; Plot Name: deg0_R58; Type: Arrow Plot
...

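The reason is that pandas.read_csv() cannot handle the whole file because of the "Step" lines.

I only need the pieces temporarily for pandas.read_csv(), so I don't actually want to write them to disk. I have tried slicing the file with itertools.islice, but I can't process the output with pandas.read_csv because it expects a file-like object.

This is what I have so far: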

buf = []
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' in line:
            buf.append([])
        else:
            buf[-1].append(line)

Is there a way to convert the lists of lines in buf into a file-like object?

Update:

Thanks for the input, StringIO works well! Here is what I did, in case anyone runs into a similar problem:

import io  # Python 3; on Python 2 this was the StringIO module
import pandas as pd

steps_Dict = {}
fsection = None
step_nr = 0
with open(filepath, 'r') as f:
    print(f)
    for line in f:
        if 'Step' in line:
            if fsection:
                step_nr = step_nr + 1   # steps start with 1
                fsection.seek(0)        # rewind before handing the buffer to pandas
                steps_Dict[step_nr] = pd.read_csv(fsection, sep=',', header=0)
                print(steps_Dict)
            fsection = io.StringIO()    # new section
        elif line.strip() and fsection is not None:
            # Append non-blank lines to the current section.
            # Alternative with pandas 0.16: pd.read_csv(skip_blank_lines=True) could perhaps be used?
            fsection.write(line)
    if fsection:    # captures the last section
        fsection.seek(0)
        steps_Dict[step_nr + 1] = pd.read_csv(fsection, sep=',', header=0)
# pd.Panel has been removed from current pandas releases; pd.concat on a dict
# gives one DataFrame with the step number as the outer index level instead.
steps_Panel = pd.concat(steps_Dict)
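
The splitting logic above can also be pulled out into a small generator, which keeps the parsing separate from the pandas call. This is only a sketch: the name iter_sections, its marker parameter, and the shortened two-column sample data are made up for illustration and are not part of the original code.

```python
import io


def iter_sections(lines, marker="Step"):
    """Yield one rewound io.StringIO per section.

    A section starts after any line containing `marker`; blank lines
    and the marker lines themselves are dropped.
    """
    current = None
    for line in lines:
        if marker in line:
            if current is not None:
                current.seek(0)  # rewind so the consumer reads from the start
                yield current
            current = io.StringIO()
        elif current is not None and line.strip():
            current.write(line)
    if current is not None:  # the last section has no closing marker line
        current.seek(0)
        yield current


# Hypothetical, shortened sample in the same layout as the file in the question.
sample = [
    "Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot\n",
    "x(mm),y(mm)\n",
    "5.505E+01,-1.124E-02\n",
    "\n",
    "Step Number: 2; Plot Name: deg0_R58; Type: Arrow Plot\n",
    "x(mm),y(mm)\n",
    "5.511E+01,-1.124E-02\n",
]
sections = [section.getvalue() for section in iter_sections(sample)]
```

With a real file, each yielded buffer could be passed straight to pd.read_csv, for example {n: pd.read_csv(s) for n, s in enumerate(iter_sections(f), start=1)}.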

Answer 1

You can use the pandas.io.parsers.read_csv function and skip the rows you don't want or need, reading the file directly into a DataFrame.

import pandas
z = pandas.io.parsers.read_csv("C:/path/a.txt", skiprows=0, header=1, sep=",")
z

    x(mm)   y(mm)       z(mm)   Bx(T)       By(T)       Bz(T)       Bm(T)
0   55.05   -0.01124    -2      0.000344    -0.000015   0.000391    NaN
1   55.11   -0.01124    -2      0.000342    -0.000015   0.000391    NaN
2   55.16   -0.01124    -2      0.000339    -0.000015   0.000391    NaN

Answer 2

If you don't need to write to a file, you can use StringIO to hold the text in memory.

import io  # Python 3; on Python 2 this was the StringIO module

output = io.StringIO()
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' not in line:   # copy everything except the "Step" lines
            output.write(line)

You can then use pandas' read_csv function with output.

As @Julien pointed out in the comments, you also need to call output.seek(0) before reading it with pandas:

import pandas as pd
output.seek(0)
pd.read_csv(output)
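
The reason for the seek(0) is that writing leaves the buffer's cursor at the end, so a reader starting from there sees no data. A minimal sketch with made-up two-column content, using only the standard library:

```python
import io

output = io.StringIO()
output.write("a,b\n")
output.write("1,2\n")

# The cursor sits just after the last write, so reading now returns nothing.
at_end = output.read()

# Rewind to the beginning; the full content is readable again.
output.seek(0)
from_start = output.read()
```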

Answer 3

You can use StringIO to create a file-like object that pd.read_csv() can read:

import io  # Python 3; on Python 2 this was the StringIO module
import pandas as pd

astr = io.StringIO()
astr.write('This,is,a,test\n')
astr.write('This,is,another,test\n')
astr.seek(0)   # rewind so read_csv starts at the first line
df = pd.read_csv(astr)
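
If the lines are already collected in a list, as with the buf sections in the question, the buffer can also be created in one step by joining them. A buffer built from an initial string starts at position 0, so no seek(0) is needed. A small sketch, where section_lines is a hypothetical stand-in for one entry of buf:

```python
import io

# Hypothetical stand-in for one section of the question's `buf` list.
section_lines = ["This,is,a,test\n", "This,is,another,test\n"]

# Joining the lines and passing them as the initial value gives a
# file-like object whose cursor is already at the start.
astr = io.StringIO("".join(section_lines))
start_position = astr.tell()
first_line = astr.readline()
```

The buffer should be handed to pd.read_csv before anything else reads from it (or after a seek(0)), since readline() above has already moved the cursor.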
