Better way to "copy only the comments from one file" and "prepend it into another file" using python

Question

Basically, I want to copy the comments from one file and add them to the top of another data file.

The file 'data_with_comments.txt' is available on pastebin: http://pastebin.com/Tixij2yG

It looks like this:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.300000  8.00000e-05     0.999920
      14.2000     0.301000  4.00000e-05     0.999960
      14.2000     0.302000  2.00000e-05     0.999980
      14.2000     0.303000  2.00000e-05     0.999980
      14.2000     0.304000  2.00000e-05     0.999980
      14.2000     0.305000  3.00000e-05     0.999970
      14.2000     0.306000  5.00000e-05     0.999950

Now I have another data file, 'test.txt', which looks like this:

300.0 1.53345164121e-32
300.1 1.53345164121e-32
300.2 1.53345164121e-32
300.3 1.53345164121e-32
300.4 1.53345164121e-32
300.5 1.53345164121e-32

Desired output:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
300.0 1.53345164121e-32
300.1 1.53345164121e-32
300.2 1.53345164121e-32
300.3 1.53345164121e-32
300.4 1.53345164121e-32

One way to do it:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author    : Bhishan Poudel
# Date      : Jun 18, 2016


# Imports
from __future__ import print_function
import fileinput


# read in comments from the file
infile = 'data_with_comments.txt'
comments = []
with open(infile, 'r') as fi:
    for line in fi.readlines():
        if line.startswith('#'):
            comments.append(line)

# reverse the list, because each comment is prepended on its own below
comments = comments[::-1]
print(comments[0])
#==============================================================================


# prepend the comments to a file, one line at a time
# (the whole file is read and rewritten once per comment line)
filename = 'test.txt'

for i in range(len(comments)):
    with open(filename, 'r') as original: data = original.read()
    with open(filename, 'w') as modified: modified.write(comments[i] + data)

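This approach opens the data file twice per comment line and rereads and rewrites all of its data each time, which is inefficient when the data file is large.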

Is there a better way?

Related questions:

Prepend line to beginning of a file
Python f.write() at beginning of file?
How can I add a new line of text at top of a file?
Prepend a line to an existing file in Python

Answer 1

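If your file contains comments only at the beginning, you can open it lazily (iterate over the file object instead of reading it all) and process only its first lines until you find a non-comment line. As soon as you reach a line that does not start with the "#" character, break out of the loop and let Python's with statement close the file.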
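A minimal sketch of this idea, assuming (as the answer does) that the comments appear only at the top of the file. The function name `read_leading_comments` is hypothetical:

```python
def read_leading_comments(path, marker='#'):
    """Return the comment lines at the top of path.

    Assumes comments appear only at the beginning of the file: reading
    stops at the first line that does not start with marker, instead of
    scanning the whole (possibly large) data section.
    """
    comments = []
    with open(path) as fp:
        for line in fp:  # the file object yields one line at a time
            if not line.startswith(marker):
                break  # first data line reached; 'with' closes the file
            comments.append(line)
    return comments
```

For the question's sample file, this would return the five `#` header lines (with their trailing newlines).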

Answer 2

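Especially if the data file (test.txt here) is large, as the OP states, I suggest the following (the data file is opened only once for reading, and one other file once for writing):

create a temporary folder,
pre-fill a temporary file in it with the stripped (!) comment lines,
append the lines from the data file,
rename the temporary file to the data file,
remove the temporary folder, voilà.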

Like this:

#! /usr/bin/env python
from __future__ import print_function

import os
import shutil
import tempfile


infile = 'data_with_comments.txt'
comments = None
with open(infile, 'r') as f_i:
    comments = [t.strip() for t in f_i if t.startswith('#')]

file_name = 'test.txt'
file_path = file_name  # simplification here

tmp_dir = tempfile.mkdtemp()  # create tmp folder (works on all platforms)
tmp_file_name = '_' + file_name  # determine the file name in temp folder

s_umask = os.umask(0o077)  # octal literal valid in Python 2.6+ and 3

tmp_file_path = os.path.join(tmp_dir, tmp_file_name)
try:
    with open(file_path, "rt") as f_prep, open(
            tmp_file_path, "wt") as f_tmp:
        f_tmp.write('\n'.join(comments) + '\n')
        for line in f_prep:  # iterate lazily instead of readlines()
            f_tmp.write(line)
except IOError as e:
    print(e)  # or whatever you want to report, instead of aborting
else:
    # shutil.move also works when tmp_dir is on another file system,
    # where os.rename fails with a cross-device error
    shutil.move(tmp_file_path, file_path)
finally:
    try:  # so we have an empty folder in - nearly - any case
        os.remove(tmp_file_path)
    except OSError:
        pass
    os.umask(s_umask)
    os.rmdir(tmp_dir)

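Nothing special here. The line-by-line iteration may be, ahem, well..., you should measure whether it performs well enough. In cases where I had to write to the "top" of a file, this approach mostly worked "good nuff"; otherwise I used the shell, e.g. (with comments_only holding just the comment lines):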

cat comments_only test.txt > foo && mv foo test.txt

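PS: To speed up reading and writing in the "append" phase, use matching block-wise reads and writes, with a block size tuned to the underlying system calls for maximum performance; since this is a one-to-one copy, there is no need to iterate line by line.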
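A sketch of such a block-wise copy for the append phase. The function name `prepend_comments` and the 1 MiB default block size are illustrative assumptions, not measured optimums; `shutil.copyfileobj(in_fp, out_fp, block_size)` performs essentially the same read/write loop.

```python
# Illustrative default only; measure and tune it for your own system.
DEFAULT_BLOCK_SIZE = 1024 * 1024  # 1 MiB


def prepend_comments(comment_lines, data_path, out_path,
                     block_size=DEFAULT_BLOCK_SIZE):
    """Write comment_lines to out_path, then append data_path in blocks.

    The data file is copied as raw bytes in chunks of block_size instead
    of being iterated line by line.
    """
    with open(out_path, 'wb') as out_fp:
        for line in comment_lines:
            # accept lines with or without a trailing newline
            out_fp.write((line.rstrip('\n') + '\n').encode('utf-8'))
        with open(data_path, 'rb') as in_fp:
            while True:
                chunk = in_fp.read(block_size)
                if not chunk:  # empty bytes object means end of file
                    break
                out_fp.write(chunk)
```

Because the data is copied byte for byte, line endings and formatting of the data file are preserved exactly.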

Answer 3

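You already have a good answer that uses a temporary directory, but it is more common to create the temporary file in the same directory as the target file. On systems where /tmp is a separate mount point, this avoids an extra copy of the data when the temporary file is renamed. Note also that there is no intermediate list of comments, which matters if there are many comment lines.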

import os
import shutil

infile = 'data_with_comments.txt'
filename = 'test.txt'

tmpfile = filename + '.tmp'

try:
    # write wanted data to tempfile
    with open(tmpfile, 'w') as out_fp:
        # prepend the comment lines from infile
        with open(infile) as in_fp:
            out_fp.writelines(filter(lambda l: l.startswith('#'), in_fp))
        # then append the contents of filename
        with open(filename) as in2_fp:
            shutil.copyfileobj(in2_fp, out_fp)
    # get rid of original data
    os.remove(filename)
    # replace with new data
    os.rename(tmpfile, filename)
finally:
    # cleanup on error
    if os.path.exists(tmpfile):
        os.remove(tmpfile)
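A variant of the same approach, assuming Python 3.3+ so that `os.replace` can overwrite the target in one step instead of `os.remove` followed by `os.rename`. It uses `tempfile.NamedTemporaryFile(dir=..., delete=False)` to get a unique temporary name in the target's directory rather than the fixed `filename + '.tmp'`. The function name `prepend_comments_inplace` is hypothetical:

```python
import os
import shutil
import tempfile


def prepend_comments_inplace(comment_src, target):
    """Prepend the '#' lines of comment_src to target, replacing target.

    Assumes Python 3.3+ (os.replace). The temporary file is created next
    to target, so the final os.replace stays on one file system.
    """
    target_dir = os.path.dirname(os.path.abspath(target))
    # delete=False: keep the file after closing so it can be renamed
    tmp = tempfile.NamedTemporaryFile('w', dir=target_dir, delete=False)
    try:
        with tmp as out_fp:
            with open(comment_src) as in_fp:
                # only lines starting with '#', read lazily
                out_fp.writelines(line for line in in_fp
                                  if line.startswith('#'))
            with open(target) as in2_fp:
                shutil.copyfileobj(in2_fp, out_fp)
        # the temporary file is created with mode 0600; copy the original
        # permission bits over before swapping the file in
        shutil.copymode(target, tmp.name)
        # replace target in one step (an atomic rename on POSIX)
        os.replace(tmp.name, target)
    except BaseException:
        # clean up the temporary file on any error, then re-raise
        try:
            os.remove(tmp.name)
        except OSError:
            pass
        raise
```

Like the answer's code, this copies every line that starts with `#`, not only the leading ones.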

Answer 4

Following Dilletant's idea,

for many text files and a single comment file, we can do this with a shell script:

# in the directory I have one file called     : comments
# and many other files with file extension    : .txt

for file in *.txt; do cat comments "$file" > foo && mv foo "$file"; done

This prepends the same comments to every .txt file in the directory.
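For comparison, a Python sketch of the same batch loop. The name `prepend_to_all` is hypothetical; like the shell loop, it prepends the entire comment file, not only the `#` lines, and it also skips the comment file itself in case that file matches the pattern:

```python
import glob
import os
import shutil
import tempfile


def prepend_to_all(comment_file, pattern='*.txt'):
    """Prepend the whole content of comment_file to every file matching pattern.

    Returns the list of files that were rewritten.
    """
    with open(comment_file) as fp:
        header = fp.read()
    comment_abs = os.path.abspath(comment_file)
    done = []
    for path in glob.glob(pattern):
        if os.path.abspath(path) == comment_abs:
            continue  # never prepend the comment file to itself
        # temporary file in the same directory, like 'foo' in the shell loop
        fd, tmp_path = tempfile.mkstemp(
            dir=os.path.dirname(os.path.abspath(path)))
        with os.fdopen(fd, 'w') as out_fp, open(path) as in_fp:
            out_fp.write(header)
            shutil.copyfileobj(in_fp, out_fp)
        shutil.copymode(path, tmp_path)  # keep the original permissions
        shutil.move(tmp_path, path)  # like 'mv foo "$file"'
        done.append(path)
    return done
```

Calling `prepend_to_all('comments')` in that directory would mirror the one-liner above.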

Tags: python, file-io, prepend