Replacing double space with single space in line containing certain string

Question

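I have a large text file made up of rows and columns. Every pair of strings/data values in the file is separated by a double space. However, for my particular code to work, I need the double spaces to become single spaces in certain lines only. These lines all begin with the same string.

I tried: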

with open(outfile) as f3, open(outfile2,'w') as f4:
    for line in f3:
         line = line.strip()
         if "SAMPLE" in line:
             " ".join(line.split())
         if 'xyz' not in line and len(line) >=46:
             f4.write(line+'\n')

I also tried:

import re
with open(outfile) as f3, open(outfile2,'w') as f4:
    for line in f3:
         if "SAMPLE" in line:
             re.sub("\s\s+" , " ", line)
         if 'xyz' not in line and len(line) >=46:
             f4.write(line)

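Neither works. The second if statement removes some lines I don't want, so it has to stay (that part works as expected). However, the double spacing between all the data in the text file is still there. How can I make the lines that contain "SAMPLE" use single spaces instead of double spaces between words?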

Answer 1

Try this:

s = " ".join(your_string.split())

Answer 2

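Your problem is that strings are immutable: " ".join(line.split()) creates a new string, which is most likely what you need, but you have to assign it back to the line variable.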

if "SAMPLE" in line:
    line = " ".join(line.split())
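
To see why the assignment matters, here is a minimal sketch (the sample text is made up for illustration). String methods such as split() and join() return new strings and leave the original untouched, and the same holds for re.sub():

```python
import re

line = "SAMPLE  a  b  c"

# Both calls build a new string, but the result is thrown away,
# so `line` still holds the double spaces afterwards.
" ".join(line.split())
re.sub(r"\s\s+", " ", line)
unchanged = line

# Binding the result back to the name is what actually "changes" line.
line = " ".join(line.split())

print(repr(unchanged))
print(repr(line))
```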

Later edit:
The second if looks a bit "strange"... what is the expected result?

if not line or (':' and len(line) >=46):
    f4.write(line)

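Especially the second part... ':' always evaluates to True, so it looks useless; it is probably a typo or something is missing. As written, this writes to the file only when line is empty or None (which evaluates to False) or when the length of the line is >= 46.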
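
The truthiness point can be checked directly. This is only a sketch: would_write and would_write_simplified are hypothetical helper names that wrap the condition so it can be tried on a few made-up inputs.

```python
def would_write(line):
    # The condition from the snippet above, wrapped in bool() for clarity.
    # A non-empty string literal such as ':' is always truthy.
    return bool(not line or (':' and len(line) >= 46))

def would_write_simplified(line):
    # Same behaviour with the always-true ':' operand removed.
    return bool(not line or len(line) >= 46)

for candidate in ("", "short line", "x" * 46):
    print(len(candidate), would_write(candidate), would_write_simplified(candidate))
```

Both functions agree on every input, which is why the ':' operand contributes nothing.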

The code should look like this:

with open(outfile) as f3, open(outfile2,'w') as f4:
    for line in f3:
         line = line.strip()
         if "SAMPLE" in line:
             # collapse any double/multiple spaces if the line contains "SAMPLE"
             line = " ".join(line.split())
         if 'xyz' not in line and len(line) >=46:
             # write to the second file only the lines that
             # don't contain 'xyz' and have a length >= 46
             f4.write(line+'\n')
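
If you would rather keep the regex from your second attempt, the same fix applies: assign the result of re.sub() back to line. Below is a rough sketch of that variant. clean_lines and its parameters are names I made up so the logic can be tried without real files, and the pattern " {2,}" is my assumption that only runs of plain spaces should be collapsed.

```python
import re

def clean_lines(lines, marker="SAMPLE", banned="xyz", min_len=46):
    """Yield the cleaned lines that pass the filter (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if marker in line:
            # Raw string avoids escape warnings; the result is assigned back.
            line = re.sub(r" {2,}", " ", line)
        if banned not in line and len(line) >= min_len:
            yield line

# Made-up sample data: 12 double-spaced fields per line.
fields = ["data"] * 12
lines = [
    "SAMPLE  " + "  ".join(fields) + "\n",   # marker line: spaces collapsed
    "  ".join(fields) + "\n",                # other line: double spaces kept
    "xyz  " + "  ".join(fields) + "\n",      # contains 'xyz': dropped
    "SAMPLE  a\n",                           # too short: dropped
]
result = list(clean_lines(lines))
for cleaned in result:
    print(cleaned)
```

With real files you could pass the input file object f3 directly as lines and write each yielded line plus '\n' to f4. Note that the length check runs after the spaces are collapsed, just as in the code above, so a "SAMPLE" line that shrinks below 46 characters would be filtered out.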

python

regex

whitespace

split

removing-whitespace