Sorting a txt file (key:value on every line) - a problem with '\n'

Question

I am trying to reformat a txt file that looks like this:

byr:1983 iyr:2017
pid:796082981 cid:129 eyr:2030
ecl:oth hgt:182cm

iyr:2019
cid:314
eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568

byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry

hcl:231d64 cid:124 ecl:gmt eyr:2039
hgt:189in
pid:#9c3ea1

and so on (1000+ lines), into this structure:

byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value

byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value

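The order of byr, iyr, etc. does not matter, but every "set" of key:value pairs must be separated by a blank line. My main problem, if I can call it that, is writing code that reformats the file correctly when a line contains more than one key:value element. I have made some progress, but it still does not behave as it should. Here is the code: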

result_file = open('testresult.txt', 'w')
#list_of_lines = [] testing purpose


with open('input.txt', 'r') as f:
    for line in f:
        if line == "\n":
            #list_of_lines.append('\n') testing
            result_file.writelines('\n')
        else:
            for i in line.split(' '):
                if i[-1] == "n":
                    result_file.write(i)
                else:
                    result_file.write(i + '\n')
                #print(i) testing purpose

It produces the following result:

byr:1983
iyr:2017

pid:796082981
cid:129
eyr:2030

ecl:oth
hgt:182cm


iyr:2019

cid:314

eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568


byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry

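As you can see, it does not work properly: for example, there should be no blank line between the first occurrence of byr and the first occurrence of hgt, and so on. In my opinion, the last if statement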

if i[-1] == "n":
    result_file.write(i)
else:
    result_file.write(i + '\n')

should protect me from that, but right now I have no idea why it does not behave the way I "predicted". Please help. Thanks in advance <3

Answer 1

Try this:

result_file = open('testresult.txt', 'w')
#list_of_lines = [] testing purpose


with open('input.txt', 'r') as f:
    for line in f:
        if line == '\n':
            #list_of_lines.append('\n') testing
            result_file.writelines('\n')
        else:
            # replace '\n' with ''
            line = line.replace('\n', '')
            for i in line.split(' '):
                result_file.writelines(i + '\n')

result_file.close()
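
Why the original check never fires: '\n' is a single newline character, not a backslash followed by the letter n. The last item of each line therefore ends with "\n", so `i[-1] == "n"` is false for it, and `i + '\n'` adds a second newline. A value such as hgt:189in, which really ends in the letter n, would pass the check by accident. A minimal sketch to see this, using a sample line from the question (the helper name `split_pairs` is only illustrative):

```python
def split_pairs(line):
    """Return the key:value items of one line, without the trailing newline."""
    return line.rstrip("\n").split(" ")


sample = "ecl:oth hgt:182cm\n"
last_item = sample.split(" ")[-1]

print(repr(last_item))         # 'hgt:182cm\n'
print(last_item[-1] == "n")    # False: the last character is the newline itself
print(last_item[-1] == "\n")   # True
print(split_pairs(sample))     # ['ecl:oth', 'hgt:182cm']
```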

Answer 2

Try this:

lines = []
with open("file.txt", "r") as f:
    lines = f.readlines()

print(lines)

split_lines = []

for line in lines:
    split_lines.extend(line.split(" "))

print("split_lines")
print(split_lines)

# Notice that the last item of every input line still ends with '\n'.
# That is probably what causes your extra blank lines,
# so let's strip it.

cleaned_lines = [item.strip("\n") for item in split_lines]

print("Removed newlines")
print(cleaned_lines)

with open("output.txt", "w") as f:
    for line in cleaned_lines:
        f.write(line+"\n")

With this in file.txt:

byr:1983 iyr:2017
pid:796082981 cid:129 eyr:2030
ecl:oth hgt:182cm

iyr:2019
cid:314
eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568

byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry

hcl:231d64 cid:124 ecl:gmt eyr:2039
hgt:189in
pid:#9c3ea1

Running the script above gave me this in output.txt:

byr:1983
iyr:2017
pid:796082981
cid:129
eyr:2030
ecl:oth
hgt:182cm

iyr:2019
cid:314
eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568

byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry

hcl:231d64
cid:124
ecl:gmt
eyr:2039
hgt:189in
pid:#9c3ea1

Hopefully this is what you need?
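
For comparison, the same result can be sketched more compactly by treating each blank-line-separated block as one record. This assumes the whole file fits in memory; the function name `reformat_records` is made up for illustration:

```python
def reformat_records(text):
    """Put every key:value pair on its own line, one blank line between records."""
    records = []
    for block in text.split("\n\n"):
        # str.split() with no argument splits on any whitespace,
        # so spaces and newlines inside a record are handled alike.
        pairs = block.split()
        if pairs:
            records.append("\n".join(pairs))
    return "\n\n".join(records) + "\n"


sample = "byr:1983 iyr:2017\npid:796082981 cid:129 eyr:2030\n\niyr:2019\ncid:314\n"
print(reformat_records(sample))
```

To apply it to a real file, the text could be read with `open("file.txt").read()` and the returned string written to the output file.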

Answer 3

You can use replace to remove all the \n characters.

result_file = open('testresult.txt', 'w')
#list_of_lines = [] testing purpose


with open('input.txt', 'r') as f:
    for line in f:
        line = line.replace('\n', '')
        if line != '':
            for i in line.split(' '):
                result_file.write(i+'\n')

result_file.close()

This is the result:

byr:1983
iyr:2017
pid:796082981
cid:129
eyr:2030
ecl:oth
hgt:182cm
iyr:2019
cid:314
eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568
byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry
hcl:231d64
cid:124
ecl:gmt
eyr:2039
hgt:189in
pid:#9c3ea1
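
The output above no longer has blank lines between the groups, which the question asks for. One possible way to keep them is sketched below with itertools.groupby from the standard library; the function name `split_groups` and the in-memory sample are illustrative only:

```python
from itertools import groupby


def split_groups(lines):
    """Yield one list of key:value items per blank-line-separated group."""
    # Consecutive lines are grouped by whether they are blank;
    # only the non-blank runs carry data.
    for is_blank, chunk in groupby(lines, key=lambda line: line.strip() == ""):
        if not is_blank:
            yield [item for line in chunk for item in line.split()]


sample = ["ecl:oth hgt:182cm\n", "\n", "iyr:2019\n", "cid:314\n"]
groups = list(split_groups(sample))

# One item per line, one blank line between groups.
print("\n\n".join("\n".join(group) for group in groups))
```

Because an open file is itself an iterable of lines, `split_groups(f)` should work the same way on a file object.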

Answer 4

A regular expression can get you the result without having to worry about end-of-line characters.

Assuming there are no spaces inside your pairs, you can use the following script:

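import re
from contextlib import ExitStack

# File names taken from the question; adjust as needed.
input_path = "input.txt"
output_path = "testresult.txt"

# A key (no colon, no whitespace), a colon, then a value (no whitespace).
REGEX = re.compile(r"[^:\s]+:\S+")
with ExitStack() as stack:
    fr = stack.enter_context(open(input_path, encoding="UTF_8"))
    fw = stack.enter_context(open(output_path, mode="w", encoding="UTF_8"))
    for line in fr:
        items = REGEX.findall(line)
        if not items:
            # No pair on this line: keep it as a blank separator.
            fw.write("\n")
            continue
        for item in items:
            fw.write(f"{item}\n")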

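The regular expression searches for "one or more characters that are neither a colon nor whitespace, followed by a colon, followed by one or more non-whitespace characters". This lets the script focus on the pairs only.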

Whitespace characters include spaces, tabs, and end-of-line characters.

ExitStack lets you manage both file context managers in a single with block.
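
To see what the pattern picks up, here is a small sketch that runs it against an in-memory sample instead of a file. The sample text comes from the question; the names `PAIR` and `pairs` are illustrative:

```python
import re

PAIR = re.compile(r"[^:\s]+:\S+")

sample = "hcl:231d64 cid:124 ecl:gmt eyr:2039\nhgt:189in\npid:#9c3ea1\n"

# findall returns every key:value pair, regardless of which
# whitespace character (space, tab, newline) separates them.
pairs = PAIR.findall(sample)
print(pairs)

# A blank line contains no pair at all, so findall returns an empty list.
print(PAIR.findall("\n"))
```

When exactly two files are involved, `with open(a) as fr, open(b) as fw:` is an equivalent alternative to ExitStack.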
