在 Python 中将文本文件转换为 YAML

Converting text file to YAML in Python

我有一个文本文件要转换为 YAML 格式。这里有一些注释可以更好地描述问题:

格式示例如下:

file_content = '''
    Section section_1
        section_1_subheading1 = text
        section_1_subheading2 = bool
    end
    Section section_2
       section_2_subheading3 = int
       section_2_subheading4 = double
       section_2_subheading5 = bool
       section_2_subheading6 = text
       section_2_subheading7 = datetime
    end
    Section section_3
       section_3_subheading8 = numeric
       section_3_subheading9 = int
    end
'''

我尝试通过以下方式将文本转换为 YAML 格式:

  1. 使用正则表达式将等号替换为冒号。
  2. Section section_name 替换为 section_name :
  3. 删除每个部分之间的 end

但是,我在#2 和#3 上遇到了困难。这是我目前创建的 text-to-YAML 函数:

import yaml
import re

def convert_txt_to_yaml(file_content):
    """Converts a text file to a YAML file"""

    # Replace "=" with ":"
    file_content2 = file_content.replace("=", ":")

    # Split the lines 
    lines = file_content2.splitlines()

    # Define section headings to find and replace
    section_names = "Section "
    section_headings = r"(?<=Section )(.*)$"
    section_colons = r" : "
    end_names = "end"

    # Convert to YAML format, line-by-line
    for line in lines:
        add_colon = re.sub(section_headings, section_colons, line) # Add colon to end of section name
        remove_section_word = re.sub(section_names, "", add_colon) # Remove "Section " in section header
        line = re.sub(end_names, "", remove_section_word)          # Remove "end" between sections

    # Join lines back together
    converted_file = "\n".join(lines)
    return converted_file

我认为问题出在 for 循环中 - 我无法弄清楚为什么 headers 部分和结尾没有改变。如果我测试它,它打印完美,但线条本身没有保存。

我要找的输出格式如下:

file_content = '''
    section_1 :
        section_1_subheading1 : text
        section_1_subheading2 : bool
    section_2 :
        section_2_subheading3 : int
        section_2_subheading4 : double
        section_2_subheading5 : bool
        section_2_subheading6 : text
        section_2_subheading7 : datetime
    section_3 :
        section_3_subheading8 : numeric
        section_3_subheading9 : int
'''

我宁愿将其转换为 dict,然后使用 python 中的 yaml 包将其格式化为 yaml,如下所示:

import yaml
def convert_txt_to_yaml(file_content):
    """Converts a text file to a YAML file"""
    config_dict = {}
    
    # Split the lines 
    lines = file_content.splitlines()
    section_title=None
    for line in lines:
        if line=='\n':
            continue
        elif re.match('.*end$', line):
            #End of section
            section_title=None
        elif re.match('.*Section\s+.*', line):
            #Start of Section
            match_obj =  re.match(".*Section\s+(.*)", line)
            section_title=match_obj.groups()[0]
            config_dict[section_title] = {}
        elif section_title and re.match(".*{}_.*\s+=.*".format(section_title), line):
            match_obj =  re.match(".*{}_(.*)\s+=(.*)".format(section_title), line)            
            config_dict[section_title][match_obj.groups()[0]] = match_obj.groups()[1]
    return yaml.dump(config_dict )