How to read a value from a file separated by tabs in Python?

Question

I have a text file in this format:

ConfigFile 1.1
;
;     Version: 4.0.32.1
;     Date="2021/04/08" Time="11:54:46" UTC="8"
;
Name
    John Legend
Type
    Student
Number
    s1054520

I want to get the value of Name, Type, or Number. How can I do that? I tried the following, but it did not solve my problem.

import re
f = open("Data.txt", "r")
file = f.read()
Name = re.findall("Name", file)
print(Name)

My expected output is John Legend. Can anyone help me? I would really appreciate it. Thanks.

Answer 1

Here I iterate line by line; when I encounter Name, I add the next line to the result list (you could also print it directly):

import re


def print_names(filename):
    result = []
    # Match lines that begin with the literal word "Name"
    regexp = re.compile(r'Name')
    gotname = False
    with open(filename) as f:
        for line in f:
            if gotname:
                # The line right after a "Name" line holds the value
                result.append(line.strip())
                gotname = False
            if regexp.match(line):
                gotname = True

    print(result)


if __name__ == '__main__':
    print_names('Data.txt')
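
The same flag-based scan works for Type or Number as well. Below is a minimal sketch of that generalization; the function name values_after and the in-memory sample list are assumptions made for illustration and are not part of the code above.

```python
def values_after(lines, key):
    """Return the stripped line that follows each line equal to `key`."""
    result = []
    take_next = False
    for line in lines:
        if take_next:
            # This is the line directly after a key line
            result.append(line.strip())
            take_next = False
        elif line.strip() == key:
            take_next = True
    return result


# A few lines in the same layout as the file from the question
sample = [
    "ConfigFile 1.1\n",
    ";\n",
    "Name\n",
    "    John Legend\n",
    "Type\n",
    "    Student\n",
]

print(values_after(sample, "Name"))
print(values_after(sample, "Type"))
```

To read from disk, pass the open file object in place of sample, since a file object also yields its lines one by one.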

Answer 2

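First, re.findall searches for all occurrences that match the given pattern. So in your case you are finding every "Name" in the file, because that is exactly what you searched for. The computer, on the other hand, does not know that "John Legend" is a name. It only knows that it is the line after the word "Name". For your case, I suggest you check this link.

Find the line number of "Name"
Read the next line
Get the name without the surrounding whitespace
This also works if there is more than one name

The final code looks like this: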

def search_string_in_file(file_name, string_to_search):
    """Search for the given string in file and return lines containing that string,
    along with line numbers"""
    line_number = 0
    list_of_results = []
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            line_number += 1
            if string_to_search in line:
                # If yes, then add the line number & line as a tuple in the list
                list_of_results.append((line_number, line.rstrip()))
    # Return list of tuples containing line numbers and lines where string is found
    return list_of_results

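with open('Data.txt') as file:
    content = file.readlines()
matched_lines = search_string_in_file('Data.txt', 'Name')
print('Total Matched lines : ', len(matched_lines))
for line_number, _ in matched_lines:
    # line_number is 1-based, so it is also the 0-based index of the next line
    if line_number < len(content):
        print(content[line_number].strip())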
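
To make the point about re.findall concrete, here is a small sketch that compares the bare pattern with one that uses a capture group for the indented line after "Name". The sample text and the pattern are assumptions based on the file layout shown in the question; treat it as an illustration rather than a general parser.

```python
import re

text = (
    "ConfigFile 1.1\n"
    ";\n"
    "Name\n"
    "    John Legend\n"
    "Type\n"
    "    Student\n"
    "Number\n"
    "    s1054520\n"
)

# The bare pattern only returns the literal text it matched.
print(re.findall("Name", text))

# With a capture group, findall returns the indented line after "Name".
pattern = re.compile(r"^Name[ \t]*\n[ \t]+(.+)$", re.MULTILINE)
names = pattern.findall(text)
print(names)
```

The `[ \t]+` part accepts either spaces or tabs as indentation, which matters if the real file is indented with tabs.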

Answer 3

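I would not use a regular expression; instead I would write a parser for this file type. The rules could be:

The first line can be ignored
Any line starting with ; can be ignored
Every line without leading whitespace is a key
Every line with leading whitespace is a value belonging to the last key

I would start with a generator that returns every line that is not ignored: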

def read_data_lines(filename):
    with open(filename, "r") as f:
        # skip the first line
        f.readline()

        # read until no more lines
        while line := f.readline():
            # skip lines that start with ;
            if not line.startswith(";"):
                yield line

Then fill a dictionary according to rules 3 and 4:

def parse_data_file(filename):
    data = {}
    key = None
    for line in read_data_lines(filename):
        # No starting whitespace (space or tab) makes this a key
        if not line.startswith((" ", "\t")):
            key = line.strip()

        # Starting whitespace makes this a value for the last key
        else:
            data[key] = line.strip()

    return data

At this point you can parse the file and print whichever key you want:

data = parse_data_file("Data.txt")
print(data["Name"])
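
If a key may be followed by more than one indented line, the same four rules can collect the values in a list instead of overwriting them. The sketch below assumes that situation and works on a list of lines instead of a file; the name parse_lines and the sample text are hypothetical and only serve the illustration.

```python
def parse_lines(lines):
    """Map each key to the list of indented value lines that follow it."""
    data = {}
    key = None
    for line in lines[1:]:                # rule 1: skip the first line
        if line.startswith(";") or not line.strip():
            continue                      # rule 2: skip comments (and blank lines)
        if line[0] in " \t":              # rule 4: an indented line is a value
            if key is not None:
                data[key].append(line.strip())
        else:                             # rule 3: an unindented line is a key
            key = line.strip()
            data[key] = []
    return data


sample = """ConfigFile 1.1
;
;     Version: 4.0.32.1
Name
    John Legend
Type
    Student
Number
    s1054520
""".splitlines()

data = parse_lines(sample)
print(data["Name"][0])
```

Skipping blank lines is an addition to the four rules; without it, an empty line would be treated as a key.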

Answer 4

Assuming these label lines appear in the file in the order given, you can simply scan for them:

labelList = ["Name","Type","Number"]
captures = dict()
with open("Data.txt","rt") as f:
    for label in labelList:
        # Advance to the label line; stop at end of file if the label is missing
        while (line := f.readline()) and not line.startswith(label):
            pass
        captures[label] = f.readline().strip()
for label in labelList:
    print(f"{label} : {captures[label]}")
