Detect the number of lines between specific string patterns in a text file and categorize based on the result in Python

Question

I have a text file that I am trying to categorize based on the number of lines between the lines containing 'START' and 'END /'. Input file structure:

  START               
  Action1
  Action2 
  Action3
  END /

  START
  Action1 
  END /

  START                  
  Action1
  Action2
  END /

  START  
  Action0              
  Action1
  Action2 
  Action3
  END /

  START
  Action1 
  END /

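The code should detect the number of lines between 'START' and 'END /' and categorize each block as follows: 'P1' if there is only one action line, 'P2' if there is more than one action line.

So the output for the input file described above would be: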

['P2', 'P1', 'P2', 'P2', 'P1']

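The ultimate goal is to export this output list to an Excel column (as shown below). I believe this can be done with the pandas library; however, any help would be appreciated.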

Category
P2
P1
P2
P2
P1

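Initially I was able to print the line number of every line in the file, so I was also thinking of extracting line numbers. However, because the number of Action lines varies from block to block, this idea has a major flaw.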

with open('filepath.txt') as f:
    for index, line in enumerate(f):
        print("Line {}: {}".format(index, line.strip()))

Output of the initial, flawed idea:

Line 0: 
Line 1: A
Line 2: Action1
Line 3: Action2
Line 4: Action3
Line 5: B
Line 6: 
Line 7: A
Line 8: Action1
Line 9: B
Line 10: 
Line 11: A
Line 12: Action1
Line 13: Action1
Line 14: B
Line 15: 
Line 16: A
Line 17: Action0
Line 18: Action1
Line 19: Action2
Line 20: Action3
Line 21: B

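Then I came up with the idea of detecting the initial (START) and final (END) patterns, counting the lines between them, and assigning the P1 or P2 category with an if/else statement. I am currently stuck on implementing a way to count the lines inside the pattern.

Any help with the code would be appreciated, thanks!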

Answer 1

If the file data is exactly as you described in the question, then the code below should work.

import pandas as pd

result = []
fp = 'your_file.txt'                       # change this

with open(fp) as file:
    count = 0

    # this is the logic you were after:
    # count the lines between each 'START' and 'END /' marker
    for item in file:
        line = item.strip()
        if line == 'START':
            count = 0                      # a new block begins
        elif line == 'END /':
            # one action line (or none) -> 'P1', more than one -> 'P2'
            result.append('P1' if count <= 1 else 'P2')
        elif line:
            count += 1                     # count non-blank lines only

print(result)

dataframe = pd.DataFrame(result, columns=['Category'])

# Note: Pandas module needs openpyxl module installed for this next step
dataframe.to_excel('excel.xlsx', index=False)
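
To try the counting logic without a file on disk, the same idea could be wrapped in a function that accepts any iterable of lines. This is only a sketch under a few assumptions that are not part of the answer above: the name classify_blocks is hypothetical, blank lines are ignored, and anything outside a START ... END / pair is skipped.

```python
import io


def classify_blocks(lines):
    """Return 'P1' or 'P2' for each START ... END / block in `lines`."""
    categories = []
    count = None  # None means we are currently outside a block
    for raw in lines:
        line = raw.strip()
        if line == 'START':
            count = 0                      # open a new block
        elif line == 'END /' and count is not None:
            # one action line (or none) -> 'P1', more than one -> 'P2'
            categories.append('P1' if count <= 1 else 'P2')
            count = None                   # close the block
        elif line and count is not None:
            count += 1                     # non-blank line inside a block
    return categories


# The sample input from the question, held in memory instead of a file.
sample = """\
  START
  Action1
  Action2
  Action3
  END /

  START
  Action1
  END /

  START
  Action1
  Action2
  END /

  START
  Action0
  Action1
  Action2
  Action3
  END /

  START
  Action1
  END /
"""

print(classify_blocks(io.StringIO(sample)))
# ['P2', 'P1', 'P2', 'P2', 'P1']
```

Because an open file object is also an iterable of lines, classify_blocks(open(fp)) should behave the same way on the real file, and the returned list can be passed to pd.DataFrame exactly as above.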
