Python extract numbers after a specific string has appeared

Question

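I'm new to Python and want to write a script that extracts some numbers from a bunch of files. Here is a representative example of what I'm trying to do: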

File_name_1: Bob-01
File content: 
...(Lots of text)
Tea cups: 3
Tea cups: 4
Tea cups: 6
...(Lots of text)
Completed the first task, proceed to the next task.
...(Lots of text)
Tea cups: 7
Termination

Suppose we also have another file:

File_name_2: Bob-02
File content: 
...(Lots of text)
Tea cups: 2
Tea cups: 7
Tea cups: 3
Tea cups: 8
...(Lots of text)
Completed the first task, proceed to the next task.
...(Lots of text)
Tea cups: 1
Termination.

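So far I have written code that extracts the file name (e.g. Bob-01), the number after each Bob (e.g. 01), and the file content (e.g. every line in the first file), and stores them in a variable called list_of_file: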

print list_of_file

[["Bob-01"], 
  01,
 [".......", "Tea cups: 3", "Tea cups: 4", "Tea cups: 6", "....", "Completed the first task, proceed to the next task.", "....", "Tea cups: 7", "Termination"],
 ["Bob-02"], 
  02,
 [".......", "Tea cups: 2", "Tea cups: 7", "Tea cups: 3", "Tea cups: 8", "....", "Completed the first task, proceed to the next task.", "....", "Tea cups: 1", "Termination"]]

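What I want to do is extract, for each file, the number of tea cups that appears after the line "Completed the first task, proceed to the next task." So I wrote the following code: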

def get_tea_cups(list_of_files):
    list_of_cup = []
    for file in list_of_files:
        for line in file[2]:
            if "Completed the first task" in line:
                for line in file[2]:
                    if "Tea cups:" in line:
                        tea_cups_line = line.split()
                        cup_num = tea_cups_line[2]
                        list_of_cup.append((file[0], file[1], cup_num))
    return list_of_cup

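My thinking: if I can find "Completed the first task" in list_of_file, then I want to extract the number of tea cups that appears after the string containing "Completed the first task" (e.g. 7 for Bob-01 and 1 for Bob-02). However, when I run my code, it seems to extract every tea-cup count, which is not what I want.

I think this happens because the if statement is always true, so I end up extracting every tea-cup count.

Is there any way to fix this? I know that if I were extracting from a single file, I could store all the tea-cup numbers I find in a list and take the last value (by slicing from the end). Can I do something similar when extracting from multiple files?

I tried looking around but haven't found anything useful. If you have come across any question related to this one, please post the link below.

Thanks!!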

Answer 1

Updated code. Here is what I would do:

.....

for i, line in enumerate(file[2]):
    if "Completed the first task" in line:
        # Search only the lines after the marker (use range() on Python 3).
        for j in xrange(i + 1, len(file[2])):
            if "Tea cups:" in file[2][j]:
                tea_cups_line = file[2][j].split()
                cup_num = tea_cups_line[2]
                # append() takes a single argument, so pass a tuple.
                list_of_cup.append((file[0], file[1], cup_num))
                break
return list_of_cup

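This follows your idea, but my code keeps track of the index within file[2]. Once it finds 'Completed the first task', it starts a second for loop from the next index and keeps going until it finds 'Tea cups'. It then takes the number and breaks.

Apologies for my English; I hope this helps.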
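
For reference, a complete, self-contained version of this approach might look like the sketch below. It is written for Python 3 (so slicing replaces the xrange loop), and it assumes list_of_files holds one [name, number, lines] entry per file. That nested layout, the MARKER constant, and the sample data are assumptions made for illustration, not something the question confirms. It also stops at the first marker line in each file.

```python
MARKER = "Completed the first task"


def get_tea_cups(list_of_files):
    """Return (name, number, cups) for the first "Tea cups:" line after MARKER."""
    list_of_cup = []
    for name, number, lines in list_of_files:
        for i, line in enumerate(lines):
            if MARKER in line:
                # Scan only the lines that follow the marker.
                for later_line in lines[i + 1:]:
                    if "Tea cups:" in later_line:
                        cup_num = int(later_line.split(":")[1])
                        list_of_cup.append((name, number, cup_num))
                        break
                break  # stop at the first marker in this file
    return list_of_cup


# Hypothetical sample data mirroring the two files in the question.
sample_files = [
    ["Bob-01", 1, ["...", "Tea cups: 3", "Tea cups: 4", "Tea cups: 6", "...",
                   "Completed the first task, proceed to the next task.",
                   "...", "Tea cups: 7", "Termination"]],
    ["Bob-02", 2, ["...", "Tea cups: 2", "Tea cups: 7", "Tea cups: 3",
                   "Tea cups: 8", "...",
                   "Completed the first task, proceed to the next task.",
                   "...", "Tea cups: 1", "Termination."]],
]

print(get_tea_cups(sample_files))  # [('Bob-01', 1, 7), ('Bob-02', 2, 1)]
```

Because the inner loop starts right after the marker, counts that appear before it are never considered.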

Answer 2

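Yes, there is a way. I suggest reading the file backwards, finding the first occurrence of "Tea cups", then breaking and moving on to parse the next file. My solution assumes your file fits in memory, and it may take a while to read large files.

You can read a file from the end like this: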

with open("filename") as f:
    for line in reversed(list(f)):
        print(line.rstrip())

Now, to get only the tea cups you need, you can do this:

cups = []
with open("filename") as f:
    # Walk the lines from last to first and stop at the first match.
    for line in reversed(list(f)):
        if "Tea cups" in line:
            cups.append(line.split()[2])
            break
print(cups)
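
The same backward scan can also be applied to lines that are already in memory, such as the per-file line lists described in the question. The sketch below is one possible way to do that; last_cup_after_marker is a hypothetical helper name and the sample list is made up to mirror Bob-02. It walks backwards, remembers the first "Tea cups" line it meets, and returns that count only once it reaches the marker, so a file without the marker yields None. Note that this picks the last count after the marker, which matches the first one only when a single count follows it, as in the sample files.

```python
def last_cup_after_marker(lines, marker="Completed the first task"):
    """Return the last "Tea cups" count after marker, or None if unavailable."""
    found = None
    for line in reversed(lines):
        if marker in line:
            # Everything seen so far lies after the marker.
            return found
        if found is None and "Tea cups" in line:
            found = int(line.split(":")[1])
    return None  # the marker never appeared


# Hypothetical in-memory lines mirroring Bob-02 from the question.
bob_02 = ["...", "Tea cups: 2", "Tea cups: 7", "Tea cups: 3", "Tea cups: 8",
          "...", "Completed the first task, proceed to the next task.",
          "...", "Tea cups: 1", "Termination."]

print(last_cup_after_marker(bob_02))  # 1
```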

iteration

for-loop

data-extraction

python-2.7