How to extract multiple pieces of text from a text file with a scripting language

This has most likely been asked before, but I could not understand the code in those answers well enough to achieve what I am after.

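I have a text file with 1000 entries like the 3 consecutive entries below. From that file I want to extract each number.xml name together with its corresponding "Current video timing" value (for example 1280x720p 60Hz) and write them out one by one to a text file.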

Report complete.on: E01A040E.xml
EDID raw data:
---  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000  00 FF FF FF FF FF FF 00 4C 2D FC 08 00 00 00 00 
010  29 15 01 03 80 10 09 78 0A EE 91 A3 54 4C 99 26 
020  0F 50 54 BD EE 00 81 C0 01 01 01 01 01 01 01 01 
030  01 01 01 01 01 01 66 21 56 AA 51 00 1E 30 46 8F 
040  33 00 A0 5A 00 00 00 1E 01 1D 00 72 51 D0 1E 20 
050  6E 28 55 00 A0 5A 00 00 00 1E 00 00 00 FD 00 18 
060  4B 0F 44 17 00 0A 20 20 20 20 20 20 00 00 00 FC 
070  00 53 41 4D 53 55 4E 47 0A 20 20 20 20 20 01 6D 
080  02 03 1F F1 47 84 05 03 10 20 22 07 23 09 07 07 
090  83 01 00 00 E2 00 0F 67 03 0C 00 10 00 B8 2D 01 
0A0  1D 80 18 71 1C 16 20 58 2C 25 00 A0 5A 00 00 00 
0B0  9E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 A0 5A 00 
0C0  00 00 18 02 3A 80 18 71 38 2D 40 58 2C 45 00 A0 
0D0  5A 00 00 00 1E 00 00 00 00 00 00 00 00 00 00 00 
0E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FD 
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc
# 
EDID description: E01A0A8A.xml
EDID raw data:
---  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000  00 FF FF FF FF FF FF 00 4C 2D FC 08 00 00 00 00 
010  29 15 01 03 80 10 09 78 0A EE 91 A3 54 4C 99 26 
020  0F 50 54 BD EE 00 81 C0 01 01 01 01 01 01 01 01 
030  01 01 01 01 01 01 66 21 56 AA 51 00 1E 30 46 8F 
040  33 00 A0 5A 00 00 00 1E 01 1D 00 72 51 D0 1E 20 
050  6E 28 55 00 A0 5A 00 00 00 1E 00 00 00 FD 00 18 
060  4B 0F 44 17 00 0A 20 20 20 20 20 20 00 00 00 FC 
070  00 53 41 4D 53 55 4E 47 0A 20 20 20 20 20 01 6D 
080  02 03 1F F1 47 84 05 03 10 20 22 07 23 09 07 07 
090  83 01 00 00 E2 00 0F 67 03 0C 00 10 00 B8 2D 01 
0A0  1D 80 18 71 1C 16 20 58 2C 25 00 A0 5A 00 00 00 
0B0  9E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 A0 5A 00 
0C0  00 00 18 02 3A 80 18 71 38 2D 40 58 2C 45 00 A0 
0D0  5A 00 00 00 1E 00 00 00 00 00 00 00 00 00 00 00 
0E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FD 
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc
# 
EDID description: E01A0C88.xml
EDID raw data:
---  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000  00 FF FF FF FF FF FF 00 08 59 42 00 01 00 00 00 
010  01 16 01 03 80 45 27 78 0A D0 DD A9 53 49 9D 23 
020  11 47 4A A3 08 00 81 C0 81 00 81 0F 81 40 81 80 
030  95 00 B3 00 01 01 52 35 80 80 70 38 1F 40 20 20 
040  13 00 C4 8E 21 00 00 1E 46 20 00 A4 51 00 2A 30 
050  50 80 37 00 20 46 21 00 00 1A 00 00 00 FC 00 4E 
060  53 2D 33 32 4C 32 34 30 41 31 33 0A 00 00 00 FD 
070  00 37 4C 1E 50 11 00 0A 20 20 20 20 20 20 01 23 
080  02 03 20 73 48 05 04 03 02 01 06 07 90 26 09 07 
090  07 15 07 50 83 01 00 00 67 03 0C 00 10 00 B8 2D 
0A0  01 1D 00 72 51 D0 1E 20 6E 28 55 00 C4 8E 21 00 
0B0  00 1E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 13 8E 
0C0  21 00 00 18 01 1D 80 18 71 1C 16 20 58 2C 25 00 
0D0  C4 8E 21 00 00 9E 00 00 00 00 00 00 00 00 00 00 
0E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 53 
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc

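This is the code I have so far, but it does not work for me. It is written in Python, but any other scripting language would be fine too; I am just not used to writing scripts. A million thanks to anyone who lends a hand.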

#!/usr/bin/env python

inFile = open("batch01.txt")
outFile = open("result.txt", "w")

with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "EDID description:":
            copy = True
        elif line.strip() == "- Current video timing":
            copy = True
        elif copy:
            outfile.write(line)

inFile.close()
outFile.close()

I would use regular expressions to do this:

#!/usr/bin/env python

import re

with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    for line in infile:
        m = re.search(r'(EDID description|- Current video timing): (.*)', line)
        if m is not None:
            outfile.write(m.group(2) + '\n')

This writes the following to result.txt:

1280x720p 60Hz
E01A0A8A.xml
1280x720p 60Hz
E01A0C88.xml
1280x720p 60Hz
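
The output above leaves the first entry without a file name, because its header line in the sample reads "Report complete.on:" rather than "EDID description:", and each value lands on its own line. A minimal sketch of one way to pair every file name with its timing follows. The names HEADER, TIMING and extract_pairs are made up for illustration, and treating "Report complete.on:" as a header label is an assumption based on the sample data.

```python
import re

# Header lines carry the file name. Accepting "Report complete.on:" as a
# header label is an assumption taken from the first sample entry.
HEADER = re.compile(r'^(?:EDID description|Report complete\.on):\s*(\S+\.xml)\s*$')
TIMING = re.compile(r'^- Current video timing:\s*(.*?)\s*$')

def extract_pairs(lines):
    """Return (xml_name, timing) tuples; timing is None if an entry has no timing line."""
    pairs = []
    name = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            if name is not None:
                pairs.append((name, None))  # previous entry had no timing line
            name = header.group(1)
            continue
        timing = TIMING.match(line)
        if timing and name is not None:
            pairs.append((name, timing.group(1)))
            name = None
    if name is not None:
        pairs.append((name, None))  # last entry had no timing line
    return pairs

# Shortened stand-in for batch01.txt
sample = """\
Report complete.on: E01A040E.xml
EDID raw data:
+5V: OK
- Current video timing: 1280x720p 60Hz
#
EDID description: E01A0A8A.xml
- Current video timing: 1280x720p 60Hz
""".splitlines()

for name, timing in extract_pairs(sample):
    print(f"{name}: {timing}")
```

To run it against the real file, pass the open file object instead of the sample list, for example extract_pairs(infile) inside the with block, since iterating a file yields its lines.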

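It looks like you actually need to check whether a given line starts with a certain substring, rather than doing an exact comparison (which is what the == operator gives you). Your for loop should use the startswith method to look at the beginning of the line, closer to this: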

for line in infile:
    if line.strip().startswith("EDID description:"):
        copy = True
    elif line.strip().startswith("Report complete.on:"):  # Based on your data, it seems like you need to check for these as well - maybe not?
        copy = True
    elif line.strip().startswith("- Current video timing"):
        copy = True
    else:
        copy = False
    if copy:
        outfile.write(line)

But the loop can be simplified significantly:

prefixes = [
 "EDID description:", "Report complete.on:", "- Current video timing"
]
for line in infile:
    for prefix in prefixes:
        if line.strip().startswith(prefix):
            outfile.write(line)
            break

This removes the multi-branch if/elif structure as well as the boolean copy flag.

With your sample input data, I get this in the result file:

Report complete.on: E01A040E.xml
- Current video timing: 1280x720p 60Hz
EDID description: E01A0A8A.xml
- Current video timing: 1280x720p 60Hz
EDID description: E01A0C88.xml
- Current video timing: 1280x720p 60Hz

Again, I am not sure whether you want that "Report complete.on" line, but it looks to me like you do.
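
As a possible further simplification, str.startswith also accepts a tuple of prefixes, so the inner loop over prefixes is not strictly needed. The sketch below assumes the same three prefixes; the helper name keep_matching and the sample lines are only for illustration.

```python
# str.startswith accepts a tuple and returns True if any prefix matches.
PREFIXES = ("EDID description:", "Report complete.on:", "- Current video timing")

def keep_matching(lines):
    """Return the lines whose stripped text starts with any of PREFIXES."""
    return [line for line in lines if line.strip().startswith(PREFIXES)]

# Shortened stand-in for the lines of batch01.txt
sample = [
    "Report complete.on: E01A040E.xml\n",
    "EDID raw data:\n",
    "- Current video timing: 1280x720p 60Hz\n",
    "- HDMI video detected\n",
]
print("".join(keep_matching(sample)), end="")
```

Inside the original with block, the same idea would be outfile.writelines(keep_matching(infile)).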

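Without running this, I see three problems you need to address.

  1. The equality conditions in your if/elif control structure will never be true, because every matching line carries extra text after the label.
  2. Assume for a moment that what I described in item 1 has been fixed: the final elif would still never be reached for a matching line, because once any earlier expression evaluates to true, the rest of the if/elif chain is skipped and no further conditions are checked.
  3. You need to reset copy = False before the next iteration of the for loop; otherwise, once copy = True has been set the first time, copy stays true for everything.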

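Suggested fixes:

  1. Use something like "line.strip().find('EDID description:') != -1" to determine whether the line contains the string you are looking for. find returns -1 when the substring is absent, so its result has to be compared against -1 rather than used directly as a condition. Again, you need to test whether the substring is in the line, not whether it equals the line.
  2. You need to move the copy operation outside the if/elif structure. That is, rather than making it part of the same if/elif structure, create a separate 'if copy:' block after it so the line is output when a match is found.
  3. After outputting the line, set copy = False so the next iteration of the for loop starts from a clean state. Otherwise, you will print every line after the first match.

Something like this (I have not actually tested it...):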

#!/usr/bin/env python

with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    copy = False
    for line in infile:
        # find() returns -1 when the substring is absent, so compare against -1
        # to see if the line CONTAINS the string you are looking for
        if line.strip().find("EDID description:") != -1:
            copy = True
        elif line.strip().find("- Current video timing") != -1:
            copy = True

        # make this a separate if
        if copy:
            outfile.write(line)

        # reset this to False so it can be evaluated and set properly in the next iteration
        copy = False
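
One detail worth checking when using str.find as a condition: it returns an index, not a boolean. A match at the start of the line gives 0, which is falsy, and a miss gives -1, which is truthy, so a bare "if line.find(...):" behaves the opposite way from what one might expect. The short sketch below illustrates this; the helper name contains is made up for the example.

```python
line = "EDID description: E01A0A8A.xml"

print(line.find("EDID description:"))  # 0: found at index 0, which is falsy
print(line.find("- Current video"))    # -1: not found, which is truthy

def contains(line, needle):
    """True when needle occurs anywhere in line (same as line.find(needle) != -1)."""
    return needle in line

print(contains(line, "EDID description:"))  # True
print(contains(line, "- Current video"))    # False
```

The in operator expresses the "contains" test directly and avoids the -1 comparison.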

Hope this helps.

Below is a batch file (.bat) solution, which I think is simpler...

Edit: the program has been modified as requested in the comments.

@echo off
(for /F "tokens=3-6" %%a in ('findstr /L ".xml Current" batch01.txt') do (
   if "%%b" equ "" (
      set /P "=%%a - " < NUL
   ) else (
      if "%%c" equ "" (
         echo No VSYNC detected
      ) else (
         echo Current video timing: %%c %%d
      )
   )
)) > result.txt

Output:

E01A040E.xml - Current video timing: 1280x720p 60Hz
E01A0A8A.xml - Current video timing: 1280x720p 60Hz
E01A0C88.xml - Current video timing: 1280x720p 60Hz