How to extract multiple pieces of text from a text file with a scripting language
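This question has very likely been asked before, but I could not understand the code in those answers well enough to achieve what I am after.
I have a text file containing 1000 entries, such as the 3 consecutive entries below. From that text file I want to extract the number.xml and its corresponding Current video timing: 1280x720p 60Hz, and write them out one by one to a text file.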
Report complete.on: E01A040E.xml
EDID raw data:
--- 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000 00 FF FF FF FF FF FF 00 4C 2D FC 08 00 00 00 00
010 29 15 01 03 80 10 09 78 0A EE 91 A3 54 4C 99 26
020 0F 50 54 BD EE 00 81 C0 01 01 01 01 01 01 01 01
030 01 01 01 01 01 01 66 21 56 AA 51 00 1E 30 46 8F
040 33 00 A0 5A 00 00 00 1E 01 1D 00 72 51 D0 1E 20
050 6E 28 55 00 A0 5A 00 00 00 1E 00 00 00 FD 00 18
060 4B 0F 44 17 00 0A 20 20 20 20 20 20 00 00 00 FC
070 00 53 41 4D 53 55 4E 47 0A 20 20 20 20 20 01 6D
080 02 03 1F F1 47 84 05 03 10 20 22 07 23 09 07 07
090 83 01 00 00 E2 00 0F 67 03 0C 00 10 00 B8 2D 01
0A0 1D 80 18 71 1C 16 20 58 2C 25 00 A0 5A 00 00 00
0B0 9E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 A0 5A 00
0C0 00 00 18 02 3A 80 18 71 38 2D 40 58 2C 45 00 A0
0D0 5A 00 00 00 1E 00 00 00 00 00 00 00 00 00 00 00
0E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FD
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc
#
EDID description: E01A0A8A.xml
EDID raw data:
--- 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000 00 FF FF FF FF FF FF 00 4C 2D FC 08 00 00 00 00
010 29 15 01 03 80 10 09 78 0A EE 91 A3 54 4C 99 26
020 0F 50 54 BD EE 00 81 C0 01 01 01 01 01 01 01 01
030 01 01 01 01 01 01 66 21 56 AA 51 00 1E 30 46 8F
040 33 00 A0 5A 00 00 00 1E 01 1D 00 72 51 D0 1E 20
050 6E 28 55 00 A0 5A 00 00 00 1E 00 00 00 FD 00 18
060 4B 0F 44 17 00 0A 20 20 20 20 20 20 00 00 00 FC
070 00 53 41 4D 53 55 4E 47 0A 20 20 20 20 20 01 6D
080 02 03 1F F1 47 84 05 03 10 20 22 07 23 09 07 07
090 83 01 00 00 E2 00 0F 67 03 0C 00 10 00 B8 2D 01
0A0 1D 80 18 71 1C 16 20 58 2C 25 00 A0 5A 00 00 00
0B0 9E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 A0 5A 00
0C0 00 00 18 02 3A 80 18 71 38 2D 40 58 2C 45 00 A0
0D0 5A 00 00 00 1E 00 00 00 00 00 00 00 00 00 00 00
0E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FD
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc
#
EDID description: E01A0C88.xml
EDID raw data:
--- 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000 00 FF FF FF FF FF FF 00 08 59 42 00 01 00 00 00
010 01 16 01 03 80 45 27 78 0A D0 DD A9 53 49 9D 23
020 11 47 4A A3 08 00 81 C0 81 00 81 0F 81 40 81 80
030 95 00 B3 00 01 01 52 35 80 80 70 38 1F 40 20 20
040 13 00 C4 8E 21 00 00 1E 46 20 00 A4 51 00 2A 30
050 50 80 37 00 20 46 21 00 00 1A 00 00 00 FC 00 4E
060 53 2D 33 32 4C 32 34 30 41 31 33 0A 00 00 00 FD
070 00 37 4C 1E 50 11 00 0A 20 20 20 20 20 20 01 23
080 02 03 20 73 48 05 04 03 02 01 06 07 90 26 09 07
090 07 15 07 50 83 01 00 00 67 03 0C 00 10 00 B8 2D
0A0 01 1D 00 72 51 D0 1E 20 6E 28 55 00 C4 8E 21 00
0B0 00 1E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 13 8E
0C0 21 00 00 18 01 1D 80 18 71 1C 16 20 58 2C 25 00
0D0 C4 8E 21 00 00 9E 00 00 00 00 00 00 00 00 00 00
0E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 53
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc
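This is the code I have so far, but it does not work for me. It is written in Python, but any other scripting language would be fine. It's just that I'm not used to writing scripts. A million thanks to anyone who lends a hand.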
#!/usr/bin/env python
inFile = open("batch01.txt")
outFile = open("result.txt", "w")
with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "EDID description:":
            copy = True
        elif line.strip() == "- Current video timing":
            copy = True
        elif copy:
            outfile.write(line)
inFile.close()
outFile.close()
I would use regular expressions to do this:
#!/usr/bin/env python
import re
with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    for line in infile:
        m = re.search('(EDID description|- Current video timing): (.*)', line)
        if m is not None:
            outfile.write(m.group(2) + '\n')
This writes the following to result.txt:
1280x720p 60Hz
E01A0A8A.xml
1280x720p 60Hz
E01A0C88.xml
1280x720p 60Hz
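The output above interleaves names and timings, and the first entry's name is absent because its header line reads "Report complete.on:" rather than "EDID description:". Here is a minimal sketch that pairs each name with its timing, assuming every entry lists its header line before its timing line. The function name pair_entries, the two pattern names, and the decision to treat "Report complete.on" as an entry header are assumptions for illustration, not part of the answer above.

```python
import re

# Header lines carry the .xml name. "Report complete.on" is included on the
# assumption that it labels an entry the same way "EDID description" does.
NAME_RE = re.compile(r'^\s*(?:EDID description|Report complete\.on):\s*(\S+\.xml)\s*$')
TIMING_RE = re.compile(r'^\s*-\s*Current video timing:\s*(.*\S)\s*$')


def pair_entries(lines):
    """Yield (xml_name, timing) for each header line followed by a timing line."""
    name = None
    for line in lines:
        m = NAME_RE.match(line)
        if m:
            name = m.group(1)
            continue
        m = TIMING_RE.match(line)
        if m and name is not None:
            yield name, m.group(1)
            name = None  # wait for the next header before pairing again


# A shortened excerpt of the question's data, used only to exercise the sketch.
sample = """Report complete.on: E01A040E.xml
EDID raw data:
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
#
EDID description: E01A0A8A.xml
- Current video timing: 1280x720p 60Hz
""".splitlines()

for name, timing in pair_entries(sample):
    print(name, timing)
```

To run it on the real data, an open file object such as the infile from the script above can be passed to pair_entries in place of sample, since it also yields one line at a time.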
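It looks like you actually need to check whether a given line starts with certain substrings, rather than doing an exact comparison (which is what the == operator gives you). Your for loop should instead use the startswith method to look at the beginning of each line, and end up closer to this: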
for line in infile:
    if line.strip().startswith("EDID description:"):
        copy = True
    elif line.strip().startswith("Report complete.on:"):  # Based on your data, it seems like you need to check for these as well - maybe not?
        copy = True
    elif line.strip().startswith("- Current video timing"):
        copy = True
    else:
        copy = False
    if copy:
        outfile.write(line)
But the loop can be simplified significantly:
prefixes = [
    "EDID description:", "Report complete.on:", "- Current video timing"
]
for line in infile:
    for prefix in prefixes:
        if line.strip().startswith(prefix):
            outfile.write(line)
            break
This eliminates the multi-branch if/elif structure as well as the Boolean copy flag.
Based on your sample input data, I get this in the result file:
Report complete.on: E01A040E.xml
- Current video timing: 1280x720p 60Hz
EDID description: E01A0A8A.xml
- Current video timing: 1280x720p 60Hz
EDID description: E01A0C88.xml
- Current video timing: 1280x720p 60Hz
Again, I'm not sure whether you want that "Report complete.on" line, but it looks to me like you do.
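As a further possible shortening, str.startswith also accepts a tuple of prefixes, so the inner loop over prefixes can be dropped. The sketch below assumes the same three prefixes; the helper name matching_lines and the short sample list are made up for illustration.

```python
PREFIXES = ("EDID description:", "Report complete.on:", "- Current video timing")


def matching_lines(lines):
    """Return the lines whose stripped text starts with one of PREFIXES."""
    # str.startswith accepts a tuple and returns True if any element matches.
    return [line for line in lines if line.strip().startswith(PREFIXES)]


# A few lines modelled on the question's data, with trailing newlines as a
# file iterator would deliver them.
sample = [
    "Report complete.on: E01A040E.xml\n",
    "EDID raw data:\n",
    "  - Current video timing: 1280x720p 60Hz\n",
    "  - HDMI video detected\n",
]

for line in matching_lines(sample):
    print(line, end="")
```

Note that the argument has to be a tuple; passing the prefixes list from the loop above directly to startswith raises a TypeError.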
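Without having run this, I see three problems you need to address.
- The equality conditions in your if/elif control structure will never be true.
- Assuming for the moment that the problem described in the first item is fixed, the final elif still will not be reached for a matching line, because once any of the preceding conditions evaluates to true, the remaining branches of the if/elif chain are skipped.
- You need to reset copy = False before the next iteration of the for loop; otherwise, once copy = True has been set for the first time, copy stays true for everything that follows.
Suggested fixes:
- Use something like line.strip().find('EDID description:') to determine whether the line contains the string you are looking for. Again, you need to test whether the substring is in the line, not whether it is equal to the line. Note that find returns -1 when the substring is absent, so compare the result against -1.
- You need to move the copy operation outside the if/elif structure. That is, rather than making it part of the same if/elif structure, create a separate 'if copy:' block after the current structure, so that the line is output when it has been found.
- After outputting the line, set copy = False so that the next iteration of the for loop starts from the right state. Otherwise, you will print every line after the first match has been printed.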
Something like this: (I haven't actually tested it...)
#!/usr/bin/env python
# the with statement opens and closes both files, so no separate open()/close() calls are needed
with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    copy = False
    for line in infile:
        # use find to see if the line CONTAINS the string you are looking for;
        # find returns -1 when the string is absent, so compare against -1
        if line.strip().find("EDID description:") != -1:
            copy = True
        elif line.strip().find("- Current video timing") != -1:
            copy = True
        # make this a separate if
        if copy:
            outfile.write(line)
            # reset this to False so it can be evaluated and set properly in the next iteration
            copy = False
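A note on using str.find as a condition: it returns an index, not a Boolean. The small sketch below, which uses one line from the question's data, shows why the result needs to be compared against -1 or replaced with the in operator. The variable names are made up for illustration.

```python
line = "EDID description: E01A0A8A.xml"

# str.find returns the index of the first match, or -1 when there is none.
at_start = line.find("EDID description:")
missing = line.find("- Current video timing")
print(at_start, missing)

# Used directly as a condition, both results mislead: 0 is falsy, -1 is truthy.
print(bool(at_start), bool(missing))

# Either expression below says "the line contains the substring".
contains_via_find = line.find("EDID description:") != -1
contains_via_in = "EDID description:" in line
print(contains_via_find, contains_via_in)
```

The in operator is usually the more readable of the two when the position of the match is not needed.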
Hope this helps.
Below is a batch file (.bat) solution, which I think is simpler...
EDIT: The program has been modified as requested in the comments.
@echo off
(for /F "tokens=3-6" %%a in ('findstr /L ".xml Current" batch01.txt') do (
    if "%%b" equ "" (
        set /P "=%%a - " < NUL
    ) else (
        if "%%c" equ "" (
            echo No VSYNC detected
        ) else (
            echo Current video timing: %%c %%d
        )
    )
)) > result.txt
Output:
E01A040E.xml - Current video timing: 1280x720p 60Hz
E01A0A8A.xml - Current video timing: 1280x720p 60Hz
E01A0C88.xml - Current video timing: 1280x720p 60Hz