Using a batch file to find text in a file, then cleaning it up, saving to a file and a variable

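OK, I have searched and searched for a few days now and could not find anything useful. If I missed something, I am truly sorry.

My problem: I have a text file containing the source code of a web page. My goal is to search that text file and find a number in it. Below are the lines around what I want: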

<b>public</b>
</a>
</td>
<td></td>
<td class="b">
705330
</td>
<tr>
<tr>
<td>

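(There is more source code with other numbers like this one, but "public" is unique. The lines below are unique as well, apart from the number itself, but I figure the more there is to match, the better.)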

<td class="b">
705330
</td>

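I am trying to get that number, so everything except the digits has to be removed. The number changes, but the rest does not. I want to save the number to a .txt file (just the number, on the first line, overwriting the previous save) and assign it to a variable, so that it can be compared with the previous value in order to run some commands.

For example: compare the new value (the variable) with the old .txt and then do something.

I have the rest worked out; I just cannot figure out this part. I have tried find, findstr, and every forum, looking for something that works for me. But I could not get the found string into a variable; it either echoed about 30 lines or did nothing.

Any help is appreciated. Thanks in advance.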

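Well, this is tricky...

In batch, most commands are rather strict, so handling such highly flexible text (HTML) files is challenging; I gave it a try nevertheless...

The batch script below assumes that:

  • the (text) file is given as a command line argument;
  • the "public" field is truly unique (this is not checked);
  • the "public" field and the "class=" token are both matched case-sensitively;
  • a "class=" token always occurs after the "public" field;
  • the number of interest occurs somewhere after the "class=" token (optionally immediately after it);
  • the number of interest is given on a line of its own.

Here we go (explanations are in the remarks):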

@echo off
setlocal EnableDelayedExpansion

rem set constant holding exact appearance of "public" field;
rem ^ escapes < and > which would otherwise constitute (unintended) redirections
set PUBLIC_TAG=^<b^>public^</b^>
rem set constant holding exact appearance of "class" token
set CLASS_PROP=class=
rem set constant to non-empty value if you want the target value to be right after the "class" token
set CLASS_GLUE=

rem initialise variable that holds line number of "public" field
set LinePublic=0
rem clear variable that is set as soon as "class" token is found
set FoundClass=
rem clear variable that will hold resulting (numeric) target value
set FieldValue=

rem check for command line argument being given
if "%~1"=="" (echo No file given^^!& exit /B)

rem search for unique "public" field, return found line, prefix with line number;
rem the `2> nul` portion avoids displaying any `findstr` errors in case of input lines > 8192 chars.;
rem wrapped-around `for /F` retrieves line number only, stored in %LinePublic%
for /F "delims=:" %%L in ('type "%~1" ^| findstr /N /L "%PUBLIC_TAG%" 2^> nul') do set LinePublic=%%L
rem if no "public" field found, terminate batch script
if %LinePublic% equ 0 (echo File does not contain field "%PUBLIC_TAG%"^^!& exit /B)
rem starting at line number %LinePublic%, go through each line
for /F "usebackq skip=%LinePublic% delims=" %%F in ("%~1") do (
    rem check if %FoundClass% has been set in (one of the) previous `for` iteration(s)
    if defined FoundClass (
        rem "class" token found previously, so check if target value has already been found
        if not defined FieldValue (
            rem no target value available yet, so check if current line contains decimal digits only
            echo."%%F" | findstr /R "^\"[0-9][0-9]*\" $" > nul
            rem if ErrorLevel is 0 (below 1), current line constitutes one numeric value, so store it;
            rem the `call` statement is necessary to avoid syntax errors due to < and > in line text %%F
            if not ErrorLevel 1 ((call set FieldValue=%%F) & goto :FINE) else (
                rem if you want the target value to be right after the "class" token, %CLASS_GLUE% must be set:
                if defined CLASS_GLUE (echo No number follows "%CLASS_PROP%" token^^!& exit /B)
            )
        )
    )
    rem search current line for "class" token; ErrorLevel is 0 if found
    echo."%%F" | findstr /L "%CLASS_PROP%" > nul
    rem if ErrorLevel is below 1, indicate by setting %FoundClass%, checked in next `for` iteration
    if not ErrorLevel 1 set FoundClass=True
) & rem next %%F
:FINE
rem this compound statement makes %FieldValue% to survive `setlocal`/`endlocal` block
endlocal & set FieldValue=%FieldValue%
echo.%FieldValue%

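This works at least for your text file sample...

Hint: if you want the numeric value to be expected immediately after the "class=" token, set the variable (constant) CLASS_GLUE to any valid non-empty value, for example set CLASS_GLUE=1.

So, to finally accomplish the task of storing the number in a text file, you need to type: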

above_batch_script_name.bat input_html_text_file.txt > output_text_file.txt
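
The question also asks for the number in a variable and for a comparison with the previously saved value. A minimal sketch of one way to do that follows, to be placed in a second batch file. The names extract.bat (the script above), page.txt (the saved page source) and last.txt (the file holding the previous number) are placeholders, not names from the original post. Note that the script above prints its error messages to standard output as well, so this wrapper would capture such a message as the value; checking that the result is numeric is left out here.

```batch
@echo off
setlocal

rem Hypothetical names: extract.bat = script above, page.txt = page source,
rem last.txt = file that keeps the number from the previous run.
set "NewValue="
rem capture the (last non-empty) output line of the extraction script
for /F "delims=" %%V in ('extract.bat "page.txt"') do set "NewValue=%%V"
if not defined NewValue (echo No number found.& exit /B 1)

rem read the first line of the previous result, if that file exists
set "OldValue="
if exist "last.txt" set /P OldValue=< "last.txt"

rem compare new and old value and react accordingly
if "%NewValue%"=="%OldValue%" (
    echo Unchanged: %NewValue%
) else (
    echo Changed from "%OldValue%" to "%NewValue%"
)

rem overwrite the stored value; the redirection is placed first so that
rem a trailing digit of the number cannot be taken as a handle number
> "last.txt" echo(%NewValue%
endlocal
```

Replace the two echo lines in the if/else block with whatever commands should run in each case.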

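Remark: since batch scripts are not strong at string handling and manipulation, they might not be the best choice for this kind of challenge. Anyway, I hope this helps...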