Search for 10 consecutive single digits
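A lady sends me phone numbers, and they arrive in a messy format every time. I want to copy her entire message from Skype and have a batch file parse the saved .txt file, searching only for runs of 10 consecutive digits.
For example, she sends me: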
Hello more numbers for settings please,
WYK-0123456789
CAMP-0123456789
0123456789
Include 0123456789
This is an urgent number: 0123456789
TIDO: 0123456789
Send to> 0123456789
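It is a mess; the only constant is the 10-digit number. So I want a .bat file that can scan this monster and leave me with something like the following.
What I want: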
0123456789
0123456789
0123456789
0123456789
0123456789
0123456789
0123456789
I tried the following:
@echo off
setlocal enableDelayedExpansion
(
for /f %%A in (
'findstr "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" yourFile.txt'
) do (
set "ln=%%A"
echo !ln:~0,10!
)
)>newFile.txt
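Unfortunately, it only works when a line begins with the 10 digits; it does not help when the 10 digits are in the middle or at the end of a line.
Unfortunately, this is hard to solve in a general way. The batch file below correctly extracts the numbers from your sample file, but it will fail if your real data contains numbers in other formats... Of course, in that case you just need to add the new format to the program! :)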
@echo off
setlocal EnableDelayedExpansion
set "digits=0123456789"
(
rem Find lines with 10 consecutive digits (or more)
for /f "delims=" %%A in (
'findstr "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" yourFile.txt'
) do (
set "ln=%%A"
rem Separate line in "words" delimited by space or hyphen
set "ln=!ln: =" "!"
set "ln=!ln:-=" "!"
for %%B in ("!ln!") do (
set "word=%%~B"
rem If a word has exactly 10 chars...
if "!word:~9,1!" neq "" if "!word:~10!" equ "" (
rem and the first one is a digit
for /F %%D in ("!word:~0,1!") do (
if "!digits:%%D=!" neq "%digits%" echo !word!
)
)
)
)
) > newFile.txt
For example, this program will fail if a "word" has exactly 10 characters and starts with a digit, but is not a phone number...
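To make that failure mode concrete, here is a small sketch of the same word-based rule. It is written in Python rather than batch, purely so the rule can be checked quickly; `ten_char_words` is a made-up name and the code only approximates what the batch file above does.

```python
def ten_char_words(line):
    """Approximate the batch heuristic: split on spaces and hyphens, then keep
    words that are exactly 10 characters long and start with a digit."""
    words = line.replace("-", " ").split(" ")
    return [w for w in words if len(w) == 10 and w[0] in "0123456789"]


print(ten_char_words("WYK-0123456789"))       # ['0123456789']
print(ten_char_words("Send to> 0123456789"))  # ['0123456789']
# False positive: 10 characters, starts with a digit, but not a phone number.
print(ten_char_words("ref 1abcdefghi"))       # ['1abcdefghi']
```

Only the first character is tested, so anything after it is accepted unchecked, which is exactly the weakness described above.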
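Given that the 10-digit number is the first numeric part in each line of the file (let us call it numbers.txt), ahead of any other digits, you could use the following: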
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"
rem // The first delimiter is TAB, the last one is SPACE:
for /F "usebackq tokens=1 delims= ^!#$%%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^^_`abcdefghijklmnopqrstuvwxyz{|}~ " %%L in ("!_FILE!") do (
set "NUM=%%L#"
if "!NUM:~%_DIG%!"=="#" echo(%%L
)
endlocal
exit /B
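This makes use of for /F and its delims option string, which includes most ASCII characters other than the numerals. You can extend the delims option string to include extended characters (those with a code greater than 0x7F); just make sure SPACE is the last character specified.
This approach can extract the 10-digit number from a line like this: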
garbage text>0123456789_more text0123-end
But it fails when the first numeric part is not the 10-digit number, as in a line like this:
garbage text: 0123 tel. 0123456789; end
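The "first numeric token only" behaviour can be illustrated outside of batch. The following is a Python sketch, not a replacement for the script above; `first_numeric_token` is a hypothetical helper that imitates treating every non-digit as a delimiter and then checking the length of token 1.

```python
import re


def first_numeric_token(line, digits=10):
    """Return the first run of digits in the line if it has exactly `digits`
    characters, otherwise None. Later runs are never examined."""
    match = re.search(r"[0-9]+", line)
    if match and len(match.group()) == digits:
        return match.group()
    return None


print(first_numeric_token("garbage text>0123456789_more text0123-end"))  # 0123456789
print(first_numeric_token("garbage text: 0123 tel. 0123456789; end"))    # None
```

The second call returns None because the first run of digits is the 4-digit 0123, so the 10-digit number behind it is never looked at.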
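Here is a comprehensive solution based on the approach above. The list of characters for the delims option of for /F is created automatically here. This may take a few seconds, but it is done only once at the very beginning, so with large files you will probably not notice the overhead: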
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"
rem // Define global variables here:
set "$CHARS="
rem // Capture current code page and set Windows default one:
for /F "tokens=2 delims=:" %%P in ('chcp') do set /A "CP=%%P"
> nul chcp 437
rem /* Generate list of escaped characters other than numerals (escaped means every character
rem is preceded by `^`); there are some characters excluded:
rem - NUL (this cannot be stored in an environment variable and should not occur anyway),
rem - CR + LF, (they build up line-breaks, so they cannot occur within a line obviously),
rem - SPACE, (because this must be placed as the last character of the `delims` option),
rem - `"`, (because this impairs the quotation within the following code portion),
rem - `!` + `^` (they may lead to unexpected results when delayed expansion is enabled): */
setlocal EnableDelayedExpansion
for /L %%I in (0x01,1,0xFF) do (
rem // Exclude codes of aforementioned characters:
if %%I GEQ 0x30 if %%I LSS 0x3A (set "SKIP=#") else (set "SKIP=")
if not defined SKIP if %%I NEQ 0x00 if %%I NEQ 0x0A if %%I NEQ 0x0D (
if %%I NEQ 0x20 if %%I NEQ 0x21 if %%I NEQ 0x22 if %%I NEQ 0x5E (
rem // Convert code to character and append to list separated by `^`:
cmd /C exit %%I
for /F delims^=^ eol^= %%J in ('
forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x220x!=ExitCode:~-2!0x22"
') do (
set "$CHARS=!$CHARS!^^%%~J"
)
)
)
)
endlocal & set "$CHARS=%$CHARS%"
rem /* Apply escaped list of characters as delimiters and apply some of the characters
rem excluded before, namely SPACE, `"`, `!` and `^`;
rem read file using `type` in order to convert from Unicode, if applicable: */
for /F tokens^=1*^ eol^=^ ^ delims^=^!^"^^%$CHARS%^ %%K in ('type "%_FILE%"') do (
set "NUM=%%K#" & set "REST=%%L"
rem // Test whether extracted numeric string holds the given number of digits:
setlocal EnableDelayedExpansion
if "!NUM:~%_DIG%!"=="#" echo(%%K
endlocal
rem /* Current line holds more than a single numeric portion, so process them in a
rem sub-routine; this is not called if the line contains a single number only: */
if defined REST call :SUB REST
)
rem // Restore previous code page:
> nul chcp %CP%
endlocal
exit /B
:SUB ref_string
setlocal DisableDelayedExpansion
setlocal EnableDelayedExpansion
set "STR=!%~1!"
rem // Parse line string using the same approach as in the main routine:
:LOOP
if defined STR (
for /F tokens^=1*^ eol^=^ ^ delims^=^^^!^"^^^^%$CHARS%^ %%E in ("!STR!") do (
endlocal
set "NUM=%%E#" & set "STR=%%F"
setlocal EnableDelayedExpansion
rem // Test whether extracted numeric string holds the given number of digits:
if "!NUM:~%_DIG%!"=="#" echo(%%E
)
rem // Loop back if there are still more numeric parts encountered:
goto :LOOP
)
endlocal
endlocal
exit /B
This method detects 10-digit numbers anywhere in the file, even when a line contains more than one number.
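If you want to cross-check the output of the script, the same rule (every run of digits that is exactly 10 long, wherever it appears) can be written as a short Python sketch. This is an assumption-laden equivalent, not the batch code itself; `ten_digit_numbers` is a made-up name.

```python
import re


def ten_digit_numbers(text, digits=10):
    """Return every run of exactly `digits` digits; longer or shorter runs
    are rejected by the look-behind and look-ahead on both sides."""
    pattern = re.compile(r"(?<![0-9])[0-9]{%d}(?![0-9])" % digits)
    return pattern.findall(text)


print(ten_digit_numbers("garbage text: 0123 tel. 0123456789; end"))  # ['0123456789']
print(ten_digit_numbers("a0123456789b9876543210"))  # ['0123456789', '9876543210']
print(ten_digit_numbers("01234567890"))             # [] (11 digits)
```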
Just another option:
@echo off
setlocal enableextensions disabledelayedexpansion
rem Configure
set "file=input.txt"
rem Initialization
set "counter=0" & set "number="
rem Convert file to a character per line and add ending line
(for /f "delims=" %%a in ('
^( cmd /q /u /c type "%file%" ^& echo( ^)^| find /v ""
') do (
rem See if current character is a number
(for /f "delims=0123456789" %%b in ("%%a") do (
rem Not a number, see if we have retrieved 10 consecutive numbers
set /a "1/((counter+1)%%11)" || (
rem We probably have 10 numbers, check and output data
setlocal enabledelayedexpansion
if !counter!==10 echo !number!
endlocal
)
rem As current character is not a number, initialize
set "counter=0" & set "number="
)) || (
rem Digit read, increase counter and concatenate
set /a "counter+=1"
setlocal enabledelayedexpansion
for %%b in ("!number!") do endlocal & set "number=%%~b%%a"
)
)) 2>nul
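The basic idea is to start a cmd instance with Unicode output, type the file from inside that instance, and filter the two-byte output with find, which expands each input line into one output line per character.
Once each character is on its own line and this output is processed inside a for /f command, we only need to concatenate consecutive digits until a non-digit character is found. At that point we check whether a run of 10 digits has been read and output it if so.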
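The character-by-character logic is easier to follow without the batch plumbing. Below is a Python sketch of the same idea under the assumption that only the ASCII digits 0-9 count; `scan_digit_runs` is a hypothetical name and the trailing sentinel is similar in spirit to the extra ending line the script appends.

```python
def scan_digit_runs(text, digits=10):
    """Walk the text one character at a time, collecting consecutive digits
    and emitting a run only when it is exactly `digits` long."""
    found = []
    number = ""
    # The appended newline is a non-digit sentinel, so a run that ends the
    # text is still flushed.
    for ch in text + "\n":
        if ch in "0123456789":
            number += ch
        else:
            if len(number) == digits:
                found.append(number)
            number = ""
    return found


print(scan_digit_runs("TIDO: 0123456789"))                  # ['0123456789']
print(scan_digit_runs("12345 01234567890 x9876543210"))     # ['9876543210']
```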
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q44134518.txt"
SET "outfile=%destdir%\outfile.txt"
ECHO %time%
(
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO SET "line=%%a"&CALL :process
)>"%outfile%"
ECHO %time%
GOTO :EOF
:lopchar
SET "line=%line:~1%"
:process
IF "%line:~9,1%"=="" GOTO :eof
SET "candidate=%line:~0,10%"
SET /a count=0
:testlp
SET "char=%candidate:~0,1%"
IF "%char%" gtr "9" GOTO lopchar
IF "%char%" lss "0" GOTO lopchar
SET /a count+=1
IF %count% lss 10 SET "candidate=%candidate:~1%"&GOTO testlp
ECHO %line:~0,10%
GOTO :eof
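You will need to change the settings of sourcedir and destdir to suit your circumstances.
I used a file named q44134518.txt containing your data plus some extra data for my testing.
Produces the file defined as %outfile%.
Read each line of data into %%a, then copy it into line.
Process each line starting at :process. Check whether the line has 10 or more characters; if not, terminate the subroutine.
Since the line has 10 or more characters, select the first 10 into candidate and reset count to 0.
Assign the first character of candidate to char and test whether it is greater than '9' or less than '0'. If either is true, lop off the first character of line and try again (until we find a digit or line is down to 9 or fewer characters).
Count each successive digit. If we have not yet counted 10, lop the first character off candidate and check again.
When we reach 10 successive digits, echo the first 10 characters of line, all of which are digits and are therefore the data sought.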
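The steps above amount to sliding a 10-character window along each line. Here is a rough Python sketch of that idea, offered only as an illustration; `first_ten_digits` is a made-up name, and as far as I can tell from reading the script it shares the same two properties: only the first hit per line is reported, and a longer run of digits is not rejected.

```python
def first_ten_digits(line, digits=10):
    """Return the first window of `digits` consecutive digit characters,
    or None when the line has no such window."""
    while len(line) >= digits:
        candidate = line[:digits]
        if all(ch in "0123456789" for ch in candidate):
            return candidate
        line = line[1:]  # lop off the first character and retry
    return None


print(first_ten_digits("Send to> 0123456789"))  # 0123456789
print(first_ten_digits("short 12345"))          # None
# No boundary check: an 11-digit run still yields its first 10 digits.
print(first_ten_digits("x01234567890"))         # 0123456789
```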