Search for 10 consecutive single digits

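I have a lady who sends me phone numbers. She sends them in a messy format, every single time. So I want to copy her entire message from Skype and have a batch file parse the saved .txt file, searching only for 10 consecutive digits.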

For example, she sends me:

Hello more numbers for settings please,
WYK-0123456789 
CAMP-0123456789 
0123456789
Include 0123456789
This is an urgent number: 0123456789 
TIDO: 0123456789
Send to> 0123456789

It's a mess; the only constant is the 10-digit number. So I want the .bat file to know how to scan this monster and leave me with something like this:

For example, what I want:

0123456789 
0123456789 
0123456789
0123456789
0123456789 
0123456789
0123456789

I tried this:

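@echo off
setlocal enableDelayedExpansion
(
  rem The pattern is anchored to the start of the line, so digits elsewhere are missed
  for /f %%A in (
    'findstr "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" yourFile.txt'
  ) do (
    set "ln=%%A"
    rem Keep 10 characters; :~0,9 would keep only 9
    echo !ln:~0,10!
  )
)>newFile.txt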

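Unfortunately, it only works when the 10-digit number is at the very beginning of a line; it doesn't help when the 10 digits are in the middle or at the end of the line.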
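The difference comes from the ^ anchor in the findstr pattern. A minimal Python sketch (Python is used here only to illustrate the regex logic; it is not part of any batch solution) compares an anchored and an unanchored pattern on three of the sample lines. [0-9]{10} is the counted form of the ten repeated [0-9] classes, which findstr itself does not support. The names ANCHORED, UNANCHORED and matching_lines are hypothetical.

```python
import re

# Three of the sample lines from the question
LINES = [
    "WYK-0123456789",
    "0123456789",
    "This is an urgent number: 0123456789",
]

# Like the original findstr pattern: "^" ties the match to the start of the line
ANCHORED = re.compile(r"^[0-9]{10}")
# Without "^" the ten digits may appear anywhere in the line
UNANCHORED = re.compile(r"[0-9]{10}")


def matching_lines(pattern, lines):
    """Return the lines in which the pattern finds a match."""
    return [line for line in lines if pattern.search(line)]


print(matching_lines(ANCHORED, LINES))    # ['0123456789']
print(matching_lines(UNANCHORED, LINES))  # all three lines
```

Dropping the ^ is enough to select the right lines, but the loop above still prints the start of the first token, so the number itself would still have to be cut out of the line.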

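Unfortunately, it is hard to solve this in a general way. The batch file below correctly extracts the numbers from your example file, but if your real data contains numbers in a different format, the program will fail... of course, in that case you just need to add the new format to the program! :)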

@echo off
setlocal EnableDelayedExpansion

set "digits=0123456789"

(
   rem Find lines with 10 consecutive digits (or more)
   for /f "delims=" %%A in (
      'findstr "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" yourFile.txt'
   ) do (
      set "ln=%%A"

      rem Separate line in "words" delimited by space or hyphen
      set "ln=!ln: =" "!"
      set "ln=!ln:-=" "!"
      for %%B in ("!ln!") do (
         set "word=%%~B"

         rem If a word has exactly 10 chars...
         if "!word:~9,1!" neq "" if "!word:~10!" equ "" (
            rem and the first one is a digit
            for /F %%D in ("!word:~0,1!") do (
               if "!digits:%%D=!" neq "%digits%" echo !word!
            )
         )

      )
   )
) > newFile.txt

例如,如果 "word" 有 10 个字符,则此程序将失败,这不是电话。数字,以数字开头...

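Provided that the 10-digit number is the first numeric portion in every line of the file (let's call it numbers.txt), that is, it comes before any other digits, you could use the following: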

@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"

rem // The first delimiter is TAB, the last one is SPACE:
for /F "usebackq tokens=1 delims=   ^!#$%%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^^_`abcdefghijklmnopqrstuvwxyz{|}~ " %%L in ("!_FILE!") do (
    set "NUM=%%L#"
    if "!NUM:~%_DIG%!"=="#" echo(%%L
)

endlocal
exit /B

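This makes use of for /F and its delims option string, which includes most ASCII characters except the digits. You could extend the delims option string with extended characters (those with codes greater than 0x7F); make sure that SPACE is the last character specified.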

This approach can extract the 10-digit number from a line like this:

garbage text>0123456789_more text0123-end

But it fails if a line looks like this, that is, when the first number is not the 10-digit one:

garbage text: 0123 tel. 0123456789; end
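A minimal Python sketch of the same idea, assuming every non-digit acts as a delimiter (the batch delims list covers most, but not all, non-digit characters) and only the first token is examined, as with tokens=1. It reproduces both outcomes above; the function name first_number_if_ten is hypothetical.

```python
import re


def first_number_if_ten(line, digits=10):
    """Treat every non-digit as a delimiter, take only the first numeric
    token (like tokens=1) and return it if it has exactly `digits` digits."""
    tokens = re.findall(r"[0-9]+", line)
    if tokens and len(tokens[0]) == digits:
        return tokens[0]
    return None


print(first_number_if_ten("garbage text>0123456789_more text0123-end"))  # 0123456789
print(first_number_if_ten("garbage text: 0123 tel. 0123456789; end"))    # None
```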

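Here is a comprehensive solution based on the above approach. The list of characters for the delims option of for /F is created automatically here. This may take a few seconds, but it is done only once at the beginning, so for large files you will probably not notice this overhead: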

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"

rem // Define global variables here:
set "$CHARS="

rem // Capture current code page and set Windows default one:
for /F "tokens=2 delims=:" %%P in ('chcp') do set /A "CP=%%P"
> nul chcp 437

rem /* Generate list of escaped characters other than numerals (escaped means every character
rem    is preceded by `^`); there are some characters excluded:
rem    - NUL (this cannot be stored in an environment variable and should not occur anyway),
rem    - CR + LF (they form line breaks, so they obviously cannot occur within a line),
rem    - SPACE (because this must be placed as the last character of the `delims` option),
rem    - `"`, (because this impairs the quotation within the following code portion),
rem    - `!` + `^` (they may lead to unexpected results when delayed expansion is enabled): */
setlocal EnableDelayedExpansion
for /L %%I in (0x01,1,0xFF) do (
    rem // Exclude codes of aforementioned characters:
    if %%I GEQ 0x30 if %%I LSS 0x3A (set "SKIP=#") else (set "SKIP=")
    if not defined SKIP if %%I NEQ 0x00 if %%I NEQ 0x0A if %%I NEQ 0x0D (
        if %%I NEQ 0x20 if %%I NEQ 0x21 if %%I NEQ 0x22 if %%I NEQ 0x5E (
            rem // Convert code to character and append to list separated by `^`:
            cmd /C exit %%I
            for /F delims^=^ eol^= %%J in ('
                forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x220x!=ExitCode:~-2!0x22"
            ') do (
                set "$CHARS=!$CHARS!^^%%~J"
            )
        )
    )
)
endlocal & set "$CHARS=%$CHARS%"

rem /* Apply escaped list of characters as delimiters and apply some of the characters
rem    excluded before, namely SPACE, `"`, `!` and `^`;
rem    read file using `type` in order to convert from Unicode, if applicable: */
for /F tokens^=1*^ eol^=^ ^ delims^=^!^"^^%$CHARS%^  %%K in ('type "%_FILE%"') do (
    set "NUM=%%K#" & set "REST=%%L"
    rem // Test whether extracted numeric string holds the given number of digits:
    setlocal EnableDelayedExpansion
    if "!NUM:~%_DIG%!"=="#" echo(%%K
    endlocal
    rem /* Current line holds more than a single numeric portion, so process them in a
    rem    sub-routine; this is not called if the line contains a single number only: */
    if defined REST call :SUB REST
)

rem // Restore previous code page:
> nul chcp %CP%

endlocal
exit /B


:SUB  ref_string
    setlocal DisableDelayedExpansion
    setlocal EnableDelayedExpansion
    set "STR=!%~1!"
    rem // Parse line string using the same approach as in the main routine:
    :LOOP
    if defined STR (
        for /F tokens^=1*^ eol^=^ ^ delims^=^^^!^"^^^^%$CHARS%^  %%E in ("!STR!") do (
            endlocal
            set "NUM=%%E#" & set "STR=%%F"
            setlocal EnableDelayedExpansion
            rem // Test whether extracted numeric string holds the given number of digits:
            if "!NUM:~%_DIG%!"=="#" echo(%%E
        )
        rem // Loop back if there are still more numeric parts encountered:
        goto :LOOP
    )
    endlocal
    endlocal
    exit /B
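To see which characters end up in the delims list, the following Python sketch models the generation loop above. It assumes that only the codes named in the rem comments are skipped and that bytes 0x01 to 0xFF are interpreted in code page 437, as set by chcp 437; the names EXCLUDED, delimiter_chars and delims_option are hypothetical.

```python
# Codes skipped by the loop: LF, CR, SPACE, ", ! and ^, plus the digits 0-9
EXCLUDED = {0x0A, 0x0D, 0x20, 0x21, 0x22, 0x5E} | set(range(0x30, 0x3A))


def delimiter_chars():
    """Characters for codes 0x01..0xFF in code page 437 minus EXCLUDED
    (NUL is left out because the range starts at 0x01)."""
    return [bytes([code]).decode("cp437")
            for code in range(0x01, 0x100)
            if code not in EXCLUDED]


def delims_option():
    """The main loop puts !, " and ^ back in front and SPACE last."""
    return '!"^' + "".join(delimiter_chars()) + " "


print(len(delimiter_chars()))  # 239
print(len(delims_option()))    # 243
```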

This method detects 10-digit numbers anywhere in the file, even when a line contains several numbers.
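In Python terms, the result corresponds roughly to the sketch below, which assumes that any non-digit separates numbers and keeps every run of exactly 10 digits, as the tokens=1* loop plus the :SUB routine do. all_ten_digit_numbers is a hypothetical name.

```python
import re


def all_ten_digit_numbers(line, digits=10):
    """Return every run of exactly `digits` consecutive digits in the line."""
    return [run for run in re.findall(r"[0-9]+", line) if len(run) == digits]


print(all_ten_digit_numbers("garbage text: 0123 tel. 0123456789; end"))  # ['0123456789']
print(all_ten_digit_numbers("a 0123456789 b 9876543210"))  # ['0123456789', '9876543210']
print(all_ten_digit_numbers("01234567890"))  # []
```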

Just another option:

@echo off
    setlocal enableextensions disabledelayedexpansion

    rem Configure
    set "file=input.txt"

    rem Initialization
    set "counter=0" & set "number="

    rem Convert file to a character per line and add ending line
    (for /f "delims=" %%a in ('
        ^( cmd /q /u /c type "%file%" ^& echo( ^)^| find /v ""
    ') do (
        rem See if current character is a number
        (for /f "delims=0123456789" %%b in ("%%a") do (
            rem Not a number, see if we have retrieved 10 consecutive numbers 
            set /a "1/((counter+1)%%11)" || (
                rem Division by zero: counter is 10 or 21, 32...; check for exactly 10
                setlocal enabledelayedexpansion
                if !counter!==10 echo !number!
                endlocal
            )
            rem As current character is not a number, initialize
            set "counter=0" & set "number="
        )) || ( 
            rem Number read, increase counter and concatenate
            set /a "counter+=1"
            setlocal enabledelayedexpansion
            for %%b in ("!number!") do endlocal & set "number=%%~b%%a"
        )
    )) 2>nul 

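The basic idea is to start a cmd instance with Unicode output (cmd /u), type the file from that instance, and filter the two-byte output with find, which expands each input line into an output of one character per line.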

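Once every character is on a separate line and this output is processed in a for /f command, we only need to concatenate consecutive digits until a non-digit character is found. At that point we check whether we have read a group of 10 digits and output the data if so.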
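The counting logic can be sketched in Python as a small state machine. This sketch assumes that line breaks reach the loop as non-digit characters, and all names are hypothetical. The helper division_fails shows why the batch comment only says "probably": the expression 1/((counter+1)%11) (written with %% inside the batch file) divides by zero not only for counter 10 but also for 21, 32 and so on, which is why the code then checks !counter!==10.

```python
DIGITS = "0123456789"


def scan_characters(text, digits=10):
    """Walk the text one character at a time and collect runs of exactly
    `digits` consecutive digits, emitting a run when a non-digit follows."""
    found = []
    counter = 0
    number = ""
    # The trailing newline plays the role of the extra echo( line, so a
    # number at the very end of the input is still terminated
    for ch in text + "\n":
        if ch in DIGITS:
            counter += 1
            number += ch
        else:
            if counter == digits:
                found.append(number)
            counter = 0
            number = ""
    return found


def division_fails(counter):
    """True when 1/((counter+1)%11) would divide by zero in set /a."""
    return (counter + 1) % 11 == 0


print(scan_characters("Send to> 0123456789"))       # ['0123456789']
print(scan_characters("x 01234567890 y"))           # []
print([c for c in range(40) if division_fails(c)])  # [10, 21, 32]
```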

@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q44134518.txt"
SET "outfile=%destdir%\outfile.txt"
ECHO %time%
(
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO SET "line=%%a"&CALL :process
)>"%outfile%"
ECHO %time%

GOTO :EOF

:lopchar
SET "line=%line:~1%"
:process
IF "%line:~9,1%"=="" GOTO :eof
SET "candidate=%line:~0,10%"
SET /a count=0
:testlp
SET "char=%candidate:~0,1%"
IF "%char%" gtr "9" GOTO lopchar
IF "%char%" lss "0" GOTO lopchar
SET /a count+=1
IF %count% lss 10 SET "candidate=%candidate:~1%"&GOTO testlp
ECHO %line:~0,10%
GOTO :eof

You would need to change the settings of sourcedir and destdir to suit your circumstances. I used a file named q44134518.txt containing your data plus some extra data for my testing.

Produces the file defined as %outfile%.

Each line of data is read into %%a, then into line.

:process starts the processing of each line. It checks whether the line has at least 10 characters and terminates the subroutine if not.

Since the line has 10 or more characters, select the first 10 into candidate and set count to 0.

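Assign the first character of candidate to char and test whether it is greater than '9' or less than '0'. If either is true, remove the first character of line and try again (until a 10-digit number is found or line has 9 or fewer characters).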

Count each consecutive digit. If we have not yet counted to 10, remove the first character of candidate and check again.

When we reach 10 consecutive digits, echo the first 10 characters of line, all of which are digits and form the data being sought.
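The walkthrough above translates into the following Python sketch. It uses an explicit digit test in place of the gtr/lss string comparisons and, like the batch code, returns only the first match per line; first_ten_digit_window is a hypothetical name.

```python
DIGITS = "0123456789"


def first_ten_digit_window(line, width=10):
    """Drop one leading character at a time until the first `width`
    characters are all digits; return them, or None once fewer than
    `width` characters remain."""
    while len(line) >= width:
        candidate = line[:width]
        if all(ch in DIGITS for ch in candidate):
            return candidate
        line = line[1:]  # the :lopchar step
    return None


print(first_ten_digit_window("WYK-0123456789 "))  # 0123456789
print(first_ten_digit_window("a 012345678901"))   # 0123456789
print(first_ten_digit_window("short 123"))        # None
```

As with the batch code, a longer run of digits such as 012345678901 yields its first ten digits, 0123456789.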