Split a text file if two consecutive lines are almost identical
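I need to split a text file (using a .bat command) based on the content of the previous line (positions 2 to 13) and the content of the current line (positions 2 to 13)...
Let me explain:
My file looks like this: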
IA1234567890A XX33 AZE
bla1 XX34 DES
bla2 XX34 DES
bla3 XX34 DES
FA1234567890A XX35 AZE
IA1234567890A XX36 AZE
bla4 XX34 DES
bla5 XX34 DES
bla6 XX34 DES
FA1234567890A XX37 AZE
IB0987654321A XX38 AZE
bla7 XX34 DES
bla8 XX34 DES
bla9 XX34 DES
FB0987654321A XX39 AZE
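I want to split the file when, for a line starting with "I", the 12 characters after the "I" differ from the 12 characters after the first character of the previous line. Except for the very first line, that previous line always starts with "F", and the comparison should ignore the "F".
So I would not split the file between these two lines: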
FA1234567890A XX35 AZE
IA1234567890A XX36 AZE
But I would split the file between these two lines:
FA1234567890A XX37 AZE
IB0987654321A XX38 AZE
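To make the rule concrete, here is a minimal sketch of the comparison in POSIX shell (not the .bat solution I am looking for). The helper names `key` and `should_split` are made up for illustration.

```shell
#!/bin/sh
# key LINE: print characters 2 through 13 of LINE,
# i.e. the 12 characters that follow the leading I/F flag.
key() {
    printf '%s\n' "$1" | cut -c2-13
}

# should_split PREV CUR: succeeds (exit status 0) when a new file
# should start at CUR, i.e. CUR starts with "I" and its key differs
# from the key of PREV.
should_split() {
    case $2 in
        I*) [ "$(key "$1")" != "$(key "$2")" ] ;;
        *)  return 1 ;;
    esac
}

# The two pairs from the examples above.
should_split 'FA1234567890A XX35 AZE' 'IA1234567890A XX36 AZE' && echo split || echo keep
should_split 'FA1234567890A XX37 AZE' 'IB0987654321A XX38 AZE' && echo split || echo keep
```

The first pair prints keep and the second prints split.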
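I know how to split a file on a delimiter, but I am completely lost with this kind of comparison...
If any of you could help me with this tricky case, I would be very grateful...
Thanks!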
This reads from data.txt and creates output1.txt, output2.txt, ... outputn.txt:
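@echo off
setlocal enabledelayedexpansion
set outputcount=0
set "previousblock="
for /f "delims=" %%s in (data.txt) do (
    set "line=%%s"
    rem The 12 characters after the leading I/F flag (positions 2 to 13)
    set "currentblock=!line:~1,12!"
    if "!line:~0,1!" EQU "I" (
        rem An "I" line whose block differs from the previous line starts a new file
        if "!previousblock!" NEQ "!currentblock!" (
            set /A outputcount+=1
        )
    )
    rem Redirection goes first so a line ending in a digit is not read as a handle number
    >>"output!outputcount!.txt" echo(!line!
    set "previousblock=!currentblock!"
)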
For example:
D:\scripts>splitfile.bat
D:\scripts>type output*
output1.txt
IA1234567890A XX33 AZE
bla1 XX34 DES
bla2 XX34 DES
bla3 XX34 DES
FA1234567890A XX35 AZE
IA1234567890A XX36 AZE
bla4 XX34 DES
bla5 XX34 DES
bla6 XX34 DES
FA1234567890A XX37 AZE
output2.txt
IB0987654321A XX38 AZE
bla7 XX34 DES
bla8 XX34 DES
bla9 XX34 DES
FB0987654321A XX39 AZE
Edit: updated the code so that it works.
Try this:
#!/bin/bash
## Remove split files left over from previous runs
rm -f split.*
## ct = number of the next line to read, cnt = index of the current split.X file, file = its name
ct=2
cnt=1
file="split.$cnt"
## IFS= and -r keep leading/trailing spaces and backslashes in each line intact
while IFS= read -r lineP
do
    ## Read the next line (look-ahead) and increment ct
    lineN="$(sed -n "${ct}p" inputfile.txt)" && ((ct++))
    ## First character of both lines, and the 12 characters that follow it
    lineP121=${lineP:0:1} && lineN121=${lineN:0:1}
    lineP1212=${lineP:1:12} && lineN1212=${lineN:1:12}
    ## Split after lineP when an "F" line is followed by an "I" line with a different key
    if [[ "$lineP1212" != "$lineN1212" && ( "$lineP121" == "F" && "$lineN121" == "I" ) ]]
    then
        echo "${lineP}:" >> "$file"
        ((++cnt))
        file="split.$cnt"
    else
        echo -e "$lineP\n" >> "$file"
    fi
done < inputfile.txt
echo -e "\n\nFile created are (with contents in split.X files):\n\n"
ls -l split.* && echo && grep -n . split.* && echo
The output is: 2 files are created, split.1 and split.2 (for the given input file).
File created are (with contents in split.X files. Output generated by grep -n command. You can use simple cat command if you want):
-rw-r--r-- 1 koba loki 450 Jun 3 19:01 split.1
-rw-r--r-- 1 koba loki 225 Jun 3 19:01 split.2
split.1:1:IA1234567890A XX33 AZE
split.1:3:bla1 XX34 DES
split.1:5:bla2 XX34 DES
split.1:7:bla3 XX34 DES
split.1:9:FA1234567890A XX35 AZE
split.1:11:IA1234567890A XX36 AZE
split.1:13:bla4 XX34 DES
split.1:15:bla5 XX34 DES
split.1:17:bla6 XX34 DES
split.1:19:FA1234567890A XX37 AZE:
split.2:1:IB0987654321A XX38 AZE
split.2:3:bla7 XX34 DES
split.2:5:bla8 XX34 DES
split.2:7:bla9 XX34 DES
split.2:9:FB0987654321A XX39 AZE
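For larger inputs a single-pass variant may be worth considering, since the loop above calls sed once per input line. The following is only a sketch, assuming awk is available. The function name `split_by_key`, the file `sample.txt` and the `part.` prefix are hypothetical and not part of the script above. Unlike that script, it adds no blank lines and no trailing ":" marker.

```shell
#!/bin/sh
# split_by_key FILE PREFIX: write FILE into PREFIX1, PREFIX2, ... in one pass.
split_by_key() {
    awk -v prefix="$2" '
    {
        key = substr($0, 2, 12)      # the 12 characters after the I/F flag
        # Start a new output file on the first line, or on an "I" line
        # whose key differs from the key of the line just before it.
        if (NR == 1 || (substr($0, 1, 1) == "I" && key != prev)) {
            if (out != "") close(out)
            out = prefix (++n)
        }
        print > out
        prev = key
    }' "$1"
}

# Small sample in the same layout as the data in the question.
cat > sample.txt <<'EOF'
IA1234567890A XX33 AZE
bla1 XX34 DES
FA1234567890A XX35 AZE
IA1234567890A XX36 AZE
bla4 XX34 DES
FA1234567890A XX37 AZE
IB0987654321A XX38 AZE
bla7 XX34 DES
FB0987654321A XX39 AZE
EOF

split_by_key sample.txt part.
wc -l part.*
```

With this sample, part.1 receives the first six lines and part.2 the last three. Closing each finished file keeps the number of open files constant, however many parts are produced.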
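If the input file is large, this method should run faster, because it does not examine every line. It also correctly handles lines containing special batch characters.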
@echo off
setlocal EnableDelayedExpansion
rem Read the first line, and create a dummy previous "endLine" with same name
set /P "endName=" < test.txt
set "endName=F%endName:~1%"
set startLine=1
set "startName="
rem Redirect the input file to a code block, in order to read it
< test.txt (
    rem Locate all lines that start with "I" or "F"
    for /F "tokens=1,2 delims=: " %%a in ('findstr /N /B "I F" test.txt') do (
        if not defined startName (
            set "startName=%%b"
            if "!startName:~1,12!" neq "!endName:~1,12!" (
                rem New section starts: copy it to its own file
                set /A lines=endLine-startLine+1
                (for /L %%i in (1,1,!lines!) do (
                    set /P "line="
                    echo !line!
                )) > "Part !endName:~1,12!.txt"
                rem Delayed expansion is needed here: %%startName%% would be expanded when the block is parsed
                set "endName=F!startName:~1!"
                set "startLine=%%a"
            )
        ) else (
            set "endLine=%%a"
            set "endName=%%b"
            set "startName="
        )
    )
    rem Copy last section to its own file
    findstr "^" > "Part !endName:~1,12!.txt"
)
Output:
C:\> type Part*.txt
Part A1234567890A.txt
IA1234567890A XX33 AZE
bla1 XX34 DES
bla2 XX34 DES
bla3 XX34 DES
FA1234567890A XX35 AZE
IA1234567890A XX36 AZE
bla4 XX34 DES
bla5 XX34 DES
bla6 XX34 DES
FA1234567890A XX37 AZE
Part B0987654321A.txt
IB0987654321A XX38 AZE
bla7 XX34 DES
bla8 XX34 DES
bla9 XX34 DES
FB0987654321A XX39 AZE