Batch: Iterate .csv file columns

I have this .csv file:

col0,col1,col2,col3,col4
a,1,10,100,1000
b,2,11,101,1001
c,3,12,102,1002
d,4,13,103,1003
e,5,14,105,1004

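I need to iterate over every column in the .csv without knowing how many columns there are. The first column is skipped because it is not needed. So far I have the code below, but it hard-codes the tokens, so I need a solution that works when the number of columns is unknown. I need the values of each column in a later calculation step.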

@echo off
setlocal enableDelayedExpansion
:: set workspace data
set INPUT_FILE_LOCATION=D:\Scripts\
set CSV_FILE_NAME=test.csv

pushd %INPUT_FILE_LOCATION%
::loop through the csv file
for /F "tokens=2,3,4,5 delims=," %%i in (%CSV_FILE_NAME%) do (
echo %%i,%%j,%%k,%%l
rem echo.%%~i^|END
)
endlocal 

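To be more specific, I have a .csv file with some columns and many rows. Starting from the second column, I need to take the difference between each pair of consecutive elements in every column and verify whether at least one difference is greater than 1. (The values in each column are in ascending order. For example, using the csv above, the code should do the following: starting from col1, check whether 2-1 > 1, then whether 3-2 > 1, then 4-3 > 1, then 5-4 > 1; then it should check the next column (col2) in the same way, and so on until the last column is reached. If I find a difference greater than 1, I want to print a message naming the column where it was found, using the column title from the header. For example, col3 has a difference greater than 1, so I want to print "Difference greater than 1 in col3", where col3 comes from the header.) Over time I will need to add more columns, so the file may end up with 30 or 40 columns with the same structure as above.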

@ECHO OFF
SETLOCAL
rem The following settings for the source directory, filenames are names
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.

SET "sourcedir=u:\your files"
SET "filename1=%sourcedir%\q71308045.txt"

:: comma-separated list of columns to ignore
SET "ignorecolumns=1"

:: remove all 'cell' variables from environment
For %%b IN (cell) DO FOR  /F "delims==" %%c In ('set %%b 2^>Nul') DO SET "%%c="

SET /a rowcount=0
SET /a maxcolumns=0



rem usebackq should be omitted if the source filename is not quoted
rem skip=1 skips the first (header) line. Omit to skip no lines
FOR /f "usebackq skip=1 delims=" %%b IN ("%filename1%") DO (
 CALL :process %%b
 CALL :linebyline
)
ECHO %rowcount% rows, maximum %maxcolumns% columns
SET cell
GOTO :EOF

:process
SET /a rowcount+=1
SET /a columns=0

:procloop
IF "%~1"=="" GOTO :eof
SET /a columns+=1
IF DEFINED ignorecolumns FOR %%c IN (%ignorecolumns%) DO IF %columns%==%%c GOTO donecolumn
SET "cell[%rowcount%,%columns%]=%~1"

:donecolumn
IF %columns% gtr %maxcolumns% SET /a maxcolumns=columns
SET /a cellsinrow[%rowcount%]=%columns%
SHIFT
GOTO procloop

GOTO :eof

:: processing line-by-line if required

:linebyline
ECHO row %rowcount% has %columns% columns
GOTO :eof

In the absence of specifics, here is a general solution.

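Note that it is not suited to empty columns: CALL treats consecutive commas as a single separator, so an empty field is silently dropped and the columns after it shift left.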
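To make the empty-column caveat concrete, here is a minimal sketch that counts the arguments CALL actually delivers for a row with an empty middle field. The sample row `a,,10` and the `:count` label are made up for illustration and are not part of the script above.

```batch
@ECHO OFF
SETLOCAL
rem Hypothetical sample row whose second field is empty
SET "row=a,,10"
CALL :count %row%
ECHO %row% was seen as %n% fields
GOTO :EOF

:count
rem Count arguments the same way :procloop does: stop at the first empty one
SET /a n=0
:countloop
IF "%~1"=="" GOTO :EOF
SET /a n+=1
SHIFT
GOTO countloop
```

On a standard cmd.exe this should report 2 fields rather than 3, because the two adjacent commas collapse into one separator.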

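Each line is supplied as parameters to :process. :process counts each column and stores it in the cell pseudo-array (really just variables named cell[row,column]), omitting any unwanted columns, and keeps track of the highest column number found and the number of cells in each row.

Environment space is limited, AFAIAA, so if a large amount of data is processed, compensating measures will need to be taken.

The :linebyline routine is executed for each row, so if the required processing does not need all the cells retained, rowcount can be set back to 0 in this routine, with the effect that every row is reported in cell[1,*].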

--- Revision after clarification

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION 
rem The following settings for the source directory, filenames are names
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.

SET "sourcedir=u:\your files"
SET "filename1=%sourcedir%\q71308045.txt"

:: comma-separated list of columns to ignore
SET "ignorecolumns=1"

:: remove all 'cell' variables from environment
For %%b IN (cell) DO FOR  /F "delims==" %%c In ('set %%b 2^>Nul') DO SET "%%c="

SET /a rowcount=0
SET /a maxcolumns=0

rem usebackq should be omitted if the source filename is not quoted
rem No skip=1 here: the header line is read as row 1 so that the column names end up in cell[1,*]
FOR /f "usebackq delims=" %%b IN ("%filename1%") DO (
 CALL :process %%b
 CALL :linebyline
)
rem ECHO %rowcount% rows, maximum %maxcolumns% columns
rem SET cell
GOTO :EOF

:process
SET /a rowcount+=1
SET /a columns=0

:procloop
IF "%~1"=="" GOTO :eof
SET /a columns+=1
IF DEFINED ignorecolumns FOR %%c IN (%ignorecolumns%) DO IF %columns%==%%c GOTO donecolumn
SET "cell[%rowcount%,%columns%]=%~1"

:donecolumn
IF %columns% gtr %maxcolumns% SET /a maxcolumns=columns
SET /a cellsinrow[%rowcount%]=%columns%
SHIFT
GOTO procloop

GOTO :eof

:: processing line-by-line if required

:linebyline
:: if rowcount=1 then column names are in cell[1,*] and nothing to do
:: if rowcount=2 then we have the starting data row and nothing to do
IF %rowcount% lss 3 GOTO :eof

:: Now we can compare row 2 to row %rowcount%
FOR /L %%c IN (1,1,%maxcolumns%) DO IF "!cell[2,%%c]!" neq "" CALL :matchcells %%c
:: And move row %rowcount% to row 2; removing row %rowcount% from environment
FOR /L %%c IN (1,1,%maxcolumns%) DO IF "!cell[2,%%c]!" neq ""  SET cell[2,%%c]=!cell[%rowcount%,%%c]!&SET "cell[%rowcount%,%%c]="
GOTO :eof

:: Match cell[2,%1] to cell[%rowcount%,%1]

:matchcells
SET /a celldiff = !cell[%rowcount%,%1]! - !cell[2,%1]!
IF %celldiff% == 1 GOTO :eof
ECHO row %rowcount% column %1 [!cell[1,%1]!] value difference = %celldiff%
GOTO :eof

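OK, not a great deal of difference here. I still believe the specification is faulty: if you know the first data row, then you know what every following data row *should* be, because each column in each subsequent row should be one more than the value in the previous row. Hence you only need one row of data, since you can generate the remaining rows and do not need to go through the generate/verify cycle.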
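The :matchcells routine above reports every difference that is not exactly 1. If only differences greater than 1 should be reported, as the question words it, one possible variant is sketched below. The three cell values and the rowcount are hypothetical stand-ins for what the main loop would have stored by the time it reaches the last sample row; only the IF test and the message differ from the routine above.

```batch
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
rem Hypothetical stand-ins for values the main loop would have stored
SET "cell[1,4]=col3"
SET "cell[2,4]=103"
SET "cell[6,4]=105"
SET /a rowcount=6
CALL :matchcells 4
GOTO :EOF

:: Compare cell[2,%1] (previous row) to cell[%rowcount%,%1] (current row)
:matchcells
SET /a celldiff = !cell[%rowcount%,%1]! - !cell[2,%1]!
rem Differences of 1 or less are accepted; only larger gaps are reported
IF %celldiff% leq 1 GOTO :eof
ECHO Difference greater than 1 in !cell[1,%1]!
GOTO :eof
```

With these stand-in values the difference is 2, so the sketch should print the message for col3. Because it uses leq, repeated or decreasing values would pass silently, which may or may not be wanted.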