In Windows batch: how do I parse a CSV file whose fields include commas and double quotes?

I have an input CSV file, ttt.csv. It is comma-delimited, and any field can contain double quotes and commas.

Here is the content of ttt.csv:

"CN=Bar\,Alex,OU=Users,OU=Headquarters,DC=CORP",Bar,Alex,"Barziza,Alex",BARAAA,aaa@email.com

"CN=Boo\,Ryan,OU=Users,OU=Headquarters,DC=CORP",Boo,Ryan,"Boo,Ryan",BABBBB,bbb@email.com

I need to loop over this file and, for each line, get each of the 6 values and build a SQL INSERT statement for my database.

For line 2, I need to get:

Value1=       CN=Boo\,Ryan,OU=Users,OU=Headquarters,DC=CORP
Value2=       Boo
Value3=       Ryan
Value4=       Boo,Ryan
Value5=       BABBBB
Value6=       bbb@email.com

I tried using delimiters that include the double quote, but it does not seem to work:

set str2="CN=Bar\,Alex,OU=Users,OU=Headquarters,DC=CORP",Bar,Alex,"Barziza,Alex",BARAAA,aaa@email.com
echo %str2%
for /f "tokens=1 delims=(,")" %%a in ("!str2!") do ( set newstr2=%%a )
echo !newstr2!

As I commented above, just use a plain for loop: no /f, no /r, no /d, no /l, just a simple for loop. It splits on the CSV delimiters while treating quoted content as a single token.

@echo off
setlocal enabledelayedexpansion

set str2="CN=Bar\,Alex,OU=Users,OU=Headquarters,DC=CORP",Bar,Alex,"Barziza,Alex",BARAAA,aaa@email.com
echo %str2%

set idx=0

for %%a in (%str2%) do (
    set "newstr[!idx!]=%%~a"
    set /a idx += 1
)

set newstr

Output:

C:\Users\me\Desktop>test.bat
"CN=Bar\,Alex,OU=Users,OU=Headquarters,DC=CORP",Bar,Alex,"Barziza,Alex",BARAAA,aaa@email.com
newstr[0]=CN=Bar\,Alex,OU=Users,OU=Headquarters,DC=CORP
newstr[1]=Bar
newstr[2]=Alex
newstr[3]=Barziza,Alex
newstr[4]=BARAAA
newstr[5]=aaa@email.com
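
To connect this back to the original question (looping over ttt.csv and building one INSERT per row), here is a minimal sketch of how the same plain for loop might be wrapped in a for /f that reads the file line by line. The table name users and the column names are hypothetical, since the question does not give the schema. It assumes the data contains no unquoted spaces, no ! characters, and no * or ? wildcards, and it does not escape single quotes for SQL.

```batch
@echo off
setlocal enabledelayedexpansion

rem Read ttt.csv one line at a time; "delims=" keeps the whole line in %%L.
for /f "usebackq delims=" %%L in ("ttt.csv") do (
    set idx=0
    rem The plain for splits %%L on unquoted commas; %%~a drops the enclosing quotes.
    for %%a in (%%L) do (
        set "val[!idx!]=%%~a"
        set /a idx += 1
    )
    rem Build the statement inside a quoted set so its parentheses cannot end the block.
    rem Table and column names below are hypothetical placeholders.
    set "sql=INSERT INTO users (dn, last_name, first_name, display_name, login, email) VALUES ('!val[0]!', '!val[1]!', '!val[2]!', '!val[3]!', '!val[4]!', '!val[5]!');"
    echo !sql!
)
```

Redirecting the script's output to a file (for example, test.bat > inserts.sql) would collect the statements for a database client to run. for /f skips empty lines by default, so blank lines between rows in ttt.csv should not produce statements.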


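If your CSV data contains unquoted spaces that should not be treated as token separators, you can temporarily convert the spaces to underscores before splitting and then convert them back, like this: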

@echo off
setlocal enabledelayedexpansion

set str2="CN=Ryan\,David Paul,OU=Users,OU=Singapore,DC=GLOBAL,DC=CORP",Ryan,David Paul,"Ryan, David Paul",RPAUL123,David@aaad.com
echo %str2%

set idx=0

for %%a in (%str2: =_%) do (
    set "str=%%~a"
    set "newstr[!idx!]=!str:_= !"
    set /a idx += 1
)

set newstr

You can read more on substring substitution if you like. Output:

C:\Users\me\Desktop>test.bat
"CN=Ryan\,David Paul,OU=Users,OU=Singapore,DC=GLOBAL,DC=CORP",Ryan,David Paul,"Ryan, David Paul",RPAUL123,David@aaad.com
newstr[0]=CN=Ryan\,David Paul,OU=Users,OU=Singapore,DC=GLOBAL,DC=CORP
newstr[1]=Ryan
newstr[2]=David Paul
newstr[3]=Ryan, David Paul
newstr[4]=RPAUL123
newstr[5]=David@aaad.com

Of course, if your data already contains underscores, use a character it does not contain: a backtick, tilde, dollar sign, or some other character.
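
As an illustration of swapping the placeholder, here is a sketch that uses a backtick in place of the underscore. The sample row is a hypothetical variant of the one above, with underscores added to two fields to show they survive the round trip. One caveat, as far as I know: a tilde may be awkward for the convert-back step, because a substitution whose search string starts with ~ looks like the substring syntax (%var:~n,m%), so a backtick or dollar sign is likely the safer choice.

```batch
@echo off
setlocal enabledelayedexpansion

rem Hypothetical sample row: two fields contain underscores that must be preserved.
set str2="CN=Ryan\,David Paul,OU=Users,OU=Singapore,DC=GLOBAL,DC=CORP",Ryan,David_Paul,"Ryan, David Paul",RPAUL_123,David@aaad.com

set idx=0

rem Replace every space with a backtick before splitting, then restore the spaces per token.
for %%a in (%str2: =`%) do (
    set "str=%%~a"
    set "newstr[!idx!]=!str:`= !"
    set /a idx += 1
)

set newstr
```

The only requirement on the placeholder is that it never occurs in the data and is not itself a delimiter for the plain for loop (comma, semicolon, equals sign, space, or tab).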

@echo off
(
echo "CN=Bar\,Alex,OU=Users,OU=Headquarters,DC=CORP",Bar,Alex,"Barziza,Alex",BARAAA,aaa@email.com
echo "CN=Boo\,Ryan,OU=Users,OU=Headquarters,DC=CORP",Boo,Ryan,"Boo,Ryan",BABBBB,bbb@em
)>%tmp%\tmp.csv

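rem Outer loop: split each line on the double quote. %%i gets the first quoted
rem field and %%j gets the rest of the line.
rem Inner loop: split the rest on equals sign, comma and double quote. The quoted name
rem arrives as two tokens, %%c and %%d, so this assumes exactly one comma inside it.
for /f tokens^=^1*^ delims^=^" %%i in (%tmp%\tmp.csv) do (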
  echo value0=       "%%i"
  for /f tokens^=^1-6^ delims^=^=^,^" %%a in ("%%j") do (
    echo value1=       %%a&echo value2=       %%b&echo value3=       %%c,%%d
    echo value4=       %%e&echo value5=       %%f&echo:
  )
)

Output:

value0=       "CN=Bar\,Alex,OU=Users,OU=Headquarters,DC=CORP"
value1=       Bar
value2=       Alex
value3=       Barziza,Alex
value4=       BARAAA
value5=       aaa@email.com

value0=       "CN=Boo\,Ryan,OU=Users,OU=Headquarters,DC=CORP"
value1=       Boo
value2=       Ryan
value3=       Boo,Ryan
value4=       BABBBB
value5=       bbb@em