Regular expression with special characters inside CSV with quoted matches

I've tried, and this is the best I could come up with. I need a fresh pair of eyes or some help to get this working.

Expression:

\"[a-zA-Z\s0-9\.\']*\"

Input string data:

BOL,"AWBH0876356","HMM","H0010","BEANR","BEANR","AEJEA","BHBAH","","","T","S","","","F","N","","FCL/FCL","BE","","","","","","","","SUNNYLAND DISTRIBUTION NV","EVERDONGENLAAN 12 2300 TURNHOUT","","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","","","","","","N/A","770000",""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID","1650","CARTONS","CTN","","","1","1","2.2","21615.0","23815.0","0","0","0","0","","",""

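I need to ignore the first word (BOL) and the commas. That part works, but I'm having trouble with matches that contain special characters (' and ").

The following match is a problem, for example: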

""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID"

(?:^|(?<=,))"(?!,")(.+?)"(?=,"|$)

Try this. See the demo.

https://regex101.com/r/tJ2mW5/4

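The current problem with your regex (given the string you want to parse) is that it doesn't accept quotes inside the values, while the string has them. You could allow a quote inside a value but require that the closing quote comes only right before a comma or at the end of the string. You can do that with a positive lookahead: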

".*?"(?=,|$)

regex101 demo

".*?" matches the value, while (?=,|$) makes sure that it is followed by a comma or by the end of the string (denoted by $).
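For illustration, here is a minimal Python sketch of that pattern in use. The question does not say which language or regex engine is involved, so Python's re module is an assumption, and the line below is a shortened, made-up sample in the same shape as the posted data.

```python
import re

# Pattern from the answer: a quoted value whose closing quote is
# followed by a comma or by the end of the string.
FIELD = re.compile(r'".*?"(?=,|$)')

# Shortened, made-up line in the same shape as the question's data.
line = 'BOL,"A1","",""SHIPPER\'S LOAD & COUNT, SAID TO BE:" 1X20\'DC","1650"'

matches = FIELD.findall(line)

# Drop the outer pair of quotes from each match; inner quotes are kept.
values = [m[1:-1] for m in matches]

for v in values:
    print(repr(v))
```

The leading BOL is skipped because it is not quoted, and the comma inside the SHIPPER'S value does not end the match because no quote comes directly before it.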

Note that the above regex will not work if a value in your string contains a quote followed by a comma.
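A small sketch of that failure case, again assuming Python's re module. The line is hypothetical: its middle value contains a quote directly followed by a comma.

```python
import re

FIELD = re.compile(r'".*?"(?=,|$)')

# Hypothetical line: the middle value is  say "hi", then go
line = '"a","say "hi", then go","b"'

matches = FIELD.findall(line)
print(matches)
```

The middle value is cut short right after hi", because that quote is followed by a comma, and the remaining text ends up in the wrong match.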

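In that case, what I usually do is count the number of matches. If there are more than I expect, I set the original line aside so that I can look at those lines one by one (this involves some manual intervention, but that's better than ending up with a lot of errors!).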
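That check could be sketched as follows. The function name partition_lines, the expected count of 3, and the sample lines are all made up for illustration. This version flags any line whose count differs from the expected one, not only lines with too many matches.

```python
import re

FIELD = re.compile(r'".*?"(?=,|$)')

def partition_lines(lines, expected):
    """Split lines into (clean, suspect) by their number of regex matches."""
    clean, suspect = [], []
    for line in lines:
        count = len(FIELD.findall(line))
        # Any count other than the expected one goes to manual review.
        (clean if count == expected else suspect).append(line)
    return clean, suspect

# Hypothetical data: three values are expected per line.
lines = [
    '"a","b","c"',
    '"a","he said "no","yes"","b"',  # a value contains  ","  so it splits in two
]
clean, suspect = partition_lines(lines, expected=3)
print("review by hand:", suspect)
```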

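If all the problems come from a single column, you could change your script so that it 'merges' the values from i to j (where i is the number of the column where the first problem occurs, and j is the next one) until you have the right number of values.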
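A rough sketch of such a merge in Python, under the assumption that the problem column's index is known in advance. merge_overflow, its parameters, and the sample pieces are hypothetical names and data, not part of any library.

```python
def merge_overflow(values, expected, start):
    """Re-join surplus pieces, beginning at index `start`, into one value."""
    extra = len(values) - expected
    if extra <= 0:
        # Nothing to merge: the line already has the expected number of values.
        return list(values)
    end = start + extra  # index j of the last piece belonging to the split value
    # The pieces were separated by commas in the original line, so put them back.
    merged = ",".join(values[start:end + 1])
    return values[:start] + [merged] + values[end + 1:]

# Hypothetical pieces of a line that should hold three values,
# where the value at index 1 was split in two.
pieces = ['"a"', '"he said "no"', '"yes""', '"b"']
fixed = merge_overflow(pieces, expected=3, start=1)
print(fixed)
```

This only helps when every surplus piece belongs to that one column; lines with problems in several columns would still need a manual look.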