Regular expression with special characters inside CSV with quoted matches
I have tried, and this is the best I could come up with. I need a fresh pair of eyes or some help to finish this.
Expression:
\"[a-zA-Z\s0-9\.\']*\"
Input string data:
BOL,"AWBH0876356","HMM","H0010","BEANR","BEANR","AEJEA","BHBAH","","","T","S","","","F","N","","FCL/FCL","BE","","","","","","","","SUNNYLAND DISTRIBUTION NV","EVERDONGENLAAN 12 2300 TURNHOUT","","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","","","","","","N/A","770000",""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID","1650","CARTONS","CTN","","","1","1","2.2","21615.0","23815.0","0","0","0","0","","",""
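I need to ignore the first word (BOL) and the commas. That part works, but I run into trouble with matches that contain special characters (the apostrophe ' and the double quote ").
The following match is a problem, for example: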
""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID"
(?:^|(?<=,))"(?!,")(.+?)"(?=,"|$)
Try the pattern above.
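The problem with your regex (and with the string you are parsing) is that the regex does not accept quotes inside a value, while the string contains them. You could allow a quote inside a value but require that the closing quote appear only before a comma or at the end of the string. A positive lookahead does this: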
".*?"(?=,|$)
".*?"
matches the value, and (?=,|$) ensures that it is followed by a comma or by the end of the string (denoted by $).
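As a rough illustration of the lookahead pattern, here is a minimal Python sketch. The sample line is a shortened, hypothetical version of the input in the question, and the names FIELD, line and fields are only illustrative.

```python
import re

# Lazy match between quotes; the closing quote must be followed by a comma
# or by the end of the string.
FIELD = re.compile(r'".*?"(?=,|$)')

# Shortened, hypothetical sample modeled on the input in the question.
line = ('BOL,"AWBH0876356","","N/A",'
        '""SHIPPER\'S LOAD & COUNT, SAID TO BE:" 1X20\'DC CONTAINER",'
        '"1650",""')

fields = FIELD.findall(line)
for field in fields:
    print(field)
```

The unquoted leading BOL is skipped simply because it contains no quotes, and the value with embedded quotes and a comma comes back as a single match.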
Note that the regex above will not work if a value in your string contains a quote followed by a comma.
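In that case, what I usually do is count the number of matches. If there are more than I expect, I split the original line apart so that I can look at the pieces one by one (this involves some manual intervention, but that is better than ending up with a lot of errors!).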
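The counting check could be sketched roughly like this in Python. The helper name find_suspect_lines and the sample rows are hypothetical, and the sketch flags any count that differs from the expected one rather than only larger counts. It is only a heuristic: a malformed line can still produce the expected number of matches by accident.

```python
import re

FIELD = re.compile(r'".*?"(?=,|$)')

def find_suspect_lines(lines, expected):
    """Return (line_number, match_count) for lines whose count differs from expected."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        count = len(FIELD.findall(line.rstrip("\n")))
        if count != expected:
            suspects.append((number, count))
    return suspects

# Hypothetical rows: the second one has a value containing a quote
# followed by a comma, which splits it into two matches.
rows = [
    'ID,"one","two","three"',
    'ID,"one","x "a", "b" y","three"',
]
suspects = find_suspect_lines(rows, expected=3)
print(suspects)
```

Lines reported here are the ones to set aside for manual inspection.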
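If all the problems come from a single column, you can change your script so that it merges the values from i to j (where i is the number of the column in which the first problem occurs and j is the next one) until there is the right number of values.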
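One possible reading of this merging idea, as a Python sketch: split the line naively on commas, then glue the surplus pieces back together starting at the known problem column. The function merge_surplus, its parameters and the sample line are assumptions made for illustration, and the sketch presumes that only one column ever contains extra commas.

```python
def merge_surplus(pieces, expected, start):
    """Merge surplus pieces beginning at index `start` until `expected` values remain."""
    surplus = len(pieces) - expected
    if surplus <= 0:
        # Nothing to merge: the line already has the expected number of values.
        return list(pieces)
    end = start + surplus + 1
    # Joining with "," restores exactly the text that the naive split removed.
    return pieces[:start] + [",".join(pieces[start:end])] + pieces[end:]

# Hypothetical line whose third column (index 2) contains a comma.
line = 'ID,"one","LOAD & COUNT, SAID TO BE","three"'
pieces = line.split(",")
merged = merge_surplus(pieces, expected=4, start=2)
print(merged)
```

Because the pieces are rejoined with the same separator they were split on, the merged value is identical to the original text of that column.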