Reformat list string with spaces

I have a list of strings that was printed to the console, and I need to turn it back into a list of quoted strings.

Suppose the sample file looks like this:

List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

For all three variants above, the output should be:

List("UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q", "2018 FY")

Note that whitespace at the start, at the end, or between elements is acceptable:

List(  "UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q",    "2018 FY" )

but not inside the string values, like this:

"     UT_LVL_17_CD"
"UT_LVL_20_CD   ",

Spaces that are already part of an element, as in "2018 4Q", must be preserved.

I'm trying something like the following, but I can't get the right result:

$ perl -pe ' s/(?<=\()|(?=,)|(?=\))/\"/sg ' list.txt
List("UT_LVL_17_CD", UT_LVL_20_CD", 2018 1Q", 2018 2Q", 2018 3Q", 2018 4Q", 2018 FY")
List("UT_LVL_17_CD",UT_LVL_20_CD",2018 1Q",2018 2Q",018 3Q",2018 4Q",2018 FY")
List(" UT_LVL_17_CD",    UT_LVL_20_CD",2018 1Q",2018 2Q", 2018 3Q", 2018 4Q", 2018 FY ")
$
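A minimal reproduction of why this falls short, written as a Python sketch (Python's `re` module stands in for Perl here, and `op_attempt` is a hypothetical name). All three alternatives in the pattern are zero-width lookarounds, so the substitution can only insert quote characters: it never removes the spaces around the items, and no alternative marks the position just after a comma, so only the first item gets an opening quote.

```python
import re

# The OP's pattern: three zero-width alternatives that only mark positions
# (just after "(", just before ",", just before ")").
OP_PATTERN = re.compile(r'(?<=\()|(?=,)|(?=\))')


def op_attempt(line):
    # Inserts '"' at every marked position; no characters are removed.
    return OP_PATTERN.sub('"', line)


print(op_attempt("List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)"))
```

Run on the first sample line, it yields the same broken first line as the Perl command above.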

Try this:

(?<=\(|,)\s*(.*?)\s*(?=\)|,)

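With this regex, each item is captured in a group that excludes its leading and trailing spaces; in the replacement, wrap that group in double quotes ("$1").
See the demo.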
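A Python sketch of the same idea. The answer targets PCRE/Perl, but this particular pattern also compiles with Python's `re`, because both lookbehind alternatives are one character wide; `quote_with_lookarounds` is a hypothetical name. The greedy `\s*` on each side of the lazy group swallows the padding, so the whole match, spaces included, is replaced by the quoted group 1.

```python
import re

# Answer 1's pattern: an item sits between "(" or "," and ")" or ",".
# The greedy \s* on both sides keeps surrounding spaces out of group 1.
ITEM_RE = re.compile(r'(?<=\(|,)\s*(.*?)\s*(?=\)|,)')


def quote_with_lookarounds(line):
    # Replace the item plus its surrounding spaces with "group 1".
    return ITEM_RE.sub(r'"\1"', line)


for sample in [
    "List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)",
    "List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )",
]:
    print(quote_with_lookarounds(sample))
```

Because the spaces after each comma are consumed as well, the items end up separated by a bare comma, which the question allows.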

perl -wpe'
    s{ \(\K ([^)]+) }
     { join ", ", map { s/^\s+|\s+$//g; qq("$_") } split /,/, $1 }ex
' file
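The `/e` flag makes Perl evaluate the replacement part as code. A rough Python equivalent, sketched under the assumption that each line contains a single `(...)` group, passes a function to `re.sub`. `quote_list` is a hypothetical name, and since Python's `re` has no `\K`, the function writes the consumed `(` back itself.

```python
import re


def quote_list(line):
    """Trim and double-quote every comma-separated item inside "(...)"."""
    def repl(match):
        # Split on commas, strip each item (like s/^\s+|\s+$//g),
        # then wrap it in double quotes (like qq("$_")).
        items = [item.strip() for item in match.group(1).split(",")]
        # No \K in Python's re: put back the "(" that the match consumed.
        return "(" + ", ".join(f'"{item}"' for item in items)

    return re.sub(r"\(([^)]+)", repl, line)


print(quote_list("List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )"))
```

Like the Perl version, this joins the items with ", ", so the output keeps one space after each comma.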

See if the following works for you:

[(,]\K\s*(.*?)\s*(?=[),])

See the online demo.


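  • [(,] - Match a comma or an opening parenthesis.
  • \K - Reset the starting point of the reported match.
  • \s* - Match zero or more whitespace characters.
  • (.*?) - First capture group, capturing any characters with a lazy quantifier.
  • \s* - Match zero or more whitespace characters.
  • (?=[),]) - Positive lookahead for a comma or a closing parenthesis.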

As in the linked demo, replace with "$1".
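Python's `re` module does not support `\K`, so a Python version of this pattern (a sketch; `quote_with_capture` is a hypothetical name) has to capture the leading `(` or `,` in a group and write it back in the replacement instead of excluding it from the match:

```python
import re

# [(,] is captured as group 1 instead of being dropped with \K;
# group 2 is the item without its surrounding spaces.
PATTERN = re.compile(r'([(,])\s*(.*?)\s*(?=[),])')


def quote_with_capture(line):
    # Re-emit the delimiter, then the quoted item.
    return PATTERN.sub(r'\1"\2"', line)


print(quote_with_capture("List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)"))
```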

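Another option is to use the \G anchor and match one or more word characters, optionally followed by repetitions of whitespace and word characters.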

(?:\G(?!^),|\bList\((?=[^()\r\n]*\)))\K\h*(\w+(?:\h+\w+)*)\h*

Explanation:

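  • (?: Non-capturing group
    • \G(?!^), Assert the position at the end of the previous match, but not at the start of the string (since \G can match at both positions), then match a comma
    • | Or
    • \bList\((?=[^()\r\n]*\)) A word boundary, then match List( and assert a closing ) on the same line
  • ) Close the non-capturing group
  • \K\h* Forget what has been matched so far (so the matched List( and commas are not removed) and match optional horizontal whitespace to be removed
  • ( Capture group 1
    • \w+(?:\h+\w+)* Match 1+ word characters, optionally repeated by horizontal whitespace and word characters
  • )\h* Close group 1 and match optional trailing horizontal whitespace to be removed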

Regex demo

In the replacement, use group 1 between double quotes: "$1"
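Python's built-in `re` supports neither `\G` nor `\K`, but the `\G` idea ("the next match must start exactly where the previous one ended") can be sketched with `Pattern.match(string, pos)`, which only matches at `pos`. This is a hand-rolled emulation rather than the regex above, with hypothetical names (`quote_items_g`, `HEAD`, `ITEM`) and `\h` spelled as `[ \t]`:

```python
import re

# Second branch: "List(" with a closing ")" later on the same line.
HEAD = re.compile(r'\bList\((?=[^()\r\n]*\))')
# Optional blanks, group 1 = words separated by blanks, optional blanks.
ITEM = re.compile(r'[ \t]*(\w+(?:[ \t]+\w+)*)[ \t]*')


def quote_items_g(line):
    head = HEAD.search(line)
    if head is None:
        return line
    out = [line[:head.end()]]  # keep everything up to and including "List("
    pos = head.end()
    while True:
        # ITEM.match(line, pos) is anchored at pos, playing the role of \G.
        item = ITEM.match(line, pos)
        if item is None:
            break
        out.append('"' + item.group(1) + '"')  # blanks around the item are dropped
        pos = item.end()
        if not line.startswith(",", pos):  # the \G(?!^), branch needs a comma
            break
        out.append(",")
        pos += 1
    out.append(line[pos:])  # the closing ")" and anything after it
    return "".join(out)


print(quote_items_g("List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )"))
```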

Yet another variant:

$ perl -pe 's/\(\s+/(/; /([^(]+\()(.+?)\s*\)/; $_ = $1 . "\"" . join("\",\"", split(/\s*,\s*/, $2)) . "\")\n";' file
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018     3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")

Input test file:

$ cat file
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q,018     3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

The OP mentions that leading/trailing spaces are acceptable... I take that to mean that removing unnecessary leading/trailing spaces is also acceptable.

Sample input:

$ cat string.dat
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

A less compact awk idea:

awk -F'[()]' '                         # input field delimiters are "(" and ")"
{ printf "%s(", $1                   # print field #1 + "("
  n=split($2,a,",")                    # split field #2 by ",", save in array a[]
  pfx=""                               # initial prefix is ""
  for (i=1 ; i<=n ; i++)               # loop through a[] elements
      { gsub(/^ *| *$/,"",a[i])        # strip leading/trailing spaces
        printf "%s\"%s\"", pfx, a[i]   # print prefix + current a[] element wrapped in double quotes
        pfx=","                        # set prefix to "," for rest of a[] elements
      }
   printf ")\n"                        # print final ")"
}
' string.dat

This generates:

List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")