Reformat list string with spaces
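I have a list of strings that was printed to the console, and I need to convert it back to quoted strings.
Assume the sample file looks like this: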
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD, UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )
For all 3 variations above, the output should be:
List("UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q", "2018 FY")
Note that spaces at the start, at the end, or between elements are acceptable:
List( "UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q", "2018 FY" )
but not inside the string values, as in:
" UT_LVL_17_CD"
"UT_LVL_20_CD ",
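Spaces that already exist within an element, as in "2018 4Q", should be preserved.
I am trying something like the following, but I cannot get the correct result.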
$ perl -pe ' s/(?<=\()|(?=,)|(?=\))/\"/sg ' list.txt
List("UT_LVL_17_CD", UT_LVL_20_CD", 2018 1Q", 2018 2Q", 2018 3Q", 2018 4Q", 2018 FY")
List("UT_LVL_17_CD",UT_LVL_20_CD",2018 1Q",2018 2Q",018 3Q",2018 4Q",2018 FY")
List(" UT_LVL_17_CD", UT_LVL_20_CD",2018 1Q",2018 2Q", 2018 3Q", 2018 4Q", 2018 FY ")
$
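Try this:
(?<=\(|,)\s*(.*?)\s*(?=\)|,)
With this regex, each element is captured in a group that excludes the leading and trailing spaces; you can then wrap that group in double quotes.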
look at demo
perl -wpe'
    s{ \(\K ([^)]+) }
     { join ", ", map { s/^\s+|\s+$//g; qq("$_") } split /,/, $1 }ex
' file
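See if the following works for you:
[(,]\K\s*(.*?)\s*(?=[),])
See the online demo.
[(,] - Match a comma or an opening parenthesis.
\K - Reset the starting point of the reported match.
\s* - Match zero or more whitespace characters.
(.*?) - First capture group, capturing any characters with a lazy quantifier.
\s* - Match zero or more whitespace characters.
(?=[),]) - Positive lookahead for a comma or a closing parenthesis.
Replace with "$1", as per the linked demo.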
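Another option could be to use the \G anchor and match word characters, optionally repeated by whitespace and word characters.
(?:\G(?!^),|\bList\((?=[^()\r\n]*\)))\K\h*(\w+(?:\h+\w+)*)\h*
Explanation
(?:  Non-capturing group
\G(?!^),  Assert the position at the end of the previous match, but not at the start of the string (as \G can match at both positions), then match a comma
|  Or
\bList\((?=[^()\r\n]*\))  A word boundary, then match List( and assert a closing ) on the same line
)  Close the non-capturing group
\K\h*  Forget what has been matched so far (so the matched List( and the comma are not removed) and match optional horizontal whitespace to be removed
(  Capture group 1
\w+(?:\h+\w+)*  Match 1+ word characters, optionally repeated by horizontal whitespace and word characters
)\h*  Close group 1 and match optional trailing whitespace to be removed
In the replacement, use group 1 between double quotes: "$1"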
Yet another variation:
$ perl -pne 's/\(\s+/\(/; /([^(]+\()(.+)\)/; $_="$1\"".join("\",\"",split(/,\s*/,$2)).")\n"; ' file
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY)
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018 3Q","2018 4Q","2018 FY)
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY )
Input test file:
$ cat file
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD, UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD, UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )
The OP mentioned that leading/trailing spaces are acceptable ... I take that to mean that removing the unnecessary leading/trailing spaces is also acceptable.
Sample input:
$ cat string.dat
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD, UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )
A not-so-compact awk idea:
awk -F'[()]' '                          # input field delimiters are "(" and ")"
{ printf "%s(", $1                      # print field #1 + "("
  n=split($2,a,",")                     # split field #2 by ",", save in array a[]
  pfx=""                                # initial prefix is ""
  for (i=1 ; i<=n ; i++)                # loop through a[] elements
      { gsub(/^ *| *$/,"",a[i])         # strip leading/trailing spaces
        printf "%s\"%s\"", pfx, a[i]    # print prefix + current a[] element wrapped in double quotes
        pfx=","                         # set prefix to "," for rest of a[] elements
      }
  printf ")\n"                          # print final ")"
}
' string.dat
This generates:
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")