awk with multiline regex; output filename based on awk match
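I am currently trying to extract 300+ functions and subroutines from a 22 kLoC file, and decided to try to do it programmatically (I did the 'biggest' chunks by hand).
Consider a file of the form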
declare sub DoStatsTab12( byval shortlga as string)
declare sub DoStatsTab13( byval shortlga as string)
declare sub ZOMFGAnotherSub
Other lines that start with something other than "/^sub \w+/" or "/^end sub/"
sub main
This is the first sub: it should be in the output file mainFunc.txt
end sub
sub test
This is a second sub
it has more lines than the first.
It is supposed to go to testFunc.txt
end sub
Function ConvertFileName(ByVal sTheName As String) As String
This is a function so I should not see it if I am awking subs
But when I alter the awk to chunk out functions, it will go to ConvertFileNameFunc.txt
End Function
sub InitialiseVars(a, b, c)
This sub has some arguments - next step is to parse out its arguments
Code code code;
more code;
' maybe a comment, even?
and some code which is badly indented (original code was written by a guy who didn't believe in structure or documentation)
and
with an arbitrary number of newlines between bits of code because why not?
So anyhow - the output of awk should be everything from sub InitialiseVars to end sub, and should go into InitialiseVarsFunc.txt
end sub
The gist: find the subs that start with
^sub [subName](subArgs)
and end with
^end sub
and then (this is the bit I can't figure out): save the extracted sub to a file named [subName]Func.txt
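awk suggested itself as a candidate (I have written text-extraction regex queries in PHP using preg_match() in the past, but I don't want to count on WAMP/LAMP being available).
My starting point is delightfully minimalist (double quotes because Windows):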
awk "/^sub/,/^end sub/" fName
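This finds the relevant chunks (and prints them to stdout).
The step of putting the output into a file, and naming that file after the sub name that awk captured, is beyond me.
An earlier stage of this process involved awk-ing the subroutine names and storing them. That was easy, since every sub is declared by a line of the form
declare sub [subName](subArgs)
so this does it, and does it perfectly: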
awk "match($0, /declare sub (\w+)/)
{print substr($3, RSTART, index($3, \"(\")>0 ? index($3, \"(\")-1: RLENGTH)
> substr($3, RSTART, index($3, \"(\")>0 ? index($3, \"(\")-1: RLENGTH)\".txt\"}"
fName
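(I tried to lay it out so that it is easy to see that the output filename and the printed value, parsed up to the first '(' if there is one, are the same.)
It seems to me that if the output of
awk '/^sub/,/^end sub/' fName
were concatenated into an array, then $2 (suitably truncated at '(') would work. But it didn't.
I have looked at various SO (and other SE-family) threads that deal with multiline awk, for example this one and this one, but none gave me enough of a steer on my problem (they help with getting the match itself, but not with piping it to a file named after itself).
I have RTFD for awk (and grep), also to no avail.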
I would suggest
awk -F '[ (]*' '                 # Field separator is space or open paren (for
                                 # parameter lists). * because there may be multiple
                                 # spaces, and parens only appear after the stuff we
                                 # want to extract.
  BEGIN { IGNORECASE = 1 }       # case-insensitive pattern matching is probably
                                 # a good idea because Basic is case-insensitive.
                                 # (IGNORECASE is a gawk extension.)
  /^sub/ {                       # if the current line begins with "sub"
    outfile = $2 "Func.bas"      # set the output file name
    flag = 1                     # and the flag to know that output should happen
  }
  flag == 1 {                    # if the flag is set
    print > outfile              # print the line to the outfile
  }
  /^end sub/ {                   # when the sub ends,
    flag = 0                     # unset the flag
  }
' foo.bas
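Note that parsing source code with simple pattern-matching tools is error-prone, because programming languages are generally not regular languages (with some exceptions, such as Brainfuck). This sort of thing always depends on how the code is formatted.
For example, if a sub declaration is split across two lines somewhere in the code (I believe this is possible with _, although Basic isn't something I do every day), trying to extract the sub name from the first line of its definition will be futile. Formatting may also call for fine-tuning of the patterns; things like superfluous spaces at the beginning of a line would need handling. Use this strictly for a one-off code transformation and verify that it produced the expected result; don't try to make it part of a regular workflow.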
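To illustrate the kind of fine-tuning mentioned above, here is a minimal sketch that tolerates leading whitespace and mixed case without relying on gawk's IGNORECASE. It is not the code from the answer above: the file name sample.bas, the variables line, name and inside, and the close() call are my own assumptions for the example, and it still assumes that each sub definition fits on one line.

```shell
#!/bin/sh
# Sketch only: sample.bas and its contents are made up for illustration.
cat > sample.bas <<'EOF'
declare sub main
  Sub main
    first sub, indented and capitalised
  End Sub
sub InitialiseVars(a, b, c)
code
end sub
EOF

awk '
{
    line = tolower($0)                 # lower-cased copy, used for matching only
    sub(/^[ \t]+/, "", line)           # ignore leading whitespace when matching
}
line ~ /^sub[ \t]/ {                   # start of a sub definition
    name = $0
    sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", name)   # drop the "sub" keyword
    sub(/[ \t(].*$/, "", name)                # drop the argument list, if any
    outfile = name "Func.txt"
    inside = 1
}
inside { print > outfile }             # original line, indentation preserved
line ~ /^end sub/ {                    # end of the sub
    if (inside) close(outfile)         # avoid keeping hundreds of files open
    inside = 0
}
' sample.bas
```

Run against the sample, this should leave mainFunc.txt and InitialiseVarsFunc.txt in the current directory, and the declare line is skipped because it does not start with sub. A declaration continued onto a second line with _ would still defeat it.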
Another awk way
awk -F'[ (]' 'x+=(/^sub/&&file=$2"Func.txt"){print > file}/^end sub/{x=file=""}' file
Explanation
awk -F'[ (]'                       - Set the field separator to a space or an open paren
x+=(/^sub/&&file=$2"Func.txt")     - Adds 1 to x (so x becomes 1) if the line begins with sub,
                                     and sets file to the second field + "Func.txt". Because
                                     this is a condition that checks whether x is true, the
                                     next block keeps being executed until x
                                     is unset.
{print > file}                     - While x is true, print the line into the set filename
/^end sub/{x=file=""}              - If the line begins with end sub, set both x and file
                                     to nothing.
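For readers who find the one-liner dense, the same state machine can be written out in long form. The sketch below is my own variation rather than the answer's code: it also accepts Function ... End Function blocks (which the question says are the next step), compares keywords case-insensitively with tolower(), and closes each file when its block ends. The file name demo.bas and the variable kind are assumptions for the example, and it expects the keywords to start in column one.

```shell
#!/bin/sh
# Sketch only: demo.bas and its contents are made up for illustration.
cat > demo.bas <<'EOF'
sub main
body of main
end sub
Function ConvertFileName(ByVal sTheName As String) As String
body of the function
End Function
stray line outside any block
EOF

awk -F'[ (]' '
# Opening line: "sub NAME..." or "function NAME...", in any letter case.
tolower($1) == "sub" || tolower($1) == "function" {
    kind = tolower($1)            # remember which "end ..." closes this block
    file = $2 "Func.txt"          # second field is the name, as in the one-liner
    x = 1
}
x { print > file }                # while inside a block, copy the line
x && tolower($1) == "end" && tolower($2) == kind {
    close(file)                   # done with this block
    x = 0
}
' demo.bas
```

With the sample input this should write mainFunc.txt and ConvertFileNameFunc.txt, each holding its block from the opening line through the matching end line, while the stray line goes nowhere.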