Sort words per section

I have this text file, and it needs to be sorted section by section.

#cat raw_file.txt

== other info ==
===instructions===
===english words===
this
is
only
test


=== missing words ===

==== include words ====
some
more
words

==== customer name ====
ram
sham
amar
akbar
anthony

==== cities ====
mumbai
delhi
pune


=== prefix ===

the
a
an

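If I sort it "as is", the output starts with the lines that begin with 2 equals signs, followed by those with 3 equals signs, and then all the words. How do I sort the words of each section separately?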

# sort raw_file.txt

== other info ==
=== missing words ===
=== prefix ===
==== cities ====
==== customer name ====
==== include words ====
===english words===
===instructions===
a
akbar
amar
an
anthony
delhi
is
more
mumbai
only
pune
ram
sham
some
test
the
this
words

If it matters, this is MediaWiki format. I am currently sorting each section individually, which takes a lot of time.

#cat expected_output.txt

== other info ==
===instructions===
===english words===
is
only
test
this

=== missing words ===

==== include words ====
more
some
words

==== customer name ====
akbar
amar
anthony
ram
sham

==== cities ====
delhi
mumbai
pune

=== prefix ===
a
an
the

If you are not concerned about preserving blank lines, you can use:

awk '/=/ {c++} {print c+1, $0}' file.txt | sort -n | cut -d' ' -f2- | sed '/^$/d'
>== other info ==
>===instructions===
>===english words===
>is
>only
>test
>this
>=== missing words ===
>==== include words ====
>more
>some
>words
>==== customer name ====
>akbar
>amar
>anthony
>ram
>sham
>==== cities ====
>delhi
>mumbai
>pune
>=== prefix ===
>a
>an
>the

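This works by prefixing each line with an index number that is incremented by 1 every time a line contains "=". The lines are then sorted first by index number and then by the actual word, after which the index is stripped and the blank lines are deleted (after sorting they end up at the top of each 'section').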
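To make the stages concrete, here is a minimal sketch that splits the pipeline into two helper functions and prints the decorated lines before the final result. The function names `sample`, `decorate` and `sort_sections` and the tiny input are hypothetical, not from the question. `LC_ALL=C` is an added assumption: it makes `sort` compare by byte value, so a header starting with "=" lands before lowercase words regardless of the system locale.

```shell
# Hypothetical sample input (not the file from the question).
sample() { printf '%s\n' '== a ==' 'pear' 'apple' '' '== b ==' 'fig' 'date'; }

# Stage 1: decorate. Every line containing "=" bumps the counter, so a
# header and the words below it share the same index.
decorate() { awk '/=/ {c++} {print c+1, $0}'; }

# Stages 2-4: sort by index and then by text, strip the index, drop blanks.
# LC_ALL=C is an assumption added here so that comparison is by byte value.
sort_sections() { decorate | LC_ALL=C sort -n | cut -d' ' -f2- | sed '/^$/d'; }

sample | decorate        # shows the intermediate lines, e.g. "2 == a =="
sample | sort_sections   # shows the per-section sorted result
```

With byte-wise comparison, a word that starts with a digit would still sort ahead of its header, so this sketch assumes the words start with letters.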

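Edit

I just saw @Bing王's comment; this is basically what he suggested you do.

The following also preserves the exact number of blank lines. In normal sort order they would appear at the top, so it adds them at the bottom of each section instead: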

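$ awk 'BEGIN {s="sort"}                     # s: command that sorts one section
       !NF   {c++}                          # count blank lines in this section
       /^=/  {close(s);                     # header: flush the sorted words
              for(i=1;i<=c;i++) print "";   # re-emit the counted blank lines
              c=0; 
              print;                        # then print the header itself
              next} 
       # every remaining non-blank line is piped to sort
       NF    {print | s}' file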

will produce...

== other info ==
===instructions===
===english words===
is
only
test
this


=== missing words ===

==== include words ====
more
some
words

==== customer name ====
akbar
amar
anthony
ram
sham

==== cities ====
delhi
mumbai
pune


=== prefix ===
a
an
the
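
One edge case the command above does not cover is blank lines after the last section: they are counted but never printed, because no further header arrives. Below is a minimal sketch of a variant with an added END block, wrapped in a hypothetical shell function `sort_keep_blanks` (the function name and the sample input are illustrative, not from the original answer). The `fflush()` call and `LC_ALL=C` are also additions: the first is meant to keep headers ahead of the sorted words when the output goes to a file or pipe, the second to make the sort order independent of the locale.

```shell
# Hypothetical wrapper; reads the wiki text on stdin.
sort_keep_blanks() {
  LC_ALL=C awk '
    BEGIN { s = "sort" }
    !NF   { c++ }                       # count blank lines in the current section
    /^=/  { close(s)                    # header: let sort print the previous section
            for (i = 1; i <= c; i++) print ""
            c = 0
            print
            fflush()                    # push the header out before sort writes again
            next }
    NF    { print | s }                 # words are piped to sort
    END   { close(s)                    # flush the last section, then its blank lines
            for (i = 1; i <= c; i++) print "" }
  '
}

printf '%s\n' '== a ==' 'pear' 'apple' '' '' '== b ==' 'fig' 'date' '' | sort_keep_blanks
```

Usage on a real file would look like `sort_keep_blanks < raw_file.txt > sorted.txt`, where the output file name is only an example.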