Sort words per section
I have this text file and need to sort it section by section.
#cat raw_file.txt
== other info ==
===instructions===
===english words===
this
is
only
test
=== missing words ===
==== include words ====
some
more
words
==== customer name ====
ram
sham
amar
akbar
anthony
==== cities ====
mumbai
delhi
pune
=== prefix ===
the
a
an
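If I sort it "as is", the output starts with the headings that have 2 equals signs, followed by those with 3 equals signs, and then all the words. How can I sort the words of each section separately?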
# sort raw_file.txt
== other info ==
=== missing words ===
=== prefix ===
==== cities ====
==== customer name ====
==== include words ====
===english words===
===instructions===
a
akbar
amar
an
anthony
delhi
is
more
mumbai
only
pune
ram
sham
some
test
the
this
words
In case it matters, this is MediaWiki format. I am currently sorting each section one at a time, which takes a lot of time.
#cat expected_output.txt
== other info ==
===instructions===
===english words===
is
only
test
this
=== missing words ===
==== include words ====
more
some
words
==== customer name ====
akbar
amar
anthony
ram
sham
==== cities ====
delhi
mumbai
pune
=== prefix ===
a
an
the
If you are not worried about preserving blank lines, you can use:
awk '/=/ {c++} {print c+1, $0}' file.txt | sort -n | cut -d' ' -f2- | sed '/^$/d'
>== other info ==
>===instructions===
>===english words===
>is
>only
>test
>this
>=== missing words ===
>==== include words ====
>more
>some
>words
>==== customer name ====
>akbar
>amar
>anthony
>ram
>sham
>==== cities ====
>delhi
>mumbai
>pune
>=== prefix ===
>a
>an
>the
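This approach works by prefixing every line with an index number, incrementing that index by 1 each time a line contains "=". The lines are then sorted first by index number and then by the actual word. Finally the index is removed and the blank lines are deleted (after sorting they end up at the top of each 'section').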
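The same tag, sort, strip idea can be sketched with an explicit heading flag, so that a heading sorts ahead of its words regardless of how the locale orders "=" relative to letters. This is only a sketch under a few assumptions: `sort_sections` is a hypothetical helper name, headings are taken to be lines that start with "=", the text contains no tabs of its own in a way that matters to you, and blank lines are dropped as in the one-liner above.

```shell
# sort_sections: hypothetical helper; reads the wiki text on stdin.
# Each line is tagged as "<section index> TAB <flag> TAB <original line>",
# where flag 0 marks a heading and flag 1 marks a body line.
sort_sections() {
    tab=$(printf '\t')
    awk -v OFS='\t' '
        /^=/ { c++; print c, 0, $0; next }   # heading: start a new section
        NF   { print c + 0, 1, $0 }          # non-blank body line (blank lines are dropped)
    ' |
    LC_ALL=C sort -t "$tab" -k1,1n -k2,2n -k3 |   # by section, heading first, then by word
    cut -f3-                                      # strip the two tag fields
}
```

It could be called as `sort_sections < raw_file.txt`. `LC_ALL=C` is an assumption that plain byte order is acceptable for the words; remove it to get the locale's collation instead.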
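Edit
I just saw @Bing王's comment; this is basically what he suggested you do.
This one also preserves the exact number of blank lines. In normal sort order they would appear at the top, so it adds them at the bottom of each section instead.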
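$ awk 'BEGIN {s="sort"}                      # s: command each section is piped through
       !NF   {c++}                            # count blank lines in the current section
       /^=/  {close(s);                       # heading: flush the sorted previous section
              for(i=1;i<=c;i++) print "";     # re-emit the counted blank lines below it
              c=0;
              print;                          # then print the heading itself
              next}
       NF    {print | s}' file                # non-blank lines are sent to sort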
This produces:
== other info ==
===instructions===
===english words===
is
only
test
this
=== missing words ===
==== include words ====
more
some
words
==== customer name ====
akbar
amar
anthony
ram
sham
==== cities ====
delhi
mumbai
pune
=== prefix ===
a
an
the
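Two details of the awk-plus-`sort` pipe approach may be worth guarding against when the output is redirected to a file or another pipe: awk can buffer a heading while `sort` writes its words straight to the same output, and blank lines after the last heading are never re-emitted. The variant below is a hedged sketch of one way to handle both, not the answerer's script: the helper name `sort_sections_keep_blanks`, the `flush_section` function, the `fflush()` call and the `END` block are all additions made for illustration.

```shell
# sort_sections_keep_blanks: hypothetical helper; reads files given as
# arguments, or stdin when there are none.
sort_sections_keep_blanks() {
    awk '
        function flush_section(   i) {
            close(s)                           # wait until sort has printed the section
            for (i = 1; i <= c; i++) print ""  # put the counted blank lines below it
            c = 0
        }
        BEGIN { s = "sort" }
        !NF   { c++; next }                    # blank line: just count it
        /^=/  { flush_section(); print; fflush(); next }  # heading: emit it before sort writes
              { print | s }                    # any other line goes to sort
        END   { flush_section() }              # handle the final section too
    ' "$@"
}
```

Usage would be `sort_sections_keep_blanks raw_file.txt > sorted.txt`. Calling `fflush()` with no arguments is assumed to flush awk's standard output, which holds for the common awk implementations.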