Rows to Columns

Question

My text file looks like this:

TOPIC:  0 161416.0

the 10758.0
. 6330.0
, 5043.0
<unknown> 4591.0
in 4521.0
be 4476.0
of 3759.0

TOPIC:  1 93549.0

the 6957.0
, 4170.0
of 3624.0
. 3468.0
<unknown> 2321.0
be 2121.0
a 2073.0
in 1998.0

and so on. I have about 2000 topics in the file.
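The file structure described above (a `TOPIC:` header, a blank line, then one word/weight pair per line) can be parsed by grouping every non-blank line under the most recent header. Below is a minimal Python 3 sketch. Python is used because the question is tagged python-2.7. `parse_topics` is a hypothetical helper name, and the sketch assumes every header line starts with exactly `TOPIC:`:

```python
def parse_topics(lines):
    """Group lines into topics; each topic list starts with its TOPIC header."""
    topics = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip the blank separator lines
        if line.startswith("TOPIC:"):
            topics.append([line])  # a header opens a new topic (column)
        elif topics:
            topics[-1].append(line)  # word/weight line belongs to the current topic
        else:
            # Assumption: the file always begins with a TOPIC header.
            raise ValueError("data line before the first TOPIC header: %r" % line)
    return topics


sample = """TOPIC:  0 161416.0

the 10758.0
. 6330.0

TOPIC:  1 93549.0

the 6957.0
"""
topics = parse_topics(sample.splitlines())
print(topics)
```

Each inner list holds one topic in file order, so topics of different lengths are kept as they are.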

I have already tried

awk -v RS= '/----/{next}{gsub(/\n/,",")}7' Input File

However, the output I get is

TOPIC:  0 161416.0

the 10758.0,. 6330.0,, 5043.0,<unknown> 4591.0,in 4521.0,be 4476.0,of     3759.0

TOPIC:  1 93549.0

the 6957.0,, 4170.0,of 3624.0,. 3468.0,<unknown> 2321.0,be 2121.0,a 2073.0,in 1998.0
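This explains why the attempt keeps the headers apart. Setting `RS` to the empty string puts awk in paragraph mode, where blank lines separate records. The file has a blank line after every `TOPIC:` header, so each header and its word list become two separate records. `gsub(/\n/,",")` only joins the lines inside a single record. The Python 3 sketch below imitates that splitting to make the record boundaries visible. It is only an approximation of awk's behavior, and `awk_paragraph_records` is a hypothetical name:

```python
import re


def awk_paragraph_records(text):
    """Approximate awk's RS="" mode: records are separated by one or more empty lines."""
    # Leading/trailing newlines do not produce empty records, as in awk paragraph mode.
    return [rec for rec in re.split(r"\n{2,}", text.strip("\n")) if rec]


sample = "TOPIC:  0 161416.0\n\nthe 10758.0\n. 6330.0\n\nTOPIC:  1 93549.0\n\nthe 6957.0\n"
for record in awk_paragraph_records(sample):
    # Mimic gsub(/\n/,",") applied to each record, then print it.
    print(record.replace("\n", ","))
```

The header records and the word-list records alternate, which matches the output shown above.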

But I need the output to look like this:

TOPIC:  0 161416   TOPIC:  1 93549.0  ........... TOPIC:  N

. 6330.0            , 4170.0                      .
.                   of 3624.0                     .
.                   .                             .
.                   .                             .
.                   .

and so on.

The entries are words/topics and their respective weights/values.

PS: The topics do not all have the same number of elements. Topic 0 might have 100 elements, topic 1 might have 300, and so on.
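The topics have different lengths, so transposing them needs padding for the shorter columns. Below is a minimal Python 3 sketch using `itertools.zip_longest`, which is called `izip_longest` in Python 2.7. It assumes each topic has already been read into a list whose first element is its header. `topics_to_rows` is a hypothetical name:

```python
from itertools import zip_longest


def topics_to_rows(topics, fill=""):
    """Transpose a list of topic columns into rows, padding short topics with `fill`."""
    # zip_longest runs until the longest column is exhausted, so topics with
    # fewer elements contribute `fill` in the later rows instead of being cut off.
    return [list(row) for row in zip_longest(*topics, fillvalue=fill)]


columns = [
    ["TOPIC:  0 161416.0", "the 10758.0", ". 6330.0", ", 5043.0"],
    ["TOPIC:  1 93549.0", "the 6957.0"],
]
for row in topics_to_rows(columns):
    print("\t".join(row))  # one tab-separated output line per row
```

A plain `zip` would stop at the shortest topic and silently drop data. `zip_longest` keeps every element.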

Answer 1

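A Perl solution. It collects each topic into its own array, then prints the arrays side by side, one tab-separated row per element index: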

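perl -lne 'next unless /\S/;                          # skip blank lines
           if (/^TOPIC:/) { push @t, [$_] }           # a header starts a new column
           else { push @{ $t[-1] }, $_ }              # other lines join the current column
           $max = @{ $t[-1] } if @{ $t[-1] } > $max;  # remember the longest column
           }{
           for $i (0 .. $max-1) {                     # one output row per index
               print join "\t", map $t[$_][$i], 0 .. $#t
           }' < input > output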
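Tab-separated output can look ragged in a terminal when entries in the same column differ a lot in length. The layout in the question instead shows columns padded to a common width. As an alternative to tabs, here is a minimal Python 3 sketch that pads each column to its widest entry. It takes the topics as a list of lists, each starting with its header. `format_aligned` and the two-space `gap` are illustrative choices, not part of the answer above:

```python
from itertools import zip_longest


def format_aligned(columns, gap=2):
    """Render topic columns side by side, padding each to its widest entry."""
    # Width of each column = longest string in it (header included); empty columns get 0.
    widths = [max((len(cell) for cell in col), default=0) for col in columns]
    lines = []
    for row in zip_longest(*columns, fillvalue=""):
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        # rstrip removes the padding left behind when trailing columns are empty.
        lines.append((" " * gap).join(cells).rstrip())
    return "\n".join(lines)


columns = [
    ["TOPIC:  0 161416.0", "the 10758.0", ". 6330.0"],
    ["TOPIC:  1 93549.0", "the 6957.0"],
]
print(format_aligned(columns))
```

With about 2000 topics the lines become very wide either way. Fixed-width padding mainly helps when the result is viewed rather than imported into another tool, where tab-separated output is usually easier to parse.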

Tags: linux, row, multiple-columns, python-2.7