LaTeX document word statistics
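I know there are many ways of counting words in a LaTeX document, some more accurate than others.
What I'm after is a way to get simple statistics on a LaTeX document. That is, instead of just lumping all the words together and counting the total, I want to count the number of occurrences of each word separately.
The output would look something like this: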
1. (15% - 456) that
++++++++++++++++++++++++++++++++++++++++++++
2. (10% - 308) the
++++++++++++++++++++++++++++++
3. (8% - 213) is
+++++++++++++++++++++
4. (4% - 102) of
+++++++++
5. (2% - 55) and
++++
Is there any tool that can do something like this?
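I couldn't find any package or script that did what I needed, so I ended up building my own.
It's a small (basic) Python script, but it gets the job done. The output looks like this: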
Number of unique words: 1945
Total number of words: 16660
0. 1210 (7.26%) - the
1. 461 (2.77%) - in
2. 431 (2.59%) - of
3. 317 (1.90%) - a
4. 313 (1.88%) - and
5. 304 (1.82%) - for
6. 304 (1.82%) - to
7. 241 (1.45%) - is
8. 176 (1.06%) - words
9. 165 (0.99%) - by
Sum percentage: 23.5%
Word lengths distribution:
1 ++ (317)
2 ++++++++++++++++++++ (2602)
3 ++++++++++++++++++++++++++++++ (3947)
4 ++++++++++++++++++ (2342)
5 +++++++++++++ (1752)
6 ++++++++++ (1348)
7 +++++++++ (1154)
8 ++++++++ (1071)
9 ++++++ (787)
10 ++++ (586)
11 +++ (383)
12 + (129)
13 + (123)
14 + (36)
15 + (83)
It's uploaded to a GitHub repository: LaTexWordStats.
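For anyone who wants a starting point, here is a minimal sketch of how a report like the one above could be produced. It is not the code from the LaTexWordStats repository. The function names `extract_words` and `word_stats` are made up for illustration, and the regular expressions only approximate LaTeX syntax: they drop comments and command names, but command arguments such as citation keys or environment names are still counted as words.

```python
import re
from collections import Counter

# Crude approximations of LaTeX syntax (assumptions, not a real parser).
COMMENT_RE = re.compile(r"(?<!\\)%.*")           # unescaped % up to end of line
COMMAND_RE = re.compile(r"\\[A-Za-z]+\*?|\\.")   # \command, \command*, or \<symbol>
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")  # plain words, optional apostrophe


def extract_words(tex):
    """Return the lowercase words left after stripping comments and commands."""
    tex = COMMENT_RE.sub("", tex)
    tex = COMMAND_RE.sub(" ", tex)
    return [w.lower() for w in WORD_RE.findall(tex)]


def word_stats(words, top=10, bar_width=30):
    """Build the report as a list of text lines."""
    counts = Counter(words)
    total = len(words)
    lines = [
        f"Number of unique words: {len(counts)}",
        f"Total number of words: {total}",
    ]
    # Most frequent words with their share of the total word count.
    for rank, (word, n) in enumerate(counts.most_common(top)):
        lines.append(f"{rank}. {n} ({n / total:.2%}) - {word}")

    # Histogram of word lengths, scaled so the longest bar is bar_width wide.
    lengths = Counter(len(w) for w in words)
    peak = max(lengths.values(), default=0)
    lines.append("Word lengths distribution:")
    for length in sorted(lengths):
        bar = "+" * max(1, round(bar_width * lengths[length] / peak))
        lines.append(f"{length} {bar} ({lengths[length]})")
    return lines
```

To try it on a file, read the source and print the lines, for example `print("\n".join(word_stats(extract_words(open("paper.tex").read()))))`, where `paper.tex` stands in for your own document. For more accurate counts, the text could first be run through a proper LaTeX-aware tool before counting.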