Word count of Markdown cells in Jupyter Notebook

Is there a way to get a word count for only the markdown cells in a Jupyter Notebook, ideally from within the notebook itself? Thanks.

Edit: Doing this inside the notebook seems to be fairly complicated, so I am happy with an external solution.

A Jupyter notebook is just a JSON file (an .ipynb file). We can parse this JSON with Python, filter the cells by 'cell_type': 'markdown', and reduce their source content to a word count.

To parse the JSON file, we can use Python's built-in json encoder/decoder module, as shown below.

import json

with open('test.ipynb', encoding='utf-8') as json_file:
    data = json.load(json_file)

print(data)

test.ipynb is a Jupyter notebook with two code cells and two markdown cells. The contents of data are as follows (shown here formatted as JSON).

{
   "cells":[
      {
         "cell_type":"markdown",
         "metadata":{
            
         },
         "source":[
            "# This is a markdown file\n",
            "Hello World"
         ]
      },
      {
         "cell_type":"code",
         "execution_count":2,
         "metadata":{
            
         },
         "outputs":[
            {
               "name":"stdout",
               "output_type":"stream",
               "text":[
                  "Hello World\n"
               ]
            }
         ],
         "source":[
            "print(\"Hello World\")"
         ]
      },
      {
         "cell_type":"code",
         "execution_count":3,
         "metadata":{
            
         },
         "outputs":[
            {
               "name":"stdout",
               "output_type":"stream",
               "text":[
                  "Hello World 2\n"
               ]
            }
         ],
         "source":[
            "print(\"Hello World 2\")"
         ]
      },
      {
         "cell_type":"markdown",
         "metadata":{
            
         },
         "source":[
            "## More markdown\n",
            "hello"
         ]
      }
   ],
   "metadata":{
      "interpreter":{
         "hash":"e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
      },
      "kernelspec":{
         "display_name":"Python 3.10.2 64-bit",
         "language":"python",
         "name":"python3"
      },
      "language_info":{
         "codemirror_mode":{
            "name":"ipython",
            "version":3
         },
         "file_extension":".py",
         "mimetype":"text/x-python",
         "name":"python",
         "nbconvert_exporter":"python",
         "pygments_lexer":"ipython3",
         "version":"3.10.2"
      },
      "orig_nbformat":4
   },
   "nbformat":4,
   "nbformat_minor":2
}
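
Given that structure, the markdown cells can be isolated by checking each cell's cell_type. Below is a minimal sketch of that filtering step, not a definitive implementation. The `sample` dict and the `markdown_sources` helper are hypothetical names made up for illustration; `sample` only mimics the shape of the notebook above. The sketch also assumes that source may be either a list of lines or a single string, since the notebook format permits both.

```python
# Hypothetical in-memory notebook with the same shape as the output above.
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# This is a markdown file\n", "Hello World"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "source": ["print(\"Hello World\")"]},
        {"cell_type": "markdown", "metadata": {},
         "source": "## More markdown\nhello"},
    ]
}


def markdown_sources(notebook):
    """Return the text of each markdown cell as one string per cell."""
    texts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # skip code and raw cells
        source = cell.get("source", "")
        # source may be a list of lines or a single string
        texts.append("".join(source) if isinstance(source, list) else source)
    return texts


print(markdown_sources(sample))
```

Joining the lines first means any later counting step works on whole cells rather than on individual lines.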

One possible way to count the words in the source strings of the markdown cells is shown below.

word_count = 0
for cell in data['cells']:
    if cell['cell_type'] == "markdown":
        source = cell['source']
        # source can be a single string or a list of lines
        lines = source.splitlines() if isinstance(source, str) else source
        for line in lines:
            # drop tokens containing "#" (heading markers); we might need to filter more markdown syntax here
            words = [word for word in line.split() if "#" not in word]
            word_count += len(words)

print(word_count)
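
If more markdown syntax needs to be filtered, the loop above could be wrapped into a reusable function along these lines. This is only a rough sketch under a few assumptions: `count_words` and `count_markdown_words` are hypothetical helper names, and the regular expressions are simple heuristics rather than a real Markdown parser. Here fenced code blocks are dropped, link and image URLs are dropped while their text is kept, and any token without a letter or digit (such as "#", "-", "*" or "|") is not counted.

```python
import json
import re


def count_words(text):
    """Heuristic word count for Markdown text (not a full parser)."""
    # Drop fenced code blocks (three backticks ... three backticks).
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)
    # Keep the visible text of links and images, drop the URL part.
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Count only tokens that contain at least one letter or digit.
    return sum(1 for token in text.split() if re.search(r"[^\W_]", token))


def count_markdown_words(path):
    """Sum the heuristic word count over all markdown cells of a notebook."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    total = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            # source may be a list of lines or a single string
            text = "".join(source) if isinstance(source, list) else source
            total += count_words(text)
    return total
```

It could then be called as count_markdown_words('test.ipynb'). Unlike the loop above, a word such as "C#" is still counted here, because only tokens made up entirely of symbols are skipped.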