Word count of Markdown cells in Jupyter Notebook
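Is there a way to get a word count of only the Markdown cells in a Jupyter Notebook, ideally from within the notebook itself? Thanks
Edit: It seems that doing this inside the notebook is rather complicated, so I'm happy with an external solution as well.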
A Jupyter notebook is just a JSON file (an .ipynb file). We can parse this JSON with Python, keep only the cells with 'cell_type': 'markdown', and reduce their source content to a word count.
To parse the JSON file, we can use the built-in json encoder/decoder module, as shown below.
import json

# An .ipynb file is plain JSON, so json.load returns a regular dict
with open('test.ipynb', encoding='utf-8') as json_file:
    data = json.load(json_file)
print(data)
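If the goal is to run this from inside a notebook, note that it reads the file saved on disk, so the notebook has to be saved first for recent edits to be counted. Below is a minimal sketch of the loading and filtering step as a reusable helper; the name markdown_texts is hypothetical. It also allows for the fact that the nbformat v4 schema lets a cell's source be either a single string or a list of strings.

```python
import json
from typing import Iterator


def markdown_texts(path: str) -> Iterator[str]:
    """Yield the full text of every Markdown cell in an .ipynb file."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # skip code and raw cells
        source = cell.get("source", "")
        # 'source' is stored either as one string or as a list of lines
        yield source if isinstance(source, str) else "".join(source)
```

For example, `for text in markdown_texts("test.ipynb"): print(text)` prints each Markdown cell as one block of text.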
test.ipynb is a Jupyter notebook with two code cells and two Markdown cells. Its contents (the value of data), shown here as JSON, are as follows.
{
"cells":[
{
"cell_type":"markdown",
"metadata":{
},
"source":[
"# This is a markdown file\n",
"Hello World"
]
},
{
"cell_type":"code",
"execution_count":2,
"metadata":{
},
"outputs":[
{
"name":"stdout",
"output_type":"stream",
"text":[
"Hello World\n"
]
}
],
"source":[
"print(\"Hello World\")"
]
},
{
"cell_type":"code",
"execution_count":3,
"metadata":{
},
"outputs":[
{
"name":"stdout",
"output_type":"stream",
"text":[
"Hello World 2\n"
]
}
],
"source":[
"print(\"Hello World 2\")"
]
},
{
"cell_type":"markdown",
"metadata":{
},
"source":[
"## More markdown\n",
"hello"
]
}
],
"metadata":{
"interpreter":{
"hash":"e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
},
"kernelspec":{
"display_name":"Python 3.10.2 64-bit",
"language":"python",
"name":"python3"
},
"language_info":{
"codemirror_mode":{
"name":"ipython",
"version":3
},
"file_extension":".py",
"mimetype":"text/x-python",
"name":"python",
"nbconvert_exporter":"python",
"pygments_lexer":"ipython3",
"version":"3.10.2"
},
"orig_nbformat":4
},
"nbformat":4,
"nbformat_minor":2
}
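As a quick check of the structure above, the sketch below tallies the cell_type of every cell with collections.Counter. It runs on a trimmed copy of the notebook JSON (metadata and outputs removed) so that it does not need test.ipynb on disk; that trimming is an assumption made only to keep the example self-contained.

```python
import json
from collections import Counter

# Trimmed copy of the notebook above: only the fields needed here are kept
NOTEBOOK_JSON = """
{
  "cells": [
    {"cell_type": "markdown", "source": ["# This is a markdown file\\n", "Hello World"]},
    {"cell_type": "code", "source": ["print(\\"Hello World\\")"]},
    {"cell_type": "code", "source": ["print(\\"Hello World 2\\")"]},
    {"cell_type": "markdown", "source": ["## More markdown\\n", "hello"]}
  ],
  "nbformat": 4,
  "nbformat_minor": 2
}
"""

data = json.loads(NOTEBOOK_JSON)
# Count how many cells there are of each type
cell_types = Counter(cell["cell_type"] for cell in data["cells"])
print(cell_types)  # Counter({'markdown': 2, 'code': 2})
```

The same Counter expression works unchanged on the data dict loaded from the real file.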
One possible way to collect the strings in the source field of every cell of type markdown and count their words is shown below.
word_count = 0
for cell in data['cells']:
    if cell['cell_type'] == 'markdown':
        # 'source' may be a single string or a list of lines
        source = cell['source']
        text = source if isinstance(source, str) else ''.join(source)
        # Drop tokens made only of '#' (heading markers) but keep words like "C#";
        # we might need to filter for more markdown keywords here
        words = [word for word in text.split() if word.strip('#')]
        word_count += len(words)
print(word_count)
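The comment in the loop notes that more Markdown syntax may need filtering. One possible approach, sketched here with the standard re module, is to strip common syntax from a cell's joined text before splitting it: fenced and inline code are dropped entirely, images and links are reduced to their alt or link text, and heading markers, list bullets and emphasis characters are removed. The function name count_markdown_words and the exact set of patterns are illustrative assumptions, not a complete Markdown parser; HTML, tables and other constructs are not handled.

```python
import re

# Illustrative patterns only; a real Markdown parser would be more robust.
FENCED_CODE = re.compile(r"`{3}.*?`{3}", re.DOTALL)            # fenced code blocks
INLINE_CODE = re.compile(r"`[^`]*`")                            # `inline code`
IMAGE = re.compile(r"!\[([^\]]*)\]\([^)]*\)")                   # ![alt](url) -> alt
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")                     # [text](url) -> text
HEADING = re.compile(r"^ {0,3}#{1,6}\s+", re.MULTILINE)         # leading # markers
BULLET = re.compile(r"^[ \t]*(?:[-*+]|\d+\.)\s+", re.MULTILINE)  # list markers
EMPHASIS = re.compile(r"[*_~]+")                                # *, _, ~ runs


def count_markdown_words(text: str) -> int:
    """Count words in Markdown text after stripping common syntax."""
    text = FENCED_CODE.sub(" ", text)  # code is not counted as prose
    text = INLINE_CODE.sub(" ", text)
    text = IMAGE.sub(r"\1", text)      # images first, so '!' is not left behind
    text = LINK.sub(r"\1", text)
    text = HEADING.sub("", text)
    text = BULLET.sub("", text)
    text = EMPHASIS.sub("", text)
    return len(text.split())
```

Because a fenced code block can span several lines, this should be applied to a cell's whole joined source text rather than line by line; it can then stand in for the split()-based filter in the loop above. On the two Markdown cells of test.ipynb it gives 7 and 3 words.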