Rename HTML files using the first of several <h2> tags, replacing any forward slash with a hyphen
I have a folder containing a bunch of HTML files:
- SMG6E30A14100000000DAAT00.html
- SMB6E30A14400000000DAAT00.html
- SMA6E30A14400120000DAAT00.html
- etc.
I want to rename each file based on the first h2 tag in that file. If the tag contains a forward slash, the slash should be replaced with a hyphen.
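On Linux, `/` is the directory separator and (together with the NUL byte) the only character that cannot appear in a file name, which is why it has to be replaced. As a minimal sketch, assuming bash, the replacement can also be done with bash's own pattern substitution instead of spawning `tr`; the function name `h2_to_filename` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the text of an <h2> into a usable file name.
# ${var//pattern/replacement} replaces every match; "\/" is a literal slash.
h2_to_filename() {
  local title=$1
  printf '%s\n' "${title//\//-}"
}

h2_to_filename 'Side Impact/Sensor (Second) Replacement'  # prints: Side Impact-Sensor (Second) Replacement
h2_to_filename 'Front Seat Belt Replacement'              # prints: Front Seat Belt Replacement
```

This is equivalent to `tr '/' '-'` for a single title; `tr` is still the simpler choice inside a pipeline.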
So if SMG6E30A14100000000DAAT00.html contains:
```html
</head><body><h2>Side Impact/Sensor (Second) Replacement</h2><a name="iR01"></a><h2><b>Removal</b></h2>
```
I want the script to rename the file to Side Impact-Sensor (Second) Replacement.html.
And if there is no slash inside the first h2 tag:
```html
<h2>Front Seat Belt Replacement</h2>SRS components are located in this area. <a href="./SMG6E00H46400000000DAAT00.html">Review the SRS component locations</a> and the <a href="./SMG6E00H46400000000AAAT00.html">precautions and procedures</a> in the SRS before doing repairs or service.<br><br>NOTE: Check the front seat belts for damage, and replace them if necessary. Be careful not to damage them during removal and installation.<br><br><a name="iR01"></a><h2><b>Front Seat Belt</b></h2>
```
the file should be renamed to Front Seat Belt Replacement.html.
How can I do this on Linux?
The following command returns the desired file name (without the .html extension) for test.html:
```shell
< ./test.html tr -d '\n' | grep -oP -m 1 '(?<=<h2>).*?(?=</h2>)' | head -n 1 | tr '/' '-'
```
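The command above relies on `grep -P` for the `(?<=...)` lookbehind, which is a GNU grep extension and is not available everywhere (for example, the BSD grep shipped with macOS). A minimal sketch of the same extraction using only `tr` and `sed`, assuming the first `<h2>` has no attributes and is closed by a `</h2>`; `first_h2` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the text of the first <h2>...</h2> in file $1,
# with any "/" turned into "-". Uses only tr and sed (no grep -P).
first_h2() {
  # Join all lines (also dropping Windows CRs) so the file becomes one line.
  # sed: only on a line that has a </h2>, cut everything from the first </h2>
  # onward, then everything up to the remaining <h2>, and print the result.
  # Finally replace slashes, which cannot appear in file names.
  tr -d '\r\n' < "$1" |
    sed -n '/<\/h2>/{
s:</h2>.*::
s:.*<h2>::p
}' |
    tr '/' '-'
}
```

Usage: `first_h2 ./test.html`. Files without a closed `<h2>` produce no output, so the caller can skip them.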
You can wrap this in a shell loop that scans all the files, computes each new file name, and renames them:
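```shell
mkdir -p ./output
for filename in ./input/*.html; do
  # Text of the first <h2>...</h2>, with newlines removed and "/" turned into "-"
  newname=$(tr -d '\n' < "$filename" | grep -oP -m 1 '(?<=<h2>).*?(?=</h2>)' | head -n 1 | tr '/' '-')
  # Skip files without an <h2>; -n keeps mv from overwriting an existing file
  [ -n "$newname" ] && mv -n -- "$filename" "./output/${newname}.html"
done
```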
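Before moving many files it can help to preview the result. A minimal sketch, assuming bash and GNU grep with `-P` support as in the loop above: it also strips tags nested inside the heading (such as `<b>...</b>`), skips files without an `<h2>`, and refuses to overwrite an existing target. The function name `rename_by_first_h2` and the `DRY_RUN` variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move every *.html file in $1 into $2, renamed to the
# text of its first <h2>. With DRY_RUN=1 it only prints the planned renames.
rename_by_first_h2() {
  local src=$1 dest=$2 file title target
  [ "${DRY_RUN:-0}" = 1 ] || mkdir -p -- "$dest"
  for file in "$src"/*.html; do
    [ -e "$file" ] || continue   # the glob matched nothing
    # First <h2> text; sed drops inner tags like <b>, tr replaces slashes.
    title=$(tr -d '\r\n' < "$file" |
      grep -oP '(?<=<h2>).*?(?=</h2>)' | head -n 1 |
      sed 's/<[^>]*>//g' | tr '/' '-')
    if [ -z "$title" ]; then
      printf 'skip (no <h2>): %s\n' "$file" >&2
      continue
    fi
    target="$dest/$title.html"
    if [ -e "$target" ]; then
      printf 'skip (target exists): %s -> %s\n' "$file" "$target" >&2
      continue
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s -> %s\n' "$file" "$target"
    else
      mv -- "$file" "$target"
    fi
  done
}
```

Run `DRY_RUN=1 rename_by_first_h2 ./input ./output` first to check the names, then `rename_by_first_h2 ./input ./output` to actually move the files. In dry-run mode, two files that map to the same name are not detected, because nothing is created yet.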