AWK: how to clean BibTeX files?
I have a BibTeX file (exported from Zotero) that I would like to clean up by removing specific fields.
For example, remove the file field from the following entry:
@inproceedings{sridharan_fast_2008,
title = {Fast {Rates} for {Regularized} {Objectives}.},
urldate = {2014-03-26},
booktitle = {{NIPS}},
author = {Sridharan, Karthik and Shalev-Shwartz, Shai and Srebro, Nathan},
year = {2008},
pages = {1545--1552},
file = {3400-fast-rates-for-regularized-objectives.pdf:/home/johnros/.zotero/zotero/66g0wvis.default/zotero/storage/6ND67P5F/3400-fast-rates-for-regularized-objectives.pdf:application/pdf}
}
You can do this easily with grep:
grep -v "^\s*file =" bibtext.txt
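The trailing comma left on the preceding field should not be a problem: BibTeX accepts a comma after the last field of an entry.
Or, if you are really keen on awk: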
awk '!/file = /' bibtext.txt
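Both filters above work line by line, so they assume the whole file field sits on a single line. The following is a minimal sketch of one way to also skip continuation lines, assuming the value is wrapped in balanced braces and the closing brace of the entry is on its own line. `strip_field` is a hypothetical helper name, not part of any existing tool, and only POSIX awk features are used.

```shell
#!/bin/sh
# Hypothetical helper: drop one BibTeX field, even when its braced value
# spans several lines. Assumes braces inside the value are balanced and
# the entry's closing brace sits on its own line.
strip_field() {
    # $1 = field name (e.g. file), $2 = path to the .bib file
    awk -v field="$1" '
        # Return the number of "{" minus the number of "}" in s.
        function brace_balance(s,    opens, closes) {
            opens  = gsub(/[{]/, "{", s)
            closes = gsub(/[}]/, "}", s)
            return opens - closes
        }
        # Inside a multi-line value: keep skipping until braces balance.
        depth > 0 { depth += brace_balance($0); next }
        {
            line = tolower($0)
            sub(/^[ \t]+/, "", line)
            # A field line starts with the name, optional blanks, then "=".
            if (index(line, tolower(field)) == 1 &&
                substr(line, length(field) + 1) ~ /^[ \t]*=/) {
                depth = brace_balance($0)
                next
            }
            print
        }
    ' "$2"
}
```

Usage would look like `strip_field file bibtext.txt > cleaned.bib`. The field name is compared case-insensitively, and a name such as `filename` is not mistaken for `file` because the name must be followed by `=`.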
I am not familiar with the BibTeX format; if there are tools better suited to editing it, you should use those.
If you want to process it with awk, here is a GNU awk one-liner:
awk -v RS=',\n\s*file\s*=\s[^\n]*' '7' file
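Basically, it just plays with the RS variable: the record separator is a regex that matches the file = line together with the comma ending the preceding field, so both are dropped and the output should still be valid BibTeX (I hope). The pattern 7 is simply an always-true condition, so every record is printed.
Testing with your example: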
kent$ cat f
@inproceedings{sridharan_fast_2008,
title = {Fast {Rates} for {Regularized} {Objectives}.},
urldate = {2014-03-26},
booktitle = {{NIPS}},
author = {Sridharan, Karthik and Shalev-Shwartz, Shai and Srebro, Nathan},
year = {2008},
pages = {1545--1552},
file = {3400-fast-rates-for-regularized-objectives.pdf:/home/johnros/.zotero/zotero/66g0wvis.default/zotero/storage/6ND67P5F/3400-fast-rates-for-regularized-objectives.pdf:application/pdf}
}
kent$ awk -v RS=',\n\s*file\s*=\s[^\n]*' '7' f
@inproceedings{sridharan_fast_2008,
title = {Fast {Rates} for {Regularized} {Objectives}.},
urldate = {2014-03-26},
booktitle = {{NIPS}},
author = {Sridharan, Karthik and Shalev-Shwartz, Shai and Srebro, Nathan},
year = {2008},
pages = {1545--1552}
}
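The one-liner above depends on GNU awk accepting a regex as RS, and as far as I can tell it removes the preceding comma even when the file field is not the last field of the entry, which would leave two fields without a separator. A rough sketch of the same idea in POSIX awk holds back one line and strips its trailing comma only when the removed field was the last one. `strip_file_and_comma` is a hypothetical name, and the sketch assumes a lowercase `file` field that fits on one line.

```shell
#!/bin/sh
# Hypothetical helper, POSIX awk only: remove single-line "file" fields.
# The comma ending the preceding field is removed only when the dropped
# field had no trailing comma itself, i.e. it was the last field.
strip_file_and_comma() {
    # $1 = path to the .bib file
    awk '
        # Print the held-back line, if there is one.
        function emit() { if (have) print held; have = 0 }
        /^[ \t]*file[ \t]*=/ {
            # Last field of the entry: the previous line loses its comma.
            if ($0 !~ /,[ \t]*$/) sub(/,[ \t]*$/, "", held)
            next
        }
        # Any other line: release the previous one and hold this one back.
        { emit(); held = $0; have = 1 }
        END { emit() }
    ' "$1"
}
```

Called as `strip_file_and_comma f`, this leaves entries where file sits in the middle untouched apart from the dropped line.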
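I know this is an older question, but for those who still find it: there is an extension for Zotero (Zotero Better BibTeX) that lets you do this within Zotero itself. Full disclosure: I am the author of this extension.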