Using Python rdflib parsers without the Graph object
Loading RDF data in Python looks like this:
from rdflib import Graph
g = Graph()
g.parse("demo.nt", format="nt")
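But how can I use one of the format parsers on its own as a streaming parser and get a stream of the parsed tokens? Could someone give me a hint or a code example?
The NT parser follows a paradigm in which a "sink" stores the triples as they are parsed. I believe the tokens you are looking for are actually the triples, since the default NT parser uses the NTriplesParser.
You can override NTSink using the same approach as the example below.
This example loads a test NT-formatted file and prints a line of text each time a line is parsed. Instead of printing a line, you can do whatever processing you need inside that method.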
example.py (requires a file named ./anons-01.nt in the same directory):
from rdflib.plugins.parsers.ntriples import NTriplesParser, Sink
# The NTriplesParser is what is used for a format="nt" parsing as found:
# https://github.com/RDFLib/rdflib/blob/395a40101fe133d97f454ee61da0fc748a93b007/rdflib/plugins/parsers/nt.py#L2
# Example NT file from:
# https://github.com/RDFLib/rdflib/blob/395a40101fe133d97f454ee61da0fc748a93b007/test/nt/anons-01.nt
class StreamSink(Sink):
    """
    A sink is used to store the results of parsing; this almost matches the sink
    example shown in ntriples:
    https://github.com/RDFLib/rdflib/blob/395a40101fe133d97f454ee61da0fc748a93b007/rdflib/plugins/parsers/ntriples.py#L43
    """

    def triple(self, s, p, o):
        # Called by the parser once for every triple it reads.
        self.length += 1
        print("Stream of triples s={s}, p={p}, o={o}".format(s=s, p=p, o=o))


if __name__ == "__main__":
    # Create a new parser and try to parse the example NT file.
    n = NTriplesParser(StreamSink())
    with open("./anons-01.nt", "r") as anons:
        n.parse(anons)
Output:
Stream of triples s=N33bb017ce2c340999d2aa6a071d79678, p=http://example.org/#p, o=http://example.org/#q
Stream of triples s=N33bb017ce2c340999d2aa6a071d79678, p=http://example.org/#r, o=http://example.org/#s
Stream of triples s=Nb8d195e0586f42c4bcc703be897c74fa, p=http://example.org/#p, o=http://example.org/#q
Stream of triples s=Nb8d195e0586f42c4bcc703be897c74fa, p=http://example.org/#r, o=N235a8c8b4f91453892da284cb0c490e0
Stream of triples s=N235a8c8b4f91453892da284cb0c490e0, p=http://example.org/#s, o=http://example.org/#t
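If you would rather collect the triples than print them, the same sink idea can be sketched without rdflib at all. The snippet below is only an illustration of the pattern: `CollectingSink` and `toy_parse` are hypothetical names made up for this example, and `toy_parse` is a deliberately naive stand-in for a real parser that only understands simple `<s> <p> <o> .` lines. With rdflib you would keep the `triple` method and hand the sink to the parser as shown above.

```python
# Illustration of the sink pattern only. CollectingSink and toy_parse are
# hypothetical names; toy_parse is NOT rdflib's parser and only handles
# plain "<s> <p> <o> ." lines with IRI terms.


class CollectingSink:
    """Receives each triple through triple() and keeps it in a list."""

    def __init__(self):
        self.length = 0
        self.triples = []

    def triple(self, s, p, o):
        self.length += 1
        self.triples.append((s, p, o))


def toy_parse(lines, sink):
    """Call sink.triple() once for every non-empty, non-comment line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the trailing "." and split the rest into exactly three terms.
        s, p, o = line.rstrip(".").strip().split(None, 2)
        sink.triple(s.strip("<>"), p.strip("<>"), o.strip("<>"))


if __name__ == "__main__":
    data = [
        "<http://example.org/#a> <http://example.org/#p> <http://example.org/#q> .",
        "# a comment line",
        "<http://example.org/#a> <http://example.org/#r> <http://example.org/#s> .",
    ]
    sink = CollectingSink()
    toy_parse(data, sink)
    for t in sink.triples:
        print(t)
```

The parser never needs to know what the sink does with a triple, so swapping the list for a queue, a counter, or a database write only means changing the body of `triple`.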