RDFlib with Fuseki runs queries very slowly on a local server

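I have a small WSGI application running on a local CherryPy server. I use RDFlib to turn natural-language queries into SPARQL queries against a ttl file loaded into Fuseki. It works, but it is very slow! The previous version of this script accepted SPARQL queries directly, so I did not use RDFlib, and it ran very fast. Is something wrong with the way I am using RDFlib that makes it this slow?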

from abc import ABCMeta, abstractmethod, abstractproperty
from collections import OrderedDict
from threading import Thread
from time import sleep
from cherrypy import engine
from cherrypy.wsgiserver import CherryPyWSGIServer
from werkzeug.wsgi import DispatcherMiddleware
from werkzeug.debug import DebuggedApplication
from werkzeug.wsgi import SharedDataMiddleware
from werkzeug.wrappers import Response, Request
from requests import get, post, RequestException
from jinja2 import Environment, FileSystemLoader
import os
from SPARQLWrapper import SPARQLWrapper, JSON
import rdflib


__author__ = 'authorname'

templates_dir = os.path.abspath('templates')
static_dir = os.path.abspath('static')

class RdfDemoApp(object):
    def __init__(self, sparql_endpoint_address):
        self._sparql_endpoint_address = sparql_endpoint_address
        self._jinja_env = Environment(loader=FileSystemLoader(templates_dir), autoescape=True)

    def render_template(self, template, **params):
        t = self._jinja_env.get_template(template)
        return t.render(params)

    def _app(self, environ, start_response):
        request = Request(environ)
        if 'query' in request.args.keys():
            query_string = "'"+request.args['query']+"'"
            print query_string

            results = g.query("""PREFIX pers: <http://blabla.com/Register/schemas/persons/>
            SELECT ?person ?sibling ?sibforname
            WHERE { 
               ?person pers:name ?name .
                 ?name  pers:forename """+query_string+""" .
                ?person pers:siblingOf ?sibling .
                ?sibling pers:name ?sibname .
                ?sibname pers:forename ?sibforname .
                 ?sibname pers:type "std"  } """)#, format = "JSON")
            for row in results:
                print row
            header = []
            i=0
            for item in results:
                while i in range(len(item)):
                    for x in item:
                        header.append(x)
                        i+=1
            quer = query_string
            response = Response(self.render_template('results_rdf.html', results=results, header = header, query = quer, static_dir = static_dir ), mimetype='text/html')

        else:
            response = Response(self.render_template('form_rdf.html'), mimetype='text/html')

        return response(environ, start_response)

    def __call__(self, environ, start_response):
        app = SharedDataMiddleware(DebuggedApplication(self._app, evalex=True), {
            '/static': static_dir
        })
        return app(environ, start_response)

if __name__ == '__main__':
    g=rdflib.Graph()
    g.parse("/Users/username/Documents/pers_file_new.ttl", format='n3')
    wsgi_app = RdfDemoApp("http://localhost:3030/ds/query")
    try:
        server = CherryPyWSGIServer(('127.0.0.1',10001), wsgi_app)
        server.start()

    except KeyboardInterrupt:
        server.stop()
        print "Logged out"

I'm not sure how big the dataset in pers_file_new.ttl is, but one reason it is slow is that you are reading all of it into memory with RDFLib:

g.parse("/Users/username/Documents/pers_file_new.ttl", format='n3')

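With your current code, you are not querying Fuseki at all; you are querying the in-memory RDFLib graph. The endpoint address you pass to RdfDemoApp is stored but never used, and g.query runs against the graph that g.parse built. You can follow the example on the SPARQLWrapper home page. It is very close to what you are trying to do.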

http://rdflib.github.io/sparqlwrapper/
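
As a rough illustration of that approach, here is a minimal sketch that sends the query from the question to Fuseki through SPARQLWrapper instead of running it on a local graph. The function names (escape_literal, build_sibling_query, query_fuseki) are made up for this example, the endpoint URL is the one from the question, and it assumes Fuseki is running with the ttl file already loaded into the "ds" dataset. It also escapes the user input before putting it into the query, which the original string concatenation does not do.

```python
PREFIX = "http://blabla.com/Register/schemas/persons/"
FUSEKI_ENDPOINT = "http://localhost:3030/ds/query"  # endpoint taken from the question


def escape_literal(text):
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def build_sibling_query(forename):
    """Build the sibling query from the question for one forename (hypothetical helper)."""
    return """PREFIX pers: <%s>
SELECT ?person ?sibling ?sibforname
WHERE {
  ?person  pers:name      ?name .
  ?name    pers:forename  "%s" .
  ?person  pers:siblingOf ?sibling .
  ?sibling pers:name      ?sibname .
  ?sibname pers:forename  ?sibforname .
  ?sibname pers:type      "std" .
}""" % (PREFIX, escape_literal(forename))


def query_fuseki(forename, endpoint=FUSEKI_ENDPOINT):
    """Run the query on the Fuseki endpoint and return (header, rows)."""
    # Imported here so the query builder can be used without SPARQLWrapper installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(build_sibling_query(forename))
    sparql.setReturnFormat(JSON)
    data = sparql.query().convert()  # parsed SPARQL JSON results as a dict

    header = data["head"]["vars"]  # variable names in SELECT order
    rows = [[binding.get(var, {}).get("value") for var in header]
            for binding in data["results"]["bindings"]]
    return header, rows
```

In _app, the call to g.query could then become something like header, rows = query_fuseki(request.args['query']), and the g.parse step at startup would no longer be needed, since Fuseki already holds the data. The template would have to iterate over rows rather than over an RDFLib result object.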