Why does Rexster Server (and Titan) stop responding?

Setup

I am implementing a recommender system using Titan Rexster (titan-server-0.4.4.zip) with an Elasticsearch backend, running on an Ubuntu 12.04 server. To connect to the Rexster server I use the Bulbflow library for Python.

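The beta seemed to run fine for 3 weeks, but under "increasing" load (only a few users, ~10) the Rexster server stops responding. I don't know whether my Rexster configuration is wrong or whether I am using the Bulbflow library incorrectly.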

Rexster / Titan configuration

Here is my rexster-cassandra-es.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <rexster>
        <http>
            <server-port>8182</server-port>
            <server-host>0.0.0.0</server-host>
            <base-uri>http://MY_IP</base-uri>
            <web-root>public</web-root>
            <character-set>UTF-8</character-set>
            <enable-jmx>false</enable-jmx>
            <enable-doghouse>true</enable-doghouse>
            <max-post-size>2097152</max-post-size>
            <max-header-size>8192</max-header-size>
            <upload-timeout-millis>30000</upload-timeout-millis>
            <thread-pool>
                <worker>
                    <core-size>20</core-size>
                    <max-size>40</max-size>
                </worker>
                <kernal>
                    <core-size>10</core-size>
                    <max-size>20</max-size>
                </kernal>
            </thread-pool>
            <io-strategy>leader-follower</io-strategy>
        </http>
        <rexpro>
            <server-port>8184</server-port>
            <server-host>0.0.0.0</server-host>
            <session-max-idle>1790000</session-max-idle>
            <session-check-interval>3000000</session-check-interval>
            <connection-max-idle>180000</connection-max-idle>
            <connection-check-interval>3000000</connection-check-interval>
            <enable-jmx>false</enable-jmx>
            <thread-pool>
                <worker>
                    <core-size>8</core-size>
                    <max-size>8</max-size>
                </worker>
                <kernal>
                    <core-size>4</core-size>
                    <max-size>4</max-size>
                </kernal>
            </thread-pool>
            <io-strategy>leader-follower</io-strategy>
        </rexpro>
        <shutdown-port>8183</shutdown-port>
        <shutdown-host>127.0.0.1</shutdown-host>
        <script-engines>
            <script-engine>
                <name>gremlin-groovy</name>
                <reset-threshold>-1</reset-threshold>
                <imports>com.tinkerpop.gremlin.*,com.tinkerpop.gremlin.java.*,com.tinkerpop.gremlin.pipes.filter.*,com.tinkerpop.gremlin.pipes.sideeffect.*,com.tinkerpop.gremlin.pipes.transform.*,com.tinkerpop.blueprints.*,com.tinkerpop.blueprints.impls.*,com.tinkerpop.blueprints.impls.tg.*,com.tinkerpop.blueprints.impls.neo4j.*,com.tinkerpop.blueprints.impls.neo4j.batch.*,com.tinkerpop.blueprints.impls.orient.*,com.tinkerpop.blueprints.impls.orient.batch.*,com.tinkerpop.blueprints.impls.dex.*,com.tinkerpop.blueprints.impls.rexster.*,com.tinkerpop.blueprints.impls.sail.*,com.tinkerpop.blueprints.impls.sail.impls.*,com.tinkerpop.blueprints.util.*,com.tinkerpop.blueprints.util.io.*,com.tinkerpop.blueprints.util.io.gml.*,com.tinkerpop.blueprints.util.io.graphml.*,com.tinkerpop.blueprints.util.io.graphson.*,com.tinkerpop.blueprints.util.wrappers.*,com.tinkerpop.blueprints.util.wrappers.batch.*,com.tinkerpop.blueprints.util.wrappers.batch.cache.*,com.tinkerpop.blueprints.util.wrappers.event.*,com.tinkerpop.blueprints.util.wrappers.event.listener.*,com.tinkerpop.blueprints.util.wrappers.id.*,com.tinkerpop.blueprints.util.wrappers.partition.*,com.tinkerpop.blueprints.util.wrappers.readonly.*,com.tinkerpop.blueprints.oupls.sail.*,com.tinkerpop.blueprints.oupls.sail.pg.*,com.tinkerpop.blueprints.oupls.jung.*,com.tinkerpop.pipes.*,com.tinkerpop.pipes.branch.*,com.tinkerpop.pipes.filter.*,com.tinkerpop.pipes.sideeffect.*,com.tinkerpop.pipes.transform.*,com.tinkerpop.pipes.util.*,com.tinkerpop.pipes.util.iterators.*,com.tinkerpop.pipes.util.structures.*,org.apache.commons.configuration.*,com.thinkaurelius.titan.core.*,com.thinkaurelius.titan.core.attribute.*,com.thinkaurelius.titan.core.util.*,com.thinkaurelius.titan.example.*,org.apache.commons.configuration.*,com.tinkerpop.gremlin.Tokens.T,com.tinkerpop.gremlin.groovy.*</imports>
            <static-imports>com.tinkerpop.blueprints.Direction.*,com.tinkerpop.blueprints.TransactionalGraph$Conclusion.*,com.tinkerpop.blueprints.Compare.*,com.thinkaurelius.titan.core.attribute.Geo.*,com.thinkaurelius.titan.core.attribute.Text.*,com.thinkaurelius.titan.core.TypeMaker$UniquenessConsistency.*,com.tinkerpop.blueprints.Query$Compare.*</static-imports>
            </script-engine>
        </script-engines>
        <security>
            <authentication>
                <type>none</type>
                <configuration>
                    <users>
                        <user>
                            <username>rexster</username>
                            <password>rexster</password>
                        </user>
                    </users>
                </configuration>
            </authentication>
        </security>
        <metrics>
            <reporter>
                <type>jmx</type>
            </reporter>
            <reporter>
                <type>http</type>
            </reporter>
            <reporter>
                <type>console</type>
                <properties>
                    <rates-time-unit>SECONDS</rates-time-unit>
                    <duration-time-unit>SECONDS</duration-time-unit>
                    <report-period>10</report-period>
                    <report-time-unit>MINUTES</report-time-unit>
                    <includes>http.rest.*</includes>
                    <excludes>http.rest.*.delete</excludes>
                </properties>
            </reporter>
        </metrics>
        <graphs>
            <graph>
                <graph-name>newspaper</graph-name>
                <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
                <!-- <graph-location>/tmp/titan</graph-location> -->
                <graph-read-only>false</graph-read-only>
                <properties>
                    <storage.backend>cassandra</storage.backend>
                    <storage.index.search.backend>elasticsearch</storage.index.search.backend>
                    <storage.index.search.hostname>localhost</storage.index.search.hostname>
                    <storage.index.search.client-only>true</storage.index.search.client-only>
                    <storage.index.search.local-mode>false</storage.index.search.local-mode>
                </properties>
                <extensions>
                  <allows>
                    <allow>tp:gremlin</allow>
                  </allows>
                </extensions>
            </graph>
        </graphs>
    </rexster>

I have already changed the core-size and max-size of the worker and kernal thread pools. Without those changes, the Rexster server hangs / stops responding even sooner.

What are appropriate values for core-size and max-size?

Bulbflow usage

To use Bulbflow, I create a new Graph object every time I need to perform a request. There are many requests, so these objects are created very frequently.

Should I really create a new Graph object for every new request?

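Or is it possible to create just one Graph object and reuse it every time a new request is sent to the graph database, or would I run into session problems?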
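
To make the "one Graph object" alternative concrete, here is a minimal sketch of a lazily created, shared client. It is not taken from the Bulbs documentation: `get_graph` and `factory` are hypothetical names, and the factory shown in the comment (`Graph(Config(...))` from `bulbs.titan` and `bulbs.config`) is an assumption about how the object is built in my setup. Whether a single object can be shared safely across threads depends on the HTTP client underneath, which this sketch does not settle.

```python
import threading

# Module-level cache for the single shared client (hypothetical helper).
_graph = None
_graph_lock = threading.Lock()


def get_graph(factory):
    """Return one shared graph client, creating it on first use.

    `factory` is any zero-argument callable that builds the client.
    With Bulbs it might look like this (assumption, not verified here):
        from bulbs.titan import Graph
        from bulbs.config import Config
        factory = lambda: Graph(Config("http://MY_IP:8182/graphs/newspaper"))
    """
    global _graph
    if _graph is None:
        with _graph_lock:
            # Re-check inside the lock so two threads cannot both create it.
            if _graph is None:
                _graph = factory()
    return _graph
```

Each request handler would then call `get_graph(factory)` instead of constructing a new Graph, so the object is built once per process rather than once per request.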

Error message

When everything is stuck and I forcibly terminate the program (Ctrl-C), I get the following stack trace:

    Exception happened during processing of request from ('my_ip', 57489)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
        self.process_request(request, client_address)
      File "/usr/lib/python2.7/SocketServer.py", line 310, in process_request
        self.finish_request(request, client_address)
      File "/usr/lib/python2.7/SocketServer.py", line 323, in finish_request
        self.RequestHandlerClass(request, client_address, self)
      File "/usr/lib/python2.7/SocketServer.py", line 638, in __init__
        self.handle()
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 200, in handle
        rv = BaseHTTPRequestHandler.handle(self)
      File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
        self.handle_one_request()
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 235, in handle_one_request
        return self.run_wsgi()
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 177, in run_wsgi
        execute(self.server.app)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 165, in execute
        application_iter = app(environ, start_response)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
        return self.wsgi_app(environ, start_response)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/home/user/dir/recommender/project/api/start.py", line 65, in put_user
        graphdb.insert_user(user_id)
      File "project/api/graphdb.py", line 14, in insert_user
        user_with_id = g.users.index.lookup(user_sqlid=user_id)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/titan/index.py", line 270, in lookup
        resp = self.client.lookup_vertex(self.index_name,key,value)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/titan/client.py", line 348, in lookup_vertex
        return self.request.get(path,params)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/rest.py", line 101, in get
        return self.request(GET, path, params)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/rest.py", line 184, in request
        http_resp = self.http.request(uri, method, body, headers)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1593, in request
        (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1335, in _request
        (response, content) = self._conn_request(conn, request_uri, method, body, headers)
      File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1291, in _conn_request
        response = conn.getresponse()
      File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
        response.begin()
      File "/usr/lib/python2.7/httplib.py", line 407, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
        line = self.fp.readline()
      File "/usr/lib/python2.7/socket.py", line 430, in readline
        data = recv(1)
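
The trace ends in `recv(1)` inside `_read_status`, which means the client is blocked waiting for the first line of Rexster's HTTP response, with no timeout. The sketch below reproduces that situation locally and shows how a client-side timeout turns the indefinite hang into an error that can be handled. It is an illustration only: it uses Python 3's standard `http.client` rather than the Python 2.7 / httplib2 stack in the trace, `start_silent_server` and `fetch_status` are hypothetical helpers, and the `/graphs/newspaper` path is just a placeholder request. `httplib2.Http` does accept a `timeout` argument, but whether Bulbs lets you pass one through is not something I have confirmed. A timeout does not fix the unresponsive server; it only keeps the web application from hanging with it.

```python
import http.client
import socket
import threading


def start_silent_server():
    """Start a local TCP server that accepts one connection and never replies."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    accepted = []  # keeps the accepted connection open for the demo

    def accept_and_hold():
        conn, _ = srv.accept()
        accepted.append(conn)  # hold the connection, send nothing back

    threading.Thread(target=accept_and_hold, daemon=True).start()
    return srv, accepted


def fetch_status(host, port, timeout):
    """Return the HTTP status, or None if no response arrives within `timeout` seconds."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/graphs/newspaper")
        return conn.getresponse().status
    except socket.timeout:
        # Without a timeout, getresponse() would block here indefinitely.
        return None
    finally:
        conn.close()
```

Calling `fetch_status` against the silent server returns `None` after the timeout instead of blocking until the process is killed.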

Recovery

To recover, I have to shut down Rexster / Titan and restart it. Whenever I stop the Rexster server (./bin/titan -c cassandra-es stop), I get the following output:

    Killing Titan + Rexster (pid 26779)...
    Rexster shutdown timeout exceeded (60 seconds)
    Killing Cassandra (pid 26201)...

Rexster is completely stuck.

I look forward to some helpful guidance.

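First, I would suggest migrating to Titan 1.0, since Rexster has been replaced by Gremlin Server, with some significant changes. If you still need to use Titan 0.4.4, then I would say try running it as a service. Otherwise, the end of your session may cause the instance to terminate all of its jobs.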

See the following documentation: Gremlin Server Documentation

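The following thread on the Titan mailing list may be useful to you: Rexster REST API stops responding. However, I don't think they ever managed to resolve this for Titan, and the Rexster developers were unable to reproduce it.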

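That said, I strongly recommend upgrading to Titan v1.0.0, which uses the TinkerPop 3.0+ Gremlin Server instead of the TinkerPop 2.x Rexster. You will get fewer bugs, more features, and above all more expressive Gremlin queries (see the TinkerPop 3.0.1 documentation used by Titan v1.0.0). Titan v0.4.4 is a very old version, and I don't think this particular problem is worth solving, especially if you are new to graphs.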