Fuseki config for 2 datasets + text index: how to use turtle files?

Question

I'm new to Fuseki and want to use 2 TDB datasets for our project: a small one for our own data, and a big one (168 M triples, data imported from http://data.bnf.fr).

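We need to index the data because SPARQL queries using "FILTER(CONTAINS())" don't work on the large dataset ("BnF_text"). So I set up a text index for "BnF_text", following this post (but I had to modify the turtle config file to make text:query work properly).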
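For comparison with FILTER(CONTAINS()), this is roughly what a text:query lookup against the "BnF_text" service looks like when sent over plain HTTP. It is a minimal sketch, assuming Fuseki listens on its default port 3030 and exposes the "query" endpoint declared in text_config.ttl; the search term "Hugo" and the limit of 20 are illustrative only.

```shell
#!/bin/sh
# Sketch: query the Lucene index of the "BnF_text" service with text:query.
# ASSUMPTIONS: Fuseki runs on localhost:3030 (its default port);
# "Hugo" and the limit 20 are illustrative values.
ENDPOINT="http://localhost:3030/BnF_text/query"

QUERY='PREFIX text:    <http://jena.apache.org/text#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s ?title WHERE {
  # Index lookup: at most 20 subjects whose dcterms:title matches "Hugo"
  ?s text:query (dcterms:title "Hugo" 20) .
  # Join each match back to the triples stored in TDB
  ?s dcterms:title ?title .
}'

# POST the query form-encoded and ask for JSON results
curl -s "$ENDPOINT" \
  --data-urlencode "query=$QUERY" \
  -H 'Accept: application/sparql-results+json'
```

Because dcterms:title is mapped to the "text" field in the entity map, the index does the string matching and only the matching subjects reach the TDB join, which is what makes this usable on 168 M triples where a FILTER scan is not.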

It works, but I'm running into a strange problem with "BnF_text": sometimes the same query returns a timeout, and I can't find any error in the Fuseki log, nor in the Apache log.

~~~~~~~ Here are my questions: ~~~~~~~

Is there something wrong with my config files?
Is performance affected by the coexistence of the 2 datasets?

~~~~~~~ Below are my installation details: ~~~~~~~

Changed the memory limit in the fuseki-server launch script: set to -Xmx4000M.
SPARQL queries are sent through the PHP EasyRDF library.
I have 2 config files: $FUSEKI_PATH/text_config.ttl + $FUSEKI_PATH/run/configuration/MY_DATASET.ttl
I run fuseki-server with this command: ./fuseki-server --config text_config.ttl
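The same heap size can likely be set without editing the launch script. A minimal sketch, assuming the fuseki-server script reads the JVM_ARGS environment variable and only falls back to its built-in -Xmx default when it is unset (Fuseki 2 scripts do this; check your copy before relying on it):

```shell
#!/bin/sh
# Sketch: start Fuseki with a 4000 MB heap without modifying fuseki-server.
# ASSUMPTION: the fuseki-server script passes $JVM_ARGS to java and uses
# its own default heap setting only when JVM_ARGS is empty.
export JVM_ARGS="-Xmx4000M"
./fuseki-server --config text_config.ttl
```

Keeping the setting outside the script means it survives a Fuseki upgrade that replaces the script.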

Configuration files

1) text_config.ttl

@prefix :        <#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

## Initialize TDB --------------------------------

[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query -------------------------------------
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;

    text:dataset :tdb_dataset_readwrite ;
    text:index     <#indexLucene> ;
    .

# A TDB dataset used for RDF storage ------------------------------
:tdb_dataset_readwrite                    # <= EDIT : instead of <#dataset>  
        a             tdb:DatasetTDB ;
        tdb:location  "TDB_PATH" ;
.

# Text index description ------------------------------------------
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:LUCENE_PATH> ;
    text:entityMap <#entMap> ;
    text:storeValues true ;
    .

# Mapping in the index ---------------------------------------------
# URI stored in field "uri" 
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate dcterms:title ]
         [ text:field "text" ; text:predicate foaf:familyName ]
         [ text:field "text" ; text:predicate foaf:name ]
         ) .

# Fuseki services (http) --------------------------------------------- 

# EDIT : added following lines

:service_tdb_all  a                   fuseki:Service ;
        rdfs:label                    "TDB BnF_text" ;
        fuseki:dataset                :text_dataset ; ### 
        fuseki:name                   "BnF_text" ;
        fuseki:serviceQuery           "query" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadWriteGraphStore " .

2) MY_DATASET.ttl

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service_tdb_all  a                   fuseki:Service ;
        rdfs:label                    "TDB MY_DATASET" ;
        fuseki:dataset                :tdb_dataset_readwrite ;
        fuseki:name                   "MY_DATASET" ;
        fuseki:serviceQuery           "query" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadWriteGraphStore
                "data" ;
        fuseki:serviceUpdate          "update" ;
        fuseki:serviceUpload          "upload" .

:tdb_dataset_readwrite
        a             tdb:DatasetTDB ;
        tdb:location  "MY_DATASET_TDB_PATH" .
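If the big dataset was bulk-loaded directly into TDB (for example with tdbloader), the Lucene index described in text_config.ttl is not filled automatically; jena-text ships a textindexer tool that builds it from the same assembler file. A sketch, assuming a Fuseki 2 distribution where the tool is inside fuseki-server.jar, and that Fuseki is stopped while it runs (a TDB store should not be opened by two JVMs at once):

```shell
#!/bin/sh
# Sketch: (re)build the Lucene index for the text dataset in text_config.ttl.
# ASSUMPTIONS: FUSEKI_PATH is the Fuseki install directory (defaults to the
# current directory here) and the Fuseki server is NOT running.
FUSEKI_PATH="${FUSEKI_PATH:-.}"
java -cp "$FUSEKI_PATH/fuseki-server.jar" jena.textindexer \
  --desc="$FUSEKI_PATH/text_config.ttl"
```

Only the predicates listed in text:map (dcterms:title, foaf:familyName, foaf:name) end up in the index, so adding a predicate to the entity map later means running the indexer again.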

Thanks in advance

Answer 1

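Thanks Andy, you were right. The problem came from EasyRDF, not Fuseki. I found this: https://groups.google.com/d/msg/skosmos-users/WhtZwnsxOFs/MtAocr8vDgAJ, so I changed the timeout in vendor/easyrdf/easyrdf/lib/EasyRdf/Http/Client.php, and now everything seems to be OK. I'll run some more tests and then try to mark the question as solved.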

Edit: 'everything seems to be ok now' = the "timeout" message from EasyRdf_Exception has disappeared.
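One way to tell a client-side timeout like this one from a slow server is to send the failing query to Fuseki without EasyRDF and time it. A sketch with curl; the endpoint and the query are placeholders to replace with the real ones. If curl gets an HTTP 200 after longer than the client is willing to wait, the limit is on the client side, which also explains why nothing shows up in the Fuseki log.

```shell
#!/bin/sh
# Sketch: measure how long Fuseki itself takes to answer, bypassing EasyRdf.
# ASSUMPTIONS: endpoint and query are placeholders; paste the failing query.
ENDPOINT="http://localhost:3030/BnF_text/query"
QUERY='SELECT * WHERE { ?s ?p ?o } LIMIT 10'

# --max-time is deliberately generous so curl does not give up before the
# server does; -w prints the HTTP status and the total time in seconds.
curl -s -o /dev/null --max-time 300 \
  --data-urlencode "query=$QUERY" \
  -w 'HTTP %{http_code} in %{time_total}s\n' \
  "$ENDPOINT"
```

An HTTP status of 000 means curl could not get a response at all (server down, or the 300 s limit was hit).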

Tags: lucene, jena, fuseki, tdb, easyrdf