Does Jena TDB load all data into memory every time?
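I'm new to Jena. I'm trying to use TDB with the YAGO dataset. The dataset is about 200 MB, and every time I run the same query it takes about 5 minutes to load the data and return the results. Have I misunderstood some part of TDB? Here is my code.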
String directory = "tdb";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.WRITE);
Model tdb = dataset.getDefaultModel();
//String source = "yagoMetaFacts.ttl";
//FileManager.get().readModel(tdb, source);
String queryString = "SELECT DISTINCT ?p WHERE { ?s ?p ?o. }";
Query query = QueryFactory.create(queryString);
try(QueryExecution qexec = QueryExecutionFactory.create(query, tdb)){
ResultSet results = qexec.execSelect();
ResultSetFormatter.out(System.out, results, query) ;
}
dataset.commit();
dataset.end();
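There are two ways to load data into TDB: through the API or from the command line. Many thanks to @ASKW and @AndyS.
1 Via the API
Load the data
This code only needs to be run once. The readModel line in particular takes a long time.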
String directory = "tdb";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.WRITE);
Model tdb = dataset.getDefaultModel();
String source = "yagoMetaFacts.ttl";
FileManager.get().readModel(tdb, source);
dataset.commit(); //Important!! This is to commit the data to tdb.
dataset.end();
Once the data has been loaded into TDB, it can be queried with the following code. There is no need to load the data again.
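// Use forward slashes: in a Java string literal "\t" is a tab, so "path\to\tdb" is not the intended path.
String directory = "path/to/tdb";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.READ); // a read transaction is enough for querying
try {
    Model tdb = dataset.getDefaultModel();
    String queryString = "SELECT DISTINCT ?p WHERE { ?s ?p ?o. }";
    Query query = QueryFactory.create(queryString);
    try (QueryExecution qexec = QueryExecutionFactory.create(query, tdb)) {
        ResultSet results = qexec.execSelect();
        ResultSetFormatter.out(System.out, results, query);
    }
} finally {
    dataset.end(); // always close the transaction
}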
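If the one-time load and the repeated query live in the same program, one possible way to keep the load from running on every start is to guard it with a marker file. The sketch below is only an illustration using the Java standard library: the class name LoadOnce, the method loadOnce and the marker file name tdb.loaded are made up for this example and are not part of Jena or TDB. The Runnable stands in for the begin / readModel / commit / end block shown earlier.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LoadOnce {

    /**
     * Runs the loader only if the marker file does not exist yet,
     * then creates the marker. Returns true if the load ran.
     * The marker is a hypothetical convention, not something TDB knows about.
     */
    static boolean loadOnce(Path marker, Runnable loader) throws IOException {
        if (Files.exists(marker)) {
            return false; // data was loaded on an earlier run, skip the slow step
        }
        // In a real program this would be the write transaction that
        // reads the .ttl file into the TDB dataset and commits it.
        loader.run();
        // Created only after the loader returned normally,
        // so a load that threw an exception is retried next time.
        Files.createFile(marker);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Demo in a throwaway directory; the loader here only prints a message.
        Path marker = Files.createTempDirectory("tdb-demo").resolve("tdb.loaded");
        for (int run = 1; run <= 3; run++) {
            boolean loaded = loadOnce(marker, () -> System.out.println("loading data..."));
            System.out.println("run " + run + ": loaded = " + loaded);
        }
    }
}
```

The marker is kept next to the TDB directory rather than inside it, so that no foreign file ends up among TDB's own files. Deleting the marker forces a reload on the next run.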
2 Via the command line
Load the data
>tdbloader --loc=path\to\tdb path\to\dataset.ttl
Query
>tdbquery --loc=path\to\tdb --query=q1.rq
q1.rq is the file that contains the query.
You should get a result like this:
-------------------------------------------------------
| p                                                   |
=======================================================
| <http://yago-knowledge.org/resource/hasGloss>       |
| <http://yago-knowledge.org/resource/occursSince>    |
| <http://yago-knowledge.org/resource/occursUntil>    |
| <http://yago-knowledge.org/resource/byTransport>    |
| <http://yago-knowledge.org/resource/hasPredecessor> |
| <http://yago-knowledge.org/resource/hasSuccessor>   |
| <http://www.w3.org/2000/01/rdf-schema#comment>      |
-------------------------------------------------------