Extract all parents of a given node
I am trying to use the EBI-RDF SPARQL endpoint to extract all parents of a given GO ID (a node). I based my query on two similar questions. Two examples illustrate the problem:
Example 1 (link to the structure):
biological_process (GO:0008150)
|__ metabolic process (GO:0008152)
    |__ methylation (GO:0032259)
For this example, I use the following query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT (count(?mid) as ?depth)
       (group_concat(distinct ?midId ; separator = " / ") AS ?treePath)
FROM <http://rdf.ebi.ac.uk/dataset/go>
WHERE {
  obo:GO_0032259 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId .
}
GROUP BY ?treePath
ORDER BY ?depth
I get the desired result without any problem:
c | treePath
--|-------------------------------------
6 | GO:0008150 / GO:0008152 / GO:0032259
But when a term belongs to more than one branch (e.g. GO:0007267), as in the following example, the previous approach does not work:
Example 2 (link to the structure):
biological_process (GO:0008150)
|__ cellular_process (GO:0009987)
|   |__ cell communication (GO:0007154)
|       |__ cell-cell signaling (GO:0007267)
|
|__ signaling (GO:0023052)
    |__ cell-cell signaling (GO:0007267)
Result:
c | treePath
--|---------------------------------------------------------------
15| GO:0007154 / GO:0007267 / GO:0008150 / GO:0009987 / GO:0023052
What I want to get is:
GO:0008150 / GO:0009987 / GO:0007154 / GO:0007267
GO:0008150 / GO:0023052 / GO:0007267
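To make the desired output concrete, here is a minimal Python sketch that enumerates every root-to-term path once the direct parent edges are known. The `PARENTS` map is hand-copied from the two diagrams above rather than fetched from the endpoint, and the names `paths_to_root` and `format_paths` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List

# Direct rdfs:subClassOf edges (child -> parents), copied by hand from the
# two example diagrams. This is an assumption, not data from the endpoint.
PARENTS: Dict[str, List[str]] = {
    "GO:0032259": ["GO:0008152"],
    "GO:0008152": ["GO:0008150"],
    "GO:0007267": ["GO:0007154", "GO:0023052"],
    "GO:0007154": ["GO:0009987"],
    "GO:0009987": ["GO:0008150"],
    "GO:0023052": ["GO:0008150"],
    "GO:0008150": [],
}


def paths_to_root(term: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every root-first path that ends at `term`, one list per branch."""
    direct = parents.get(term, [])
    if not direct:
        return [[term]]  # a term without parents is treated as a root
    paths: List[List[str]] = []
    for parent in direct:
        for path in paths_to_root(parent, parents):
            paths.append(path + [term])
    return paths


def format_paths(term: str, parents: Dict[str, List[str]] = PARENTS) -> List[str]:
    """Join each path with ' / ', matching the treePath layout used above."""
    return [" / ".join(path) for path in paths_to_root(term, parents)]


for line in format_paths("GO:0007267"):
    print(line)
```

The recursion assumes the hierarchy is acyclic, which holds for the subclass edges drawn in the diagrams; a term with several parents simply yields one path per branch.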
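My understanding is that, under the hood, I am computing the depth of each level and using it to build the path. This works fine when the element belongs to only one branch.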
SELECT (count(?mid) as ?depth) ?midId
FROM <http://rdf.ebi.ac.uk/dataset/go>
WHERE {
  obo:GO_0032259 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId .
}
GROUP BY ?midId
ORDER BY ?depth
Result:
depth | midId
------|------------
1 | GO:0008150
2 | GO:0008152
3 | GO:0032259
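In the second example, things get messed up and I don't understand why. I'm fairly sure part of the problem is the terms that share the same depth/level, but I don't know how to solve it.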
depth | midId
------|------------
2 | GO:0008150
2 | GO:0009987
2 | GO:0023052
3 | GO:0007154
6 | GO:0007267
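The tie can be reproduced offline. The sketch below mimics what the depth query counts: for each `?mid` reachable from the start term, the number of distinct terms reachable from `?mid` (itself included). The `PARENTS` map is again hand-copied from the diagrams, and `ancestors_or_self` and `depth_counts` are hypothetical helper names. Because the simplified graph only has the edges drawn above, the counts for example 2 need not match the endpoint's numbers exactly; the point is only that two terms end up with the same count.

```python
from typing import Dict, List, Set

# Direct rdfs:subClassOf edges (child -> parents) from the diagrams above.
# Assumption: the real ontology may contain further edges not drawn there.
PARENTS: Dict[str, List[str]] = {
    "GO:0032259": ["GO:0008152"],
    "GO:0008152": ["GO:0008150"],
    "GO:0007267": ["GO:0007154", "GO:0023052"],
    "GO:0007154": ["GO:0009987"],
    "GO:0009987": ["GO:0008150"],
    "GO:0023052": ["GO:0008150"],
    "GO:0008150": [],
}


def ancestors_or_self(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Terms reachable via zero or more subClassOf steps (like rdfs:subClassOf*)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def depth_counts(term: str, parents: Dict[str, List[str]]) -> Dict[str, int]:
    """For each ?mid above `term`, count the distinct ?class values above ?mid."""
    return {
        mid: len(ancestors_or_self(mid, parents))
        for mid in ancestors_or_self(term, parents)
    }


# Single branch: counts are 1, 2, 3, so sorting by count gives the path.
print(sorted(depth_counts("GO:0032259", PARENTS).items(), key=lambda kv: kv[1]))

# Two branches: GO:0009987 and GO:0023052 get the same count, so the count
# alone cannot tell which branch a term belongs to.
print(sorted(depth_counts("GO:0007267", PARENTS).items(), key=lambda kv: (kv[1], kv[0])))
```

In other words, the count is a property of each term, not of a path, so ordering by it can only linearise a hierarchy that is a single chain.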
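Thanks to @AKSW, I found a nice solution using HyperGraphQL (a GraphQL interface for querying and serving linked data on the Web).
I'll leave a detailed answer here; it may help someone.
- I downloaded and set up HyperGraphQL (download page) and linked it to the EBI SPARQL endpoint as described in this tutorial.
The config.json file I used: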
{
  "name": "ebi-hgql",
  "schema": "ebischema.graphql",
  "server": {
    "port": 8081,
    "graphql": "/graphql",
    "graphiql": "/graphiql"
  },
  "services": [
    {
      "id": "ebi-sparql",
      "type": "SPARQLEndpointService",
      "url": "http://www.ebi.ac.uk/rdf/services/sparql",
      "graph": "http://rdf.ebi.ac.uk/dataset/go",
      "user": "",
      "password": ""
    }
  ]
}
This is what my ebischema.graphql file looks like (I only need Class, id, label and subClassOf):
type __Context {
  Class: _@href(iri: "http://www.w3.org/2002/07/owl#Class")
  id: _@href(iri: "http://www.geneontology.org/formats/oboInOwl#id")
  label: _@href(iri: "http://www.w3.org/2000/01/rdf-schema#label")
  subClassOf: _@href(iri: "http://www.w3.org/2000/01/rdf-schema#subClassOf")
}
type Class @service(id:"ebi-sparql") {
  id: [String] @service(id:"ebi-sparql")
  label: [String] @service(id:"ebi-sparql")
  subClassOf: [Class] @service(id:"ebi-sparql")
}
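I started by testing some simple queries but kept getting empty responses; the answer to this issue solved my problem.
Finally, I built the following query to get the tree: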
{
  Class_GET_BY_ID(uris:[
    "http://purl.obolibrary.org/obo/GO_0032259",
    "http://purl.obolibrary.org/obo/GO_0007267"]) {
    id
    label
    subClassOf {
      id
      label
      subClassOf {
        id
        label
      }
    }
  }
}
I got some interesting results:
{
"extensions": {},
"data": {
"@context": {
"_type": "@type",
"_id": "@id",
"id": "http://www.geneontology.org/formats/oboInOwl#id",
"label": "http://www.w3.org/2000/01/rdf-schema#label",
"Class_GET_BY_ID": "http://hypergraphql.org/query/Class_GET_BY_ID",
"subClassOf": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
},
"Class_GET_BY_ID": [
{
"id": [
"GO:0032259"
],
"label": [
"methylation"
],
"subClassOf": [
{
"id": [
"GO:0008152"
],
"label": [
"metabolic process"
],
"subClassOf": [
{
"id": [
"GO:0008150"
],
"label": [
"biological_process"
]
}
]
}
]
},
{
"id": [
"GO:0007267"
],
"label": [
"cell-cell signaling"
],
"subClassOf": [
{
"id": [
"GO:0007154"
],
"label": [
"cell communication"
],
"subClassOf": [
{
"id": [
"GO:0009987"
],
"label": [
"cellular process"
]
}
]
},
{
"id": [
"GO:0023052"
],
"label": [
"signaling"
],
"subClassOf": [
{
"id": [
"GO:0008150"
],
"label": [
"biological_process"
]
}
]
}
]
}
]
},
"errors": []
}
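The nested response can be flattened back into the "a / b / c" paths from the question. The following is a minimal Python sketch, assuming the response has already been parsed into a dict; `RESPONSE` is a trimmed copy of the output above (ids only), and `paths_from_node` and `tree_paths` are hypothetical helper names.

```python
from typing import Any, Dict, List

# Trimmed copy of the response above: only the "id" and "subClassOf" fields.
RESPONSE: Dict[str, Any] = {
    "data": {
        "Class_GET_BY_ID": [
            {
                "id": ["GO:0032259"],
                "subClassOf": [
                    {"id": ["GO:0008152"], "subClassOf": [{"id": ["GO:0008150"]}]}
                ],
            },
            {
                "id": ["GO:0007267"],
                "subClassOf": [
                    {"id": ["GO:0007154"], "subClassOf": [{"id": ["GO:0009987"]}]},
                    {"id": ["GO:0023052"], "subClassOf": [{"id": ["GO:0008150"]}]},
                ],
            },
        ]
    }
}


def paths_from_node(node: Dict[str, Any]) -> List[List[str]]:
    """Return top-first paths of GO ids ending at `node`, one per branch."""
    node_id = node["id"][0]  # every field comes back as a list in this response
    parents = node.get("subClassOf", [])
    if not parents:
        return [[node_id]]
    return [
        path + [node_id]
        for parent in parents
        for path in paths_from_node(parent)
    ]


def tree_paths(response: Dict[str, Any]) -> List[str]:
    """Flatten every requested class into ' / '-joined path strings."""
    result: List[str] = []
    for node in response["data"]["Class_GET_BY_ID"]:
        result.extend(" / ".join(path) for path in paths_from_node(node))
    return result


for line in tree_paths(RESPONSE):
    print(line)
```

Each path only goes as high as the query nested `subClassOf`, so the branch through GO:0007154 stops at GO:0009987 here instead of reaching GO:0008150.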
Edit
This is exactly what I wanted, but I noticed that I can't add another sublevel like this:
{
  Class_GET_BY_ID(uris:[
    "http://purl.obolibrary.org/obo/GO_0032259",
    "http://purl.obolibrary.org/obo/GO_0007267"]) {
    id
    label
    subClassOf {
      id
      label
      subClassOf {
        id
        label
        subClassOf { # <--- 4th sublevel
          id
          label
        }
      }
    }
  }
}
I created a new question: Endpoint returned Content-Type: text/html which is not recognized for SELECT queries
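Since each extra level means another hand-written `subClassOf { ... }` block, the query text can be generated instead. This is a minimal Python sketch of such a generator; `build_query` is a hypothetical helper, and it only produces the query string. Whether the server accepts a given depth is a separate matter, as the error above shows.

```python
import json
from typing import List


def build_query(uris: List[str], depth: int) -> str:
    """Build a Class_GET_BY_ID query with `depth` levels of id/label.

    depth=1 selects only the requested classes; each further level adds one
    nested `subClassOf { id label }` block.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")

    def selection(level: int, indent: int) -> List[str]:
        pad = "  " * indent
        lines = [pad + "id", pad + "label"]
        if level > 1:
            lines.append(pad + "subClassOf {")
            lines.extend(selection(level - 1, indent + 1))
            lines.append(pad + "}")
        return lines

    # json.dumps yields a double-quoted string, which is also valid GraphQL here.
    uri_list = ", ".join(json.dumps(uri) for uri in uris)
    body = "\n".join(selection(depth, 2))
    return "{\n  Class_GET_BY_ID(uris: [" + uri_list + "]) {\n" + body + "\n  }\n}"


print(build_query(
    [
        "http://purl.obolibrary.org/obo/GO_0032259",
        "http://purl.obolibrary.org/obo/GO_0007267",
    ],
    depth=3,
))
```

With `depth=3` this yields the same shape as the working query above; `depth=4` yields the shape that triggered the error.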