Column name with spaces in tarql

Question

I am using tarql (https://github.com/tarql/tarql), which uses SPARQL syntax, to convert CSV data into RDF triples.

I have a column named "web site". How can I bind it to a variable with BIND? I have tried several forms, but none of them works:

BIND (?web site AS ?homepage)
BIND (?"web site" AS ?homepage)
BIND (?'web site' AS ?homepage)
BIND (?web\ site AS ?homepage)

All of them result in a parse error.

Answer 1

When you have to deal with a tricky case, my suggestion is to start with exploratory testing. Let's work through an example.

Suppose your source data file is ./data/table.csv and contains:

main index;web site;title, to translate
1;"ciao.ronda.com";"this is the first"
2;"miao.ronda.it";"this is the second"
3;"bao.ronda.uk";"this is the third"

step1: exploratory test query `qstar.sparql`:

SELECT *
  FROM <file:table.csv#delimiter=%3B;>
  WHERE {}
  LIMIT 100

Example launcher (launcher0.sh):

#!/bin/bash -
table=./data/table.csv
query=./data/qstar.sparql 
./bin/tarql --test  --delimiter \; --header-row --verbose ${query} ${table}

Result:

 $ ./launcher0.sh
--------------------------------------------------------
| main_index | web_site         | title,_to_translate  |
========================================================
| "1"        | "ciao.ronda.com" | "this is the first"  |
| "2"        | "miao.ronda.it"  | "this is the second" |
| "3"        | "bao.ronda.uk"   | "this is the third"  |
--------------------------------------------------------

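Now we know that, with these options, the "web site" column is exposed as ?web_site and the variable name computed for the third column is: title,_to_translate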
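If running tarql just to see the names feels heavy, the mapping observed above can be approximated with a few standard tools. This is only a sketch: it assumes that tarql does nothing more than replace spaces with underscores (which is what the output above suggests, not something confirmed here), and `preview_vars` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: preview the variable names tarql would likely derive from a header row.
# Assumption: spaces become underscores and nothing else changes,
# as observed in the tarql output above.
preview_vars() {
  # $1 = CSV file, $2 = single-character delimiter
  head -n 1 "$1" | tr "$2" '\n' | tr ' ' '_' | sed 's/^/?/'
}

# Demo on a temporary copy of the sample header
tmp=$(mktemp)
printf '%s\n' 'main index;web site;title, to translate' \
  '1;"ciao.ronda.com";"this is the first"' > "$tmp"
preview_vars "$tmp" ';'
rm -f "$tmp"
```

The tarql run itself remains the authoritative check; this only gives a quick preview before writing the query.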

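step2: test whether BIND accepts the generated variable name (`title,_to_translate` in our example)

Here we need a sample BIND-based query to understand the problem; suppose this is the query in which we try to use the field named ?title,_to_translate: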

SELECT ?homepage ?uri ?title_with_language_tag
  WHERE {
    BIND (?web_site AS ?homepage)
    BIND (URI(CONCAT('http://website.com/ns#', ?main_index)) AS ?uri)
    BIND (STRLANG(?title,_to_translate, 'en') AS ?title_with_language_tag)
  }

Result:

 $ ./launcher0.sh
com.hp.hpl.jena.query.QueryParseException: Lexical error at line 5, column 27.  Encountered: "t" (116), after : "_"
    at org.deri.tarql.TarqlParser.parse(TarqlParser.java:113)

In short, the query contains a lexical error: the Jena query parser (com.hp.hpl.jena.query) does not support a variable name like this.

In a case like this, rather than keep fighting the language, it is better to use a workaround.

step3: a solution based on a workaround

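Let's take advantage of this option:

-H --no-header-row CSV file has no header row; use variable names ?a, ?b, ...

and enjoy a simple solution. All we need to do is remove the header line from the source data file (an easy task: you can pipe the data through another process or do it however you prefer). To make testing easier, I copied the data without the header row to ./data/table0-noheader.csv.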
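One possible way to drop the header line is sketched below. It assumes the header occupies exactly one physical line (no quoted line breaks inside the header), and `strip_header` is a hypothetical helper name; the demo works on temporary files so it can be run anywhere.

```shell
#!/bin/bash
# Sketch: copy a CSV without its first (header) line.
# Assumption: the header is exactly one physical line.
strip_header() {
  # $1 = source CSV, $2 = destination CSV
  tail -n +2 "$1" > "$2"
}

# Demo on temporary files holding part of the sample data
src=$(mktemp); dst=$(mktemp)
printf '%s\n' 'main index;web site;title, to translate' \
  '1;"ciao.ronda.com";"this is the first"' \
  '2;"miao.ronda.it";"this is the second"' > "$src"
strip_header "$src" "$dst"
cat "$dst"
rm -f "$src" "$dst"
```

With the layout used in this answer, the call would be `strip_header ./data/table.csv ./data/table0-noheader.csv`.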

Now the same query becomes easier for the parser; ./data/query0.sparql:

SELECT ?homepage ?uri ?title_with_language_tag
  WHERE {
    BIND (?a AS ?homepage)
    BIND (URI(CONCAT('http://website.com/ns#', ?b)) AS ?uri)
    BIND (STRLANG(?c, 'en') AS ?title_with_language_tag)
  }

launcher-noheader.sh:

#!/bin/bash -
table=./data/table0-noheader.csv
query=./data/query0.sparql
# no header row in this file, so tarql names the columns ?a, ?b, ?c
./bin/tarql --test --no-header-row --delimiter \; --verbose ${query} ${table}

Result:

 $ ./launcher-noheader.sh 
-------------------------------------------------------------------------------
| homepage | uri                                    | title_with_language_tag |
===============================================================================
| "1"      | <http://website.com/ns#ciao.ronda.com> | "this is the first"@en  |
| "2"      | <http://website.com/ns#miao.ronda.it>  | "this is the second"@en |
| "3"      | <http://website.com/ns#bao.ronda.uk>   | "this is the third"@en  |
-------------------------------------------------------------------------------

Note that ?a is the first column (main index) and ?b is the second (web site), so this query puts the index in ?homepage and builds ?uri from the site; swap ?a and ?b to reproduce the mapping used in step2.

Notes

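Reference documentation: "Header row, delimiters, quotes and character encoding in CSV/TSV files" lists all the possible ways and combinations of expressing these options; it is worth reading.
Another useful reference may be: "Possible names for variables" (SPARQL 1.1 Query Language).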
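
To see in advance which generated names are likely to trip the parser, a rough check against the variable-name rule can help. The sketch below is a deliberate simplification: it accepts only ASCII letters, digits and underscore, whereas the SPARQL 1.1 grammar also allows many non-ASCII characters, and `is_simple_varname` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: flag names outside an ASCII-only approximation of SPARQL's VARNAME rule.
# Assumption: only letters, digits and underscore are accepted here;
# the real grammar is broader for non-ASCII characters.
is_simple_varname() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_]+$'
}

for name in 'main_index' 'web_site' 'title,_to_translate'; do
  if is_simple_varname "$name"; then
    echo "ok:  ?$name"
  else
    echo "bad: ?$name"
  fi
done
```

For the sample header this marks ?main_index and ?web_site as fine and ?title,_to_translate as a problem, which matches the lexical error seen in step2.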
