Upload of TTL by sparql-update query into GraphDB fails on diacritics

I upload Turtle data using the following bash script:

#!/usr/bin/env bash
RDF4J_ENDPOINT=endpoint_uri
DIR="~/modelio/workspace/IPR/"
IFS=
FILE=tmp.rq

function runUpdateQuery() {
    cp  $FILE
    sed -i -e "s!__VOC_IRI__!!g" $FILE
    curl --netrc-file .netrc -X POST -H "Content-type: application/sparql-update" -T $FILE $RDF4J_ENDPOINT/statements
}

function transform() {
    VOC_IRI="$1"
    PREFIX="$2"

    URL="$RDF4J_ENDPOINT/rdf-graphs/service?graph=$VOC_IRI"
    curl --netrc-file .netrc -X POST -H "Content-type: text/turtle" -T "$DIR/$PREFIX-model.ttl" $URL
}

transform http://onto.fel.cvut.cz/ontologies/slovník/datový-psp-2016 psp-2016

The upload fails on the diacritics in the vocabulary IRI (.../slovník/datový-...) with the following error:

<!doctype html><html lang="en"><head><title>HTTP Status 400 – Bad Request</title><style type="text/css">h1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} h2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} h3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} body {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} b {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} p {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;} a {color:black;} a.name {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 400 – Bad Request</h1><hr class="line" /><p><b>Type</b> Exception Report</p><p><b>Message</b> Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986</p><p><b>Description</b> The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).</p><p><b>Exception</b></p><pre>java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986
    org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:467)
    org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:294)
    org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
    org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:834)
    org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1417)
    org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    java.lang.Thread.run(Thread.java:748)
</pre><p><b>Note</b> The full stack trace of the root cause is available in the server logs.</p><hr class="line" /><h3>Apache Tomcat/9.0.14</h3></body></html>

It works fine once the diacritics are removed. Any idea what is going wrong?

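GraphDB uses Unicode, specifically the UTF-8 encoding, for all communication over HTTP. To pass a non-ASCII character in a URL, its UTF-8 bytes must be percent-encoded. curl does not do this automatically when invoked the way your script invokes it, so the raw "í" and "ý" reach Tomcat unencoded and it rejects the request line. You can either percent-encode the UTF-8 representation of "í" and "ý" by hand (%C3%AD and %C3%BD), or use this curl feature: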

curl -X POST -H "Content-type: text/turtle" -T file.ttl \
     -G --data-urlencode "graph=http://onto.fel.cvut.cz/ontologies/slovník/datový-psp-2016" \
     http://hostname:7200/repositories/repo/rdf-graphs/service

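The key is the -G option, which tells curl to append the URL-encoded parameters given with --data-urlencode to the URL as a query string instead of sending them in the request body.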
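
If you would rather take the manual route and keep building the URL yourself, the percent-encoding can be done in the script. The following is only a minimal sketch: `urlencode` is a hypothetical helper name (it is not part of curl or GraphDB), and it assumes the script file and its arguments are UTF-8 and that `od` and `tr` are available. It encodes every byte outside the RFC 3986 "unreserved" set, so ":" and "/" in the IRI are encoded as well, which is fine for a query parameter value.

```bash
#!/usr/bin/env bash

# urlencode: hypothetical helper (not part of curl or GraphDB) that
# percent-encodes every byte outside the RFC 3986 "unreserved" set.
urlencode() {
    local IFS=$' \t\n'   # the loop relies on default word splitting
    local out="" hex
    # od prints the raw bytes of the argument as two-digit hex values,
    # so a UTF-8 character such as "í" arrives as two bytes: C3 AD.
    for hex in $(printf '%s' "$1" | od -An -v -tx1 | tr 'a-f' 'A-F'); do
        case "$hex" in
            2D|2E|5F|7E|3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A])
                # unreserved bytes (A-Z a-z 0-9 - . _ ~) pass through unchanged
                out+=$(printf "\\x$hex") ;;
            *)
                # everything else becomes %XX
                out+="%$hex" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Example: encode the vocabulary IRI from the question.
urlencode "http://onto.fel.cvut.cz/ontologies/slovník/datový-psp-2016"
```

In the script from the question, the URL line in `transform` could then read `URL="$RDF4J_ENDPOINT/rdf-graphs/service?graph=$(urlencode "$VOC_IRI")"`. The `-G --data-urlencode` form is still the simpler option, since curl does the encoding for you.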