Unexpected behaviour with FILTER on xsd:date

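How do I restrict a count of results (number of strikes) to a specific year (1970)? My attempts give unexpected results. The alternatives I tried, and what each produced, are written as comments in the query below.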

The solutions suggested by others ([1], [2]) did not solve the problem.

The endpoint is: https://api.druid.datalegend.net/datasets/rlzijdeman/ClariahTech2017/containers/clariahTech2017/sparql

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gg: <http://www.gemeentegeschiedenis.nl/gg-schema#>
PREFIX strikes: <https://iisg.amsterdam/vocab/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?muni ?sdate (COUNT(?muni) as ?muniCount)  
WHERE {
  ?strike strikes:place ?splace .
  ?strike strikes:date ?sdate .
  ?muni rdf:type gg:Municipality .
  ?muni rdfs:label ?ggplace . 
  FILTER regex(?splace, ?ggplace)

  ### TASK: Filter results above to strikes in 1970 only

  # solution 1: extract year and FILTER on 1970
  # FILTER ( year(?sdate) = 1970 )
  ### Virtuoso 22003 Error SR586: Incomplete RDF box as argument 0 for year().

  # solution 2: filter on ?sdate
  # FILTER ( ?sdate >= '1970-01-01'^^xsd:date && ?sdate <= '1970-12-31'^^xsd:date )
  ### Virtuoso 2201B Error SR098: regexp error at '? [Arnhem ( Gelderland )]' column 0 (nothing to repeat)
  ####### Why? This was no problem under solution 1 ?!
  ####### Also: note that each of these works separately, but not together(!):
  # FILTER ( ?sdate >= '1970-01-01'^^xsd:date )
  # FILTER ( ?sdate <= '1970-12-31'^^xsd:date )

} 
LIMIT 10

Regarding "Solution 1":

The function year expects a literal of type xsd:dateTime as input, but your data contains only xsd:date and xsd:gYearMonth literals. That is probably why the cast fails.
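One way to sidestep the datatype issue is to compare the lexical form of the literal instead of calling year(). This is a minimal sketch, assuming every ?sdate value starts with a four-digit year followed by a hyphen (as in 1970-05-12 or 1970-05). STR and STRSTARTS are standard SPARQL 1.1 functions, but I have not verified how this particular Virtuoso endpoint handles them on these literals:

```sparql
PREFIX strikes: <https://iisg.amsterdam/vocab/>

SELECT ?strike ?sdate {
  ?strike strikes:date ?sdate .
  # Compare the lexical form only, so xsd:date and xsd:gYearMonth
  # values are treated alike and no cast to xsd:dateTime is needed.
  FILTER (STRSTARTS(STR(?sdate), "1970-"))
}
LIMIT 10
```

The trade-off is that this is a string test rather than a date comparison, so it only works while the lexical forms follow the assumed pattern.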

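Regarding "Solution 2":

This might be a bug in Virtuoso. In general, though, I'm not sure why you need REGEX here. If you only want to strip the language tag for the comparison, use the str function. It is also much faster: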

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gg: <http://www.gemeentegeschiedenis.nl/gg-schema#>
PREFIX strikes: <https://iisg.amsterdam/vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT * {
  ?strike strikes:place ?splace .
  ?strike strikes:date ?sdate .
  ?muni rdf:type gg:Municipality .
  ?muni rdfs:label ?ggplace . 
  FILTER (?sdate >= '1970-01-01'^^xsd:date && ?sdate < '1971-01-01'^^xsd:date)
  FILTER(str(?splace) = str(?ggplace)) 
} 
LIMIT 10
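Note that the str comparison only matches when the two strings are identical, whereas the original regex call matched ?ggplace anywhere inside ?splace. The SR098 message also suggests that the engine tried to compile the string '? [Arnhem ( Gelderland )]' as a pattern, and a regular expression cannot begin with the quantifier ?. If substring matching is what you actually need, CONTAINS (standard SPARQL 1.1) compares the text literally. A sketch under that assumption, untested against this endpoint:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gg: <http://www.gemeentegeschiedenis.nl/gg-schema#>
PREFIX strikes: <https://iisg.amsterdam/vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT * {
  ?strike strikes:place ?splace .
  ?strike strikes:date ?sdate .
  ?muni rdf:type gg:Municipality .
  ?muni rdfs:label ?ggplace .
  FILTER (?sdate >= '1970-01-01'^^xsd:date && ?sdate < '1971-01-01'^^xsd:date)
  # Literal substring test: characters such as ? ( ) [ ] carry no
  # special meaning here, unlike in a regex pattern.
  FILTER (CONTAINS(STR(?splace), STR(?ggplace)))
}
LIMIT 10
```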

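Another thing I found odd in your query: shouldn't you count the strikes rather than the municipalities themselves? As I understand it, you want the number of strikes per municipality within a given date range (correct me if I'm wrong). If so, the query should look like this: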

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gg: <http://www.gemeentegeschiedenis.nl/gg-schema#>
PREFIX strikes: <https://iisg.amsterdam/vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?muni (COUNT(?strike) as ?strikes) {
  ?strike strikes:place ?splace .
  ?strike strikes:date ?sdate .
  ?muni rdf:type gg:Municipality .
  ?muni rdfs:label ?ggplace . 
  FILTER (?sdate >= '1970-01-01'^^xsd:date && ?sdate < '1971-01-01'^^xsd:date)
  FILTER(str(?splace) = str(?ggplace)) 
} 
GROUP BY ?muni
LIMIT 10

Also, if there are multiple strikes, selecting ?sdate doesn't make sense, right? Unless you want the dates of all strikes, like this:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gg: <http://www.gemeentegeschiedenis.nl/gg-schema#>
PREFIX strikes: <https://iisg.amsterdam/vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?muni (COUNT(?strike) as ?strikes) (GROUP_CONCAT(?sdate; separator = ";") as ?s_dates) {
  ?strike strikes:place ?splace .
  ?strike strikes:date ?sdate .
  ?muni rdf:type gg:Municipality .
  ?muni rdfs:label ?ggplace . 
  FILTER (?sdate >= '1970-01-01'^^xsd:date && ?sdate < '1971-01-01'^^xsd:date)
  FILTER(str(?splace) = str(?ggplace)) 
} 
GROUP BY ?muni
LIMIT 10

A small comment

I also tried casting to xsd:dateTime first and then extracting the year:

FILTER (year(xsd:dateTime(?sdate)) = 1970)

Funnily enough, this fails because of a 29 February value. :D

Virtuoso 22007 Error DT006: Cannot convert 1911-02-29 to datetime : Too many days (29, the month has only 28)

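I'm not sure whether the value 28 for February is hard-coded in Virtuoso or whether it accounts for leap years. Either way, the error is justified here: 1911 is not divisible by 4, so it is a common year, not a leap year.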
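To see how many values are affected, you could list every strike whose date falls on 29 February and then check the years by hand. A rough sketch that uses only a string test, so no cast is involved; it assumes the lexical forms look like YYYY-02-29 and is untested against this endpoint:

```sparql
PREFIX strikes: <https://iisg.amsterdam/vocab/>

SELECT ?strike ?sdate {
  ?strike strikes:date ?sdate .
  # Lexical match only: keeps values whose string form ends in -02-29,
  # regardless of whether the year is actually a leap year.
  FILTER (STRENDS(STR(?sdate), "-02-29"))
}
ORDER BY ?sdate
```

Any result whose year is not a leap year (divisible by 4, except century years not divisible by 400) is an invalid date in the data and will break the xsd:dateTime cast.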