XQuery: how do I build a similarity matrix?

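Let's assume we have n records. I want to compute the similarity between each record and every other record, i.e. build a similarity matrix. I'm new to XQuery, but I'm doing my best. I've attached a screenshot of what the similarity between one pair of records must look like.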

It is a CSV string. I used the following for loop to generate this example:

for $item1 at $index in /rec:Record 
let $records:= /rec:Record 
for $item2 in $records[$index + 1]

(: here I call the similarity functions :)

return 
(: csv output :)

I need to change the for loop so that it produces a similarity matrix covering every pair of records in the dataset. How can I do that?

Note: the similarity functions are already done; my question is NOT about computing the similarity itself.

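You could do something like this. I'm not sure what your CSV looks like or how your parser loads it, and I've mocked up the similarity function you said you already have.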

declare function local:somefn($rowA as element(row), $rowB as element(row)) as xs:string {
  "6,7,10,3"  (: mock result; replace with your real similarity function :)
};

let $data :=
    <csv>
        <row>1,1,1</row>
        <row>2,2,2</row>
        <row>3,3,3</row>
        <row>4,4,4</row>
    </csv>

for $row1 at $pos in $data/row
for $row2 in $data/row[ position() > $pos ]
    let $x := local:somefn($row1, $row2)
    return $x

In BaseX this produces:

6,7,10,3
6,7,10,3
6,7,10,3
6,7,10,3
6,7,10,3
6,7,10,3
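The nested loop above visits only the pairs above the diagonal (each unordered pair once), which is why four rows give six lines. If you want the full n × n matrix, one CSV line per record, a minimal sketch could look like the following. It uses only standard XQuery functions, so it should run in BaseX; local:similarity and its "fraction of equal fields" formula are hypothetical stand-ins for your real function, and the sample rows are made up.

```xquery
(: Hypothetical stand-in for your real similarity function:
   the fraction of comma-separated fields that are equal in both rows. :)
declare function local:similarity($a as element(row), $b as element(row)) as xs:decimal {
  let $fa := tokenize(string($a), ",")
  let $fb := tokenize(string($b), ",")
  let $n := min((count($fa), count($fb)))
  return
    if ($n eq 0) then 0
    else count(for $i in 1 to $n where $fa[$i] eq $fb[$i] return $i) div $n
};

let $data :=
  <csv>
    <row>1,1,1,1</row>
    <row>1,1,2,2</row>
    <row>1,2,2,2</row>
  </csv>
let $rows := $data/row
(: One CSV line per record; column j holds its similarity with record j. :)
return string-join(
  for $row1 in $rows
  return string-join(
    for $row2 in $rows
    return string(local:similarity($row1, $row2)),
    ","
  ),
  "&#10;"
)
```

With these sample rows the query returns three lines: `1,0.5,0.25`, `0.5,1,0.75` and `0.25,0.75,1`. This computes every cell, including the diagonal and both triangles; if your similarity is symmetric, the `position() > $pos` form above computes each pair only once and you can mirror the values.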

Edit: added CSV output as a text node at the end.

Consider the power of maps in MarkLogic.

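Below is an example of representing a matrix in ML (MarkLogic). I also hooked in two things: a function that serves as a placeholder for your formula (it also receives your original sequence in case you need it for the analysis), and a small function that shows how to access a map of maps.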

xquery version "1.0-ml";

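declare function local:csv($matrix){
  let $nl := "&#10;"
  return text {
    (: join the rows with newlines; a bare sequence inside text{} is joined
       with spaces, which would put a leading space on every line after the first :)
    fn:string-join(
      for $x in map:keys($matrix)
      let $row := map:get($matrix, $x)
      order by xs:int($x)
      return fn:string-join(
        for $y in map:keys($row)
        order by xs:int($y)
        return xs:string(map:get($row, $y)),
        ","),
      $nl) || $nl
  }
};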

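declare function local:my-formula($x, $y, $seq){
  (: placeholder: replace with your real similarity calculation;
     $seq is the original sequence, passed in case the formula needs it :)
  let $foo := "do something"
  return "your-formula for " || xs:string($x) || " and " || xs:string($y)
};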

declare function local:pretty($matrix){
  <matrix>
  {
    for $x in map:keys($matrix)
    order by xs:int($x)
    return
      <row>
      {
        let $row := map:get($matrix, $x)
        for $y in map:keys($row)
        order by xs:int($y)
        return <cell x="{$x}" y="{$y}">{map:get($row, $y)}</cell>
      }
      </row>
  }
  </matrix>
};

let $matrix := map:map()
let $numbers := "1,2,3,4,5,5,6,7,8"
let $seq := fn:tokenize($numbers, ",")

let $_ := for $x in $seq
    let $map := map:map()
    let $_ := for $y in $seq
       return  map:put($map, $y, local:my-formula($x, $y, $seq))
    return map:put($matrix, $x, $map)

return local:pretty($matrix)

You can dump the map of maps ($matrix) directly. However, the local:pretty function returns a format that makes it easy to see how the map is structured:

<matrix>
  <row>
    <cell x="1" y="1">your-formula for 1 and 1</cell>
    <cell x="1" y="2">your-formula for 1 and 2</cell>
    <cell x="1" y="3">your-formula for 1 and 3</cell>
    <cell x="1" y="4">your-formula for 1 and 4</cell>
    <cell x="1" y="5">your-formula for 1 and 5</cell>
    <cell x="1" y="6">your-formula for 1 and 6</cell>
    <cell x="1" y="7">your-formula for 1 and 7</cell>
    <cell x="1" y="8">your-formula for 1 and 8</cell>
  </row>
  <row>
    <cell x="2" y="1">your-formula for 2 and 1</cell>
    <cell x="2" y="2">your-formula for 2 and 2</cell>
    <cell x="2" y="3">your-formula for 2 and 3</cell>
    <cell x="2" y="4">your-formula for 2 and 4</cell>
    <cell x="2" y="5">your-formula for 2 and 5</cell>
    <cell x="2" y="6">your-formula for 2 and 6</cell>
    <cell x="2" y="7">your-formula for 2 and 7</cell>
    <cell x="2" y="8">your-formula for 2 and 8</cell>
  </row>
  <row>
    <cell x="3" y="1">your-formula for 3 and 1</cell>
    <cell x="3" y="2">your-formula for 3 and 2</cell>
    <cell x="3" y="3">your-formula for 3 and 3</cell>
    <cell x="3" y="4">your-formula for 3 and 4</cell>
    <cell x="3" y="5">your-formula for 3 and 5</cell>
    <cell x="3" y="6">your-formula for 3 and 6</cell>
    <cell x="3" y="7">your-formula for 3 and 7</cell>
    <cell x="3" y="8">your-formula for 3 and 8</cell>
  </row>
  <row>
    <cell x="4" y="1">your-formula for 4 and 1</cell>
    <cell x="4" y="2">your-formula for 4 and 2</cell>
    <cell x="4" y="3">your-formula for 4 and 3</cell>
    <cell x="4" y="4">your-formula for 4 and 4</cell>
    <cell x="4" y="5">your-formula for 4 and 5</cell>
    <cell x="4" y="6">your-formula for 4 and 6</cell>
    <cell x="4" y="7">your-formula for 4 and 7</cell>
    <cell x="4" y="8">your-formula for 4 and 8</cell>
  </row>
  <row>
    <cell x="5" y="1">your-formula for 5 and 1</cell>
    <cell x="5" y="2">your-formula for 5 and 2</cell>
    <cell x="5" y="3">your-formula for 5 and 3</cell>
    <cell x="5" y="4">your-formula for 5 and 4</cell>
    <cell x="5" y="5">your-formula for 5 and 5</cell>
    <cell x="5" y="6">your-formula for 5 and 6</cell>
    <cell x="5" y="7">your-formula for 5 and 7</cell>
    <cell x="5" y="8">your-formula for 5 and 8</cell>
  </row>
  <row>
    <cell x="6" y="1">your-formula for 6 and 1</cell>
    <cell x="6" y="2">your-formula for 6 and 2</cell>
    <cell x="6" y="3">your-formula for 6 and 3</cell>
    <cell x="6" y="4">your-formula for 6 and 4</cell>
    <cell x="6" y="5">your-formula for 6 and 5</cell>
    <cell x="6" y="6">your-formula for 6 and 6</cell>
    <cell x="6" y="7">your-formula for 6 and 7</cell>
    <cell x="6" y="8">your-formula for 6 and 8</cell>
  </row>
  <row>
    <cell x="7" y="1">your-formula for 7 and 1</cell>
    <cell x="7" y="2">your-formula for 7 and 2</cell>
    <cell x="7" y="3">your-formula for 7 and 3</cell>
    <cell x="7" y="4">your-formula for 7 and 4</cell>
    <cell x="7" y="5">your-formula for 7 and 5</cell>
    <cell x="7" y="6">your-formula for 7 and 6</cell>
    <cell x="7" y="7">your-formula for 7 and 7</cell>
    <cell x="7" y="8">your-formula for 7 and 8</cell>
  </row>
  <row>
    <cell x="8" y="1">your-formula for 8 and 1</cell>
    <cell x="8" y="2">your-formula for 8 and 2</cell>
    <cell x="8" y="3">your-formula for 8 and 3</cell>
    <cell x="8" y="4">your-formula for 8 and 4</cell>
    <cell x="8" y="5">your-formula for 8 and 5</cell>
    <cell x="8" y="6">your-formula for 8 and 6</cell>
    <cell x="8" y="7">your-formula for 8 and 7</cell>
    <cell x="8" y="8">your-formula for 8 and 8</cell>
  </row>
</matrix>

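For CSV, there is an example function named local:csv that builds a text node. Calling local:csv($matrix) instead of local:pretty($matrix) gives the following: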

 your-formula for 1 and 1,your-formula for 1 and 2,your-formula for 1 and 3,your-formula for 1 and 4,your-formula for 1 and 5,your-formula for 1 and 6,your-formula for 1 and 7,your-formula for 1 and 8
 your-formula for 2 and 1,your-formula for 2 and 2,your-formula for 2 and 3,your-formula for 2 and 4,your-formula for 2 and 5,your-formula for 2 and 6,your-formula for 2 and 7,your-formula for 2 and 8
 your-formula for 3 and 1,your-formula for 3 and 2,your-formula for 3 and 3,your-formula for 3 and 4,your-formula for 3 and 5,your-formula for 3 and 6,your-formula for 3 and 7,your-formula for 3 and 8
 your-formula for 4 and 1,your-formula for 4 and 2,your-formula for 4 and 3,your-formula for 4 and 4,your-formula for 4 and 5,your-formula for 4 and 6,your-formula for 4 and 7,your-formula for 4 and 8
 your-formula for 5 and 1,your-formula for 5 and 2,your-formula for 5 and 3,your-formula for 5 and 4,your-formula for 5 and 5,your-formula for 5 and 6,your-formula for 5 and 7,your-formula for 5 and 8
 your-formula for 6 and 1,your-formula for 6 and 2,your-formula for 6 and 3,your-formula for 6 and 4,your-formula for 6 and 5,your-formula for 6 and 6,your-formula for 6 and 7,your-formula for 6 and 8
 your-formula for 7 and 1,your-formula for 7 and 2,your-formula for 7 and 3,your-formula for 7 and 4,your-formula for 7 and 5,your-formula for 7 and 6,your-formula for 7 and 7,your-formula for 7 and 8
 your-formula for 8 and 1,your-formula for 8 and 2,your-formula for 8 and 3,your-formula for 8 and 4,your-formula for 8 and 5,your-formula for 8 and 6,your-formula for 8 and 7,your-formula for 8 and 8
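map:map() and map:put() in the example above are MarkLogic's own map API, not standard XQuery. On a processor that implements XQuery 3.1 maps, such as BaseX or Saxon, maps are immutable, so a sketch of the same map of maps builds everything in one expression with map:merge and map:entry. The local:my-formula here is the same kind of placeholder as above, and the distinct-values call is an addition that drops the repeated "5" in the sample string, which would otherwise produce a duplicate key.

```xquery
xquery version "3.1";

(: Placeholder for your formula, in the spirit of local:my-formula above. :)
declare function local:my-formula($x as xs:string, $y as xs:string, $seq as xs:string*) as xs:string {
  "your-formula for " || $x || " and " || $y
};

(: distinct-values drops the duplicated "5" so every key is unique. :)
let $seq := distinct-values(tokenize("1,2,3,4,5,5,6,7,8", ","))

(: Outer map: row key -> inner map; inner map: column key -> cell value.
   Maps are immutable in XQuery 3.1, so there are no map:put side effects. :)
let $matrix := map:merge(
  for $x in $seq
  return map:entry($x, map:merge(
    for $y in $seq
    return map:entry($y, local:my-formula($x, $y, $seq))
  ))
)

(: Read one cell: first fetch the row map, then the cell inside it.
   Keys are strings here, so look them up as "2" and "3", not 2 and 3. :)
return map:get(map:get($matrix, "2"), "3")
```

The query returns `your-formula for 2 and 3`. To emit CSV instead, iterate map:keys($matrix) in numeric order, as local:csv does.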