Wrap plain text chunks in P while skipping chunks that are already wrapped in P

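I need to wrap every plain-text chunk in a paragraph, but the content may already contain paragraphs, and those should be left alone. How would I go about this?

I am having trouble working out how to wrap loose text in a paragraph while skipping the paragraphs that already exist.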

Given this XML:

<section xmlns="http://www.w3.org/1999/xhtml">    
    <div>
        test test 
        <p>test</p>
        <ins>INS</ins>
        text
    </div> 
</section>

Expected result:

<section xmlns="http://www.w3.org/1999/xhtml">    
    <div>
        <p>test test</p>
        <p>test</p>
        <p>
            <ins>INS</ins>
            text
        </p>
    </div> 
</section>

Here is one way to do it, using a simple recursive function that splits the content of the div at its p nodes:

declare default element namespace "http://www.w3.org/1999/xhtml";

declare function local:collect($sequence as node()*) as node()* {
  (: index of last p in this candidate subsequence :)
  let $nextP := max((0, 
    (for $i in (1 to count($sequence)) 
     where $sequence[$i][self::p] 
     return $i)))
  return
    (: if sequence is empty then return empty sequence :)
    if(count($sequence) = 0) then ()
    (: if no p in this candidate subsequence, then wrap it in a p :)
    else if($nextP = 0) then <p>{$sequence}</p>
    (: otherwise evaluate subsequence before the last p, the p, 
       and the subsequence after the last p 
     :)
    else (
      local:collect(subsequence($sequence,1,$nextP - 1)),
      $sequence[$nextP],
      local:collect(subsequence($sequence,$nextP + 1))
    )
};

let $input :=
    <section>    
       <div>
            test test 
            <p>test</p>
            <ins>INS</ins>
            text
       </div> 
    </section>
return
  <section>    
  {
    for $div in $input/div
    return <div>{local:collect($div/(*|text()))}</div>
  }
  </section>

This produces the following result:

<section xmlns="http://www.w3.org/1999/xhtml">
   <div>
      <p>
         test test 
         </p>
      <p>test</p>
      <p><ins>INS</ins>
         text
         </p>
   </div>
</section>

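Your expected result is inconsistent in how it treats leading and trailing whitespace in text nodes. It is not clear whether you really want whitespace normalized in some text nodes but not in others; presumably not.

To normalize whitespace in all text nodes, replace: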

<p>{$sequence}</p>

with:

<p>{for $x in $sequence return if($x[self::text()]) then normalize-space($x) else ($x)}</p>

which produces:

<section xmlns="http://www.w3.org/1999/xhtml">
  <div>
    <p>test test</p>
    <p>test</p>
    <p><ins>INS</ins>text</p>
  </div>
</section>
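Note that the inline $input above is built with a direct element constructor, so whitespace-only text between its child elements is stripped by default. If the input is parsed from a document instead, those whitespace-only text nodes usually survive and would be wrapped in their own p. A minimal sketch of filtering them out first; local:significant is a made-up helper name, and boundary-space preserve is used here only to imitate parsed input:

```xquery
declare boundary-space preserve;

(: Hypothetical helper: keep every node except text nodes
   that contain nothing but whitespace. :)
declare function local:significant($nodes as node()*) as node()* {
  $nodes[not(self::text()) or normalize-space(.) ne '']
};

let $div :=
  <div>
    <p>a</p>
    <p>b</p>
  </div>
return (
  count($div/node()),                     (: 5: three whitespace text nodes plus two p :)
  count(local:significant($div/node()))   (: 2: only the two p elements remain :)
)
```

With a filter like this, local:collect could be called on local:significant($div/node()) rather than on $div/(*|text()).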

This algorithm works when there is no p or when there are several, but I have not tested every scenario.
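The cases mentioned above can be checked with a few small inputs. This is only a sketch: it repeats the function from the answer in condensed form (with the index variable renamed to $lastP, since it holds the position of the last p), and the four test divs are made up for illustration:

```xquery
declare default element namespace "http://www.w3.org/1999/xhtml";

declare function local:collect($sequence as node()*) as node()* {
  (: position of the last p in the sequence, or 0 if there is none :)
  let $lastP := max((0,
    for $i in 1 to count($sequence)
    where $sequence[$i][self::p]
    return $i))
  return
    if (empty($sequence)) then ()
    else if ($lastP = 0) then <p>{$sequence}</p>
    else (
      local:collect(subsequence($sequence, 1, $lastP - 1)),
      $sequence[$lastP],
      local:collect(subsequence($sequence, $lastP + 1))
    )
};

let $cases := (
  <div>only text</div>,            (: no p: content becomes one p :)
  <div><p>a</p><p>b</p></div>,     (: only p children: left unchanged :)
  <div>x<p>a</p>y<p>b</p>z</div>,  (: text around two p: x, y and z each get a p :)
  <div/>                           (: empty: stays empty :)
)
for $div in $cases
return <div>{local:collect($div/node())}</div>
```

The third case should come back with five p children in document order: x, a, y, b, z.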

In XQuery 3 this can be simplified with tumbling windows, for example:

(: Return true if both nodes exist and either both are p or neither is p.
 :)
declare function local:same($compare1 as node()?, $compare2 as node()?) as xs:boolean {
  if(not($compare1) or not($compare2)) then false()
  else if(($compare1[self::p] and $compare2[self::p]) 
    or (not($compare1[self::p]) and not($compare2[self::p])))
  then true()
  else false()
};

let $input :=
    <section>    
        <div>
            test test 
            <p>test</p>
            <ins>INS</ins>
            text
        </div> 
    </section>

return
  <section>    
  {
    for $div in $input/div
    return
      <div>
      {
        for tumbling window $partition in $div/(*|text())
        start $s previous $s-prev when not(local:same($s, $s-prev))
        end   $e next $e-next     when not(local:same($e, $e-next))
        return 
          if($partition[1][self::p]) 
          then $partition 
          else <p>{$partition}</p>
      }
      </div>
  }
  </section>

As before, to normalize whitespace, replace:

<p>{$partition}</p>

with something like:

<p>{for $x in $partition return if($x[self::text()]) then normalize-space($x) else ($x)}</p>
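The local:same helper used by the window clause can also be written more compactly. A sketch that keeps the same signature and, as far as I can tell, the same behaviour, by comparing whether each node is a p; the four sample calls are only there to illustrate the cases:

```xquery
(: True only if both nodes exist and they agree on being a p or not. :)
declare function local:same($a as node()?, $b as node()?) as xs:boolean {
  exists($a) and exists($b)
  and (exists($a/self::p) = exists($b/self::p))
};

(
  local:same(<p/>, <p/>),          (: both are p       -> true  :)
  local:same(<ins/>, text {"x"}),  (: neither is p     -> true  :)
  local:same(<p/>, <ins/>),        (: one p, one not   -> false :)
  local:same((), <p/>)             (: a node is absent -> false :)
)
```

Like the window query above, this sketch uses no default element namespace; with XHTML input the same declaration as in the recursive version would be needed for self::p to match.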