Creating an alias for an XPath expression in xidel with regex and bash

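If you have used Xidel, you will often need to locate nodes that have a certain class. To make this easier, I want to create a has-class("class") function as an alias for the expression: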
contains(concat(" ", normalize-space(@class), " "), " class ").

Example:

$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'

e-xidel.sh contains this code:

#!/bin/bash

echo -e "$(tput setaf 2) Checking... $(tput sgr0)"

path=$1
expression=$2

# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'

xoutput=$(xidel $path --printed-node-format=html --output-declaration= -e "$expression")

echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"

You can use sed (the GNU version; it is not guaranteed to work with other implementations) to do what you need:

sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'

Explanation:

  • s/pattern/substitution/g: replaces the parts that match pattern with the substitution string; the g flag replaces every match on the line (global substitution).
  • has-class("\([^)]\+\)"): matches a part that starts with has-class(", contains one or more characters other than a closing parenthesis ([^)]), and ends with "). The escaped parentheses around the inner part capture it and associate it with \1, since it is the first capture group created.
  • contains(concat(" ", normalize-space(@class), " "), " \1 "): replaces the matched part with this text; \1 is expanded to the content of the associated capture group.

Your script would be:

#!/bin/bash

function expand-has-class() {
    echo "$1" | sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

echo -e "$(tput setaf 2) Checking... $(tput sgr0)"

path=$1
expression="$(expand-has-class "$2")"

# $2         = '//article/p//img[has-class("wp-image")]'
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'

xoutput=$(xidel $path --printed-node-format=html --output-declaration= -e "$expression")

echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
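Since the \+ operator in the pattern above is a GNU extension, a variant written with extended regular expressions may travel better. The following is only a sketch, assuming a sed that accepts the -E option (GNU and BSD sed both do); the function name expand_has_class is illustrative and not part of the original script. It captures everything between the double quotes rather than everything before the closing parenthesis.

```shell
#!/bin/bash

# Hypothetical helper: rewrites every has-class("name") into the
# whitespace-delimited class-token test, using sed -E (extended regex).
expand_has_class() {
    # \( and \) match literal parentheses; ([^"]+) captures the class name as \1.
    printf '%s\n' "$1" |
        sed -E 's/has-class\("([^"]+)"\)/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

# Quick check on the expression from the question.
expand_has_class '//article/p//img[has-class("wp-image")]'
```

Running it by hand before wiring it into e-xidel.sh is an easy way to confirm that the expanded expression is what xidel should receive.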

contains(concat(" ", normalize-space(@class), " "), " class ")

Example:

$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'

This makes no sense.

contains(concat(" ",normalize-space("wp-image")," ")," wp-image ")

is the same as
contains("wp-image","wp-image")

If you really want a boolean as output when comparing the value of the class attribute with a literal string, then this...

xidel -s example.com -e '//article/p//img/@class="wp-image"'

...would return true or false.

If wp-image is a substring of the class attribute's value:

xidel -s example.com -e '//article/p//img/contains(@class,"wp-image")'
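To see how the three tests discussed in this thread behave on different attribute values, here is a pure-bash analogue. It is a rough sketch that does not call xidel at all; the function names are made up for illustration, and the whitespace handling only approximates normalize-space() for single-line values.

```shell
#!/bin/bash

# Bash analogues of three XPath tests on a class attribute value.
# All function names here are illustrative, not part of xidel.

# @class = "name": the whole attribute value must equal the name.
class_equals() { [[ $1 == "$2" ]]; }

# contains(@class, "name"): plain substring test.
class_contains() { [[ $1 == *"$2"* ]]; }

# contains(concat(" ", normalize-space(@class), " "), " name "): whole-token test.
class_has_token() {
    local -a words
    local normalized
    # Splitting on whitespace and rejoining with single spaces
    # approximates normalize-space() for a single-line value.
    read -r -a words <<< "$1"
    normalized="${words[*]}"
    [[ " $normalized " == *" $2 "* ]]
}

for value in 'wp-image' 'alignleft  wp-image' 'wp-image-42'; do
    for check in class_equals class_contains class_has_token; do
        if "$check" "$value" 'wp-image'; then result=true; else result=false; fi
        printf '%-24s %-16s %s\n' "'$value'" "$check" "$result"
    done
done
```

The three tests agree when the attribute is exactly wp-image. They differ once other classes are present: the equality test fails for 'alignleft  wp-image', and the substring test also accepts 'wp-image-42', which the token test rejects.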