Creating an alias for xpath expression in xidel with regex and bash
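If you have used Xidel, you will often need to select nodes that have a certain class. To make this easier, I want to create a has-class("class") function as an alias for the expression:
contains(concat(" ", normalize-space(@class), " "), " class ")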
Example:
$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'
e-xidel.sh contains this code:
#!/bin/bash
echo -e "$(tput setaf 2) Checking... $(tput sgr0)"
path=$1
expression=$2
# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'
xoutput=$(xidel "$path" --printed-node-format=html --output-declaration= -e "$expression")
echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
You can use sed (the GNU version; it is not guaranteed to work with other implementations) to do what you need:
sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
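Explanation:
- s/pattern/substitution/g: replaces the part that matches pattern with the substitution string; the g flag replaces every match on the line (global substitution), not just the first one.
- has-class("\([^)]\+\)"): matches a part that starts with has-class(", continues with one or more characters other than a closing parenthesis ([^)]), and ends with "). The escaped parentheses around the inner part capture that sub-part and associate it with the alias \1, because it is the first capture group created.
- contains(concat(" ", normalize-space(@class), " "), " \1 "): replaces the matched part with this text; \1 is expanded to the content of the associated capture group.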
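To check the substitution explained above, the sketch below runs the same GNU sed expression (with the \1 back-reference in the replacement) on the example from the question and on an expression containing two has-class() calls, which is where the g flag matters. The function name expand_has_class is only illustrative; the sketch prints the expanded XPath and does not call xidel.

```shell
#!/bin/sh
# Illustrative helper: rewrite every has-class("name") into the space-padded
# contains() test. Needs GNU sed, because \+ ("one or more") is a GNU
# extension to basic regular expressions.
expand_has_class() {
  printf '%s\n' "$1" |
    sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

expand_has_class '//article/p//img[has-class("wp-image")]'
# //article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]

expand_has_class '//div[has-class("post") and has-class("featured")]'
# //div[contains(concat(" ", normalize-space(@class), " "), " post ") and contains(concat(" ", normalize-space(@class), " "), " featured ")]
```

Without the g flag, only the first has-class() on the line would be rewritten.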
Your script would be:
#!/bin/bash
function expand-has-class() {
  echo "$1" |
    sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}
echo -e "$(tput setaf 2) Checking... $(tput sgr0)"
path=$1
expression="$(expand-has-class "$2")"
# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'
xoutput=$(xidel "$path" --printed-node-format=html --output-declaration= -e "$expression")
echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
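The answer points out that the command is written for GNU sed. If the script also has to run with another POSIX sed (BSD or macOS sed, for example), one possible variant, which is an assumption and not part of the original answer, swaps the GNU-only \+ for the standard basic-regex form [^)][^)]* (one character other than a closing parenthesis, followed by any number of them). It also uses an underscore in the function name, since hyphens are not part of a portable POSIX function name. The name expand_has_class_posix is hypothetical.

```shell
#!/bin/sh
# Hypothetical portable variant of expand-has-class using only POSIX BRE syntax.
# [^)][^)]* means "one or more characters other than )", like GNU's [^)]\+.
expand_has_class_posix() {
  printf '%s\n' "$1" |
    sed 's/has-class("\([^)][^)]*\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

expand_has_class_posix '//article/p//img[has-class("wp-image")]'
# //article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]
```

For the example expression it produces the same output as the GNU version, so it could replace the function body in the script above.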
contains(concat(" ", normalize-space(@class), " "), " class ")
Example:
$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'
This makes no sense.
contains(concat(" ",normalize-space("wp-image")," ")," wp-image ")
is the same as
contains("wp-image","wp-image")
If you really want a boolean as output when comparing the value of the class attribute with a literal string, then this...
xidel -s example.com -e '//article/p//img/@class="wp-image"'
...returns true or false.
If wp-image only needs to be a substring of the class attribute value:
xidel -s example.com -e '//article/p//img/contains(@class,"wp-image")'
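To make the difference between these checks concrete, the sketch below imitates in plain shell the string logic of the equality test @class="wp-image", the substring test contains(@class,"wp-image"), and the space-padded test from the question, on a few sample class values. It is only an illustration under the assumption that the helpers behave like XPath's normalize-space() and contains() on a single string; the helper names and sample values are hypothetical, and xidel is not involved.

```shell
#!/bin/sh
# Plain-shell imitation of the string logic of three XPath tests; it does not
# call xidel. Helper names and sample class values are hypothetical.

# Like XPath normalize-space(): trim and collapse runs of whitespace.
normalize_space() {
  set -f            # no filename expansion while splitting the unquoted $1
  set -- $1         # default IFS splitting drops extra spaces, tabs, newlines
  set +f
  printf '%s' "$*"  # "$*" joins the words with one space (first IFS character)
}

# @class="wp-image"
equals() { [ "$1" = "$2" ]; }

# contains(@class, "wp-image")
substring() { case $1 in *"$2"*) return 0 ;; *) return 1 ;; esac; }

# contains(concat(" ", normalize-space(@class), " "), " wp-image ")
has_class() {
  case " $(normalize_space "$1") " in *" $2 "*) return 0 ;; *) return 1 ;; esac
}

# Print true/false depending on whether the given test command succeeds.
tf() { if "$@"; then echo true; else echo false; fi; }

for cls in 'wp-image' 'alignnone  wp-image size-full' 'wp-image-42'; do
  printf '%s: equals=%s substring=%s has-class=%s\n' "$cls" \
    "$(tf equals "$cls" wp-image)" \
    "$(tf substring "$cls" wp-image)" \
    "$(tf has_class "$cls" wp-image)"
done
```

In this imitation, the equality test only matches the exact value wp-image, the substring test also matches wp-image-42, and the space-padded test matches wp-image as a whole word among other classes.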