Xidel extract number/float
I want to use Xidel to extract the number/float value from this code:
<p class="price">
<span class="woocommerce-Price-amount amount">
<bdi>
304.00
<span class="woocommerce-Price-currencySymbol">
€
</span>
</bdi>
</span>
</p>
I'm trying the following command:
xidel -s '<p class="price"><span class="woocommerce-Price-amount amount"><bdi>304.00 <span class="woocommerce-Price-currencySymbol">€</span></bdi></span></p>' -e "//p[@class='price']/translate(normalize-space(substring-before(., '€')),' ','')"
The translate() call should remove the space, but it doesn't work: the output still has a space after the number ("304.00_").
I tried changing the XPath expression to
-e "substring-before(//p[@class='price']//bdi/normalize-space(.),' ')"
or
-e "substring-before(//p[@class='price']//bdi/.,' ')"
or using tokenize()
-e "tokenize(//p[@class='price']//bdi/.,' ')[1]"
The output should be
'304.00'
You'll have to handle the no-break space separately, using one of the following queries:
-e "//p[@class='price']/span/bdi/substring-before(text(),' ')"
-e "//p[@class='price']/span/bdi/translate(text(),x:cps(160),'')"
-e "//p[@class='price']/span/bdi/replace(text(),' ','')"
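If changing the XPath is not an option, the no-break space could also be stripped from the output afterwards in the shell. This is only a minimal sketch, assuming a POSIX shell, `sed`, and UTF-8 output; `strip_nbsp` is a hypothetical helper name, not something Xidel provides:

```shell
# strip_nbsp (hypothetical helper): removes every U+00A0 from stdin.
# Assumes UTF-8 input, where U+00A0 is the byte pair 0xC2 0xA0 (octal 302 240).
strip_nbsp() {
  nbsp=$(printf '\302\240')   # literal no-break space
  sed "s/${nbsp}//g"          # ordinary spaces are left untouched
}

# Example: a price followed by a no-break space
printf '304.00\302\240\n' | strip_nbsp
```

The Xidel output would then be piped through `strip_nbsp`. Handling the character inside the query, as above, is still the more direct fix.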
You can't use normalize-space(), because...
https://www.w3.org/TR/xpath-functions-31/#func-normalize-space:
The definition of whitespace is unchanged in [Extensible Markup Language (XML) 1.1 Recommendation]. It is repeated here for convenience:
S ::= (#x20 | #x9 | #xD | #xA)+
...it handles spaces, tabs, carriage returns and line feeds, but not no-break spaces:
xidel -s "<x> test </x>" -e "x'[{x}]'"
[ test ]
xidel -s "<x> test </x>" -e "x'[{normalize-space(x)}]'"
[test]
xidel -s "<x> test </x>" -e "x'[{x}]'"
[ test ]
xidel -s "<x> test </x>" -e "x'[{normalize-space(x)}]'"
[ test ]
xidel -s "<x> test </x>" -e "x'[{translate(x,' ','')}]'"
xidel -s "<x> test </x>" -e "x'[{replace(x,x:cps(160),'')}]'"
xidel -s "<x> test </x>" -e "x'[{replace(x,' ','')}]'"
[test]
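Since a no-break space and an ordinary space look identical in a terminal, it can help to inspect the raw bytes of the output to see which one you are dealing with. A minimal sketch, assuming a POSIX shell with `od` and UTF-8 output; `show_bytes` is a hypothetical helper name:

```shell
# show_bytes (hypothetical helper): prints stdin as space-separated hex bytes.
show_bytes() {
  od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# In UTF-8 a no-break space (U+00A0) is the two bytes c2 a0,
# while an ordinary space is the single byte 20.
printf '304.00\302\240' | show_bytes   # ends in: c2 a0
printf '304.00 ' | show_bytes          # ends in: 20
```

Piping the output of a Xidel query through `show_bytes` in the same way should show whether the trailing character is `20` or `c2 a0`.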
By the way, an alternative way to get the price on that website:
xidel -s "https://kenzel.sk/produkt/bicykle/zivotny-styl/signora/" -e ^"^
parse-json(^
//body/script[@type='application/ld+json']^
)//priceSpecification/price^
"
304.00