JSoup or XPath for Web-scraping?
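If I am scraping websites to extract content such as images and text, what advantages does JSoup offer me over XPath? Or is XPath the better choice in this case?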
Advantage | XPath | JSoup |
---|---|---|
Handles well-formed markup | ✅ | ✅ |
Handles poorly formed markup | | ✅ |
Has clean, declarative syntax | ✅ | |
Is standardized | ✅ | |
Supported by hosting language: Java | ✅ | ✅ |
Supported by hosting language and utilities: C#, JavaScript, Python, PHP, VBA, Ruby, XSLT, xmlstarlet | ✅ | |
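To make the well-formed versus poorly formed distinction concrete, here is a minimal sketch using only the XPath support that ships with the JDK (`javax.xml.xpath`). The class name `XPathScrapeSketch`, the helper methods, and the sample markup are all made up for illustration. It pulls image URLs and paragraph text out of a well-formed document, then shows that the same parser rejects typical tag-soup HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathScrapeSketch {

    // Hypothetical helper: parses the markup as XML, evaluates the XPath
    // expression, and returns the string value of every matching node.
    // Throws IllegalArgumentException if the markup is not well-formed
    // or the expression is invalid.
    static List<String> select(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query markup", e);
        }
    }

    // Hypothetical helper: true if the XML parser accepts the markup.
    static boolean isWellFormed(String markup) {
        try {
            select(markup, "/*");
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed (XHTML-style) markup: every tag closed, attributes quoted.
        String wellFormed =
                "<html><body><p>Hello</p>"
                + "<img src=\"a.png\"/><img src=\"b.png\"/></body></html>";
        System.out.println(select(wellFormed, "//img/@src")); // [a.png, b.png]
        System.out.println(select(wellFormed, "//p"));        // [Hello]

        // Typical real-world tag soup: unquoted attribute, unclosed tags.
        String tagSoup = "<html><body><p>Hello<img src=a.png></body></html>";
        System.out.println(isWellFormed(tagSoup));             // false
    }
}
```

With JSoup, the tag-soup input would instead be parsed leniently with `Jsoup.parse(...)` and queried with CSS selectors such as `doc.select("img[src]")`, which is why it tends to suit pages whose markup you do not control.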