Does JSSoup support select() similar to Beautiful Soup or JSoup?
Does JSSoup (which describes itself as "JavaScript + BeautifulSoup = JSSoup") support a select() operation similar to Beautiful Soup or JSoup, for selecting elements with a CSS selector?
I haven't been able to find it. Could it exist under a different name?
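According to the documentation, it appears to be called find or findAll, depending on whether you want to find one element or several. Here is an example they give: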
const JSSoup = require('jssoup').default;
var data = `
<div>
<p> hello </p>
<p> world </p>
</div>
`
var soup = new JSSoup(data);
soup.find('p')
// <p> hello </p>
Looking at the source, I don't see anything offering CSS selector functionality, but it does show that find and findAll accept more than one argument. Following an example in the BeautifulSoup documentation, the second argument can be used to filter by class, for example:
const JSSoup = require('jssoup').default;
const data = `
<div>
<p class="foo bar"> hello </p>
<p> world </p>
</div>
`
const soup = new JSSoup(data);
console.log(soup.find('p', 'foo').toString()); // Logs: <p class="foo bar">hello</p>
The second argument can also be used to filter on other attributes, but CSS selectors don't appear to be an option.
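If you only need simple selectors, a thin translation layer can get you part of the way. The helper below, selectorToFindArgs, is hypothetical (it is not part of JSSoup): it turns a single compound selector such as p.foo.bar or div#main.foo into [name, attrs] arguments for find()/findAll(). It assumes that an attrs object with id and class keys is accepted, as described above, and that undefined as the tag name matches any tag. Combinators, attribute selectors and pseudo-classes are deliberately rejected rather than silently mis-translated.

```javascript
// Hypothetical helper (not a JSSoup API): translate a simple compound
// selector like "p.foo.bar" or "div#main.foo" into [name, attrs]
// arguments for JSSoup's find()/findAll().
function selectorToFindArgs(selector) {
  const trimmed = selector.trim();
  // Optional tag name followed by any number of .class / #id parts.
  const match = /^([a-zA-Z][\w-]*)?((?:[.#][\w-]+)*)$/.exec(trimmed);
  if (trimmed === '' || !match) {
    // Combinators (" ", ">", "+", "~"), [attr] and :pseudo are unsupported.
    throw new Error(`Unsupported selector: "${selector}"`);
  }
  // No tag part means "any tag"; undefined is assumed to act as a wildcard.
  // Tag names are lower-cased on the assumption the parser does the same.
  const name = match[1] ? match[1].toLowerCase() : undefined;
  const attrs = {};
  const classes = [];
  // Walk the ".foo#bar" tail token by token.
  for (const token of match[2].match(/[.#][\w-]+/g) || []) {
    if (token[0] === '.') {
      classes.push(token.slice(1));
    } else {
      attrs.id = token.slice(1); // a later #id overrides an earlier one
    }
  }
  if (classes.length > 0) {
    attrs.class = classes;
  }
  // Pass undefined rather than {} when there is nothing to filter on.
  return [name, Object.keys(attrs).length > 0 ? attrs : undefined];
}
```

Usage would then look like soup.findAll(...selectorToFindArgs('p.foo')). Anything beyond one compound selector is a job for a real selector engine.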
There are other options, such as jsdom, which provides all the usual DOM APIs, including querySelector and querySelectorAll:
const { JSDOM } = require("jsdom");
const data = `
<div>
<p class="foo bar"> hello </p>
<p> world </p>
</div>
`;
const dom = new JSDOM(data);
const doc = dom.window.document;
console.log(doc.querySelector(".foo").outerHTML); // Logs: <p class="foo bar"> hello </p>
You won't be able to use selector queries like querySelector and querySelectorAll with JSSoup.
Here is the definition of findAll in JSSoup:
{
key: 'findAll',
value: function findAll() {
var name = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : undefined;
var attrs = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : undefined;
var string = arguments.length > 2 && arguments[2] !== undefined ? arguments[2] : undefined;
// ...
var strainer = new SoupStrainer(name, attrs, string);
// ...
}
}
And here is the SoupStrainer constructor:
function SoupStrainer(name, attrs, string) {
_classCallCheck(this, SoupStrainer);
if (typeof attrs == 'string') {
attrs = { class: [attrs] };
} else if (Array.isArray(attrs)) {
attrs = { class: attrs };
} else if (attrs && attrs.class && typeof attrs.class == 'string') {
attrs.class = [attrs.class];
}
if (attrs && attrs.class) {
for (var i = 0; i < attrs.class.length; ++i) {
attrs.class[i] = attrs.class[i].trim();
}
}
this.name = name;
this.attrs = attrs;
this.string = string;
}
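You need to pass the tag name as the first argument, followed by the attributes. A string is treated as a class name, and an array as a list of class names.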
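To see exactly what each accepted form of the second argument turns into, the normalization at the top of that constructor can be restated as a standalone function. normalizeAttrs is a hypothetical name for this sketch, not something JSSoup exports, and it differs in one deliberate way: the constructor quoted above writes back into the object or array you pass (attrs.class = [attrs.class] and the in-place trim), while this version builds a copy.

```javascript
// Standalone restatement of the attrs handling in the SoupStrainer
// constructor quoted above (hypothetical name, not a JSSoup export).
// Unlike the original, it never mutates the caller's object or array.
function normalizeAttrs(attrs) {
  if (typeof attrs === 'string') {
    // A bare string is shorthand for a single class name.
    attrs = { class: [attrs] };
  } else if (Array.isArray(attrs)) {
    // An array is shorthand for a list of class names.
    attrs = { class: [...attrs] };
  } else if (attrs && attrs.class) {
    // An object: wrap a string class in an array, copy an array class.
    attrs = {
      ...attrs,
      class: typeof attrs.class === 'string' ? [attrs.class] : [...attrs.class],
    };
  }
  if (attrs && attrs.class) {
    // Each class name is trimmed, but a string is never split on spaces.
    attrs.class = attrs.class.map((c) => c.trim());
  }
  return attrs;
}
```

Note the last comment: 'foo bar' passed as one string stays a single entry, 'foo bar', rather than becoming two class names; pass an array if you mean two classes.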
Usage example
const JSSoup = require('jssoup').default;
const html = `
<html>
<head>
<title>Hello World</title>
</head>
<body>
<h1>Hello World</h1>
<p class="foo">First</p>
<p class="foo bar">Second</p>
<div class="foo">Third</div>
</body>
</html>
`;
const printTags = (tags) => console.log(tags.map(t => t.toString()).join(' '));
const soup = new JSSoup(html);
printTags(soup.findAll('p', 'foo'));
// <p class="foo">First</p> <p class="foo bar">Second</p>
printTags(soup.findAll('p', { class: 'foo' }));
// <p class="foo">First</p> <p class="foo bar">Second</p>
printTags(soup.findAll('p', { class: 'foo' }, 'Second'));
// <p class="foo bar">Second</p>
printTags(soup.findAll('p', { class: ['foo', 'bar'] }));
// <p class="foo bar">Second</p>
printTags(soup.findAll(null, 'bar'));
// <p class="foo bar">Second</p>
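The outputs above are consistent with a simple rule for class filters: a tag matches when every requested class appears among the space-separated classes in its class attribute, which is why { class: ['foo', 'bar'] } only picks the "Second" paragraph. The predicate below spells that rule out. It is inferred from the observed behaviour, not copied from JSSoup's source, so treat it as an assumption; matchesClasses is a hypothetical name.

```javascript
// Hypothetical predicate illustrating the assumed class-matching rule.
// classAttr: raw class attribute string, e.g. "foo bar".
// wanted: normalized list of class names from the filter.
function matchesClasses(classAttr, wanted) {
  if (classAttr === undefined || classAttr === null) {
    return false; // no class attribute at all, so nothing can match
  }
  // Split on runs of whitespace, as HTML does for class lists.
  const have = String(classAttr).trim().split(/\s+/);
  // Every requested class must be present; extra classes are allowed.
  return wanted.every((name) => have.includes(name));
}

// Class attributes from the example markup above.
const examples = { First: 'foo', Second: 'foo bar', Third: 'foo' };
for (const [text, cls] of Object.entries(examples)) {
  console.log(text, matchesClasses(cls, ['foo', 'bar']));
}
// First false
// Second true
// Third false
```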
Building on the answers already given, I'd just add that you can also filter by class across all tags by setting the tag name to undefined in find() and findAll():
mySoup.findAll(undefined, 'myClass');