How to get the text content of an HTML element without an id or tag name?

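I'm developing a Chrome extension for personal use (in JavaScript) that scrapes some data and exports it to a CSV file. I'm having trouble extracting the text of some nodes, because their selectors (the CSS selectors Chrome copies) change depending on how many tags with the same class but different content are on the page.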

Here is an example of the HTML:

<li class="sc-account-rows__row">
<div class="sc-account-rows__row__label">RTF</div> <!-- Title label -->
<div class="sc-account-rows__row__price--container">
<div class="sc-account-rows__row__price">-$ 1.485</div> <!-- Price label: how do I get this? -->
</div>
</li>

<li class="sc-account-rows__row">
<div class="sc-account-rows__row__label">some text</div> <!-- Another label -->
<div class="sc-account-rows__row__price--container">
<div class="sc-account-rows__row__price">-$ 2.418</div> <!-- Another price I don't need, but it has the same class -->
</div>
</li>

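In other words, the selector for this particular element can be either of these, depending on how many rows are present: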

#root-app > div > div.sc-account-section > div.sc-account-section__container > div.sc-account-module > div:nth-child(3) > ul > li:nth-child(1) > div.sc-account-rows__row__price--container > div

#root-app > div > div.sc-account-section > div.sc-account-section__container > div.sc-account-module > div:nth-child(3) > ul > li:nth-child(9) > div.sc-account-rows__row__price--container > div

As you can see, there is no id or name assigned to this particular element. I was (successfully) using this piece of code when the selector was always the same (note that this runs inside an iframe):

var RTF_fee = "#root-app > div > div.sc-account-section > div.sc-account-section__container > div.sc-account-module > div:nth-child(3) > ul > li:nth-child(4) > div.sc-account-rows__row__price--container > div";
var RTF_value;

if (iframe_document.body.contains(iframe_document.querySelector(RTF_fee))) {
    RTF_value = iframe_document.querySelector(RTF_fee).textContent;
    console.log(RTF_value);
}
else {
    RTF_value = "0";
    console.log(RTF_value);
}

So the question is: how can I get the text content of the price element if I don't have a unique selector or id?

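I think I could rely on the fact that the price element's class is always "sc-account-rows__row__price" and that the label text just before the price is always "RTF", but I don't know how to code this, or whether there is a better option.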

Sorry for my poor "programming" vocabulary, I'm just an occasional programmer :)

Thanks in advance.

If I understand your question correctly, this is what you're looking for:


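// Use iframe_document instead of document if the rows live inside the iframe
let labels = document.querySelectorAll(".sc-account-rows__row__label");

[...labels].forEach(label => {
  if (label.innerText === "RTF") {
    // The label and the price share the same <li> parent
    let price = label.parentElement.querySelector(".sc-account-rows__row__price").innerText;
    console.log(price);
  }
});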

Console: "-$ 1.485"
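Since the original snippet runs inside an iframe and falls back to "0" when the element is missing, the same label-matching idea can be wrapped to search the iframe's document and keep that fallback. This is a minimal sketch: `findPriceByLabel` and `getRtfValue` are hypothetical names, and it assumes the iframe is same-origin so that `iframe_document` (for example from `iframe.contentDocument`) is readable by the extension script.

```javascript
// Hypothetical helper: returns the trimmed price text of the row whose
// label text equals `labelText`, or null if no such row exists.
// `root` may be a Document (e.g. the iframe's document) or any Element.
function findPriceByLabel(root, labelText) {
  const labels = root.querySelectorAll(".sc-account-rows__row__label");
  for (const label of labels) {
    // textContent + trim() ignores whitespace around the label text
    if (label.textContent.trim() !== labelText) continue;
    // The label and the price live in the same <li> row
    const row = label.closest(".sc-account-rows__row") || label.parentElement;
    const price = row.querySelector(".sc-account-rows__row__price");
    return price ? price.textContent.trim() : null;
  }
  return null;
}

// Replacement for the nth-child lookup that keeps the original "0" fallback.
// Assumes `iframe_document` is the (same-origin) iframe's document.
function getRtfValue(iframe_document) {
  const value = findPriceByLabel(iframe_document, "RTF");
  return value === null ? "0" : value;
}
```

In the extension this would be used as `RTF_value = getRtfValue(iframe_document);`, which no longer depends on the row's position in the list.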

If you also need labels other than RTF:

let words = ["RTF", "something", "else"];

// change the if condition to
if (words.includes(label.innerText)) { ... }
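When several labels are needed (for example, to fill one CSV row), it may be simpler to collect all the wanted label/price pairs in a single pass over the rows. A sketch under the same class-name assumptions; `getPricesByLabels` is a hypothetical name, and labels that are not found are simply absent from the result.

```javascript
// Hypothetical helper: scans every row once and returns an object
// mapping each wanted label text to its price text.
function getPricesByLabels(root, wantedLabels) {
  const wanted = new Set(wantedLabels);
  const result = {};
  for (const row of root.querySelectorAll(".sc-account-rows__row")) {
    const label = row.querySelector(".sc-account-rows__row__label");
    const price = row.querySelector(".sc-account-rows__row__price");
    if (!label || !price) continue; // skip rows missing either part
    const key = label.textContent.trim();
    // Keep only the first match for each wanted label
    if (wanted.has(key) && !(key in result)) {
      result[key] = price.textContent.trim();
    }
  }
  return result;
}
```

With the example HTML from the question, `getPricesByLabels(iframe_document, ["RTF", "some text"])` should return `{ "RTF": "-$ 1.485", "some text": "-$ 2.418" }`.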

Another approach is to use XPath, an older standard for navigating XML that also works on HTML. In this case, you're looking for a div with class "sc-account-rows__row__price" near another div with class "sc-account-rows__row__label" whose text is "RTF".

The XPath for that div is:

//div[@class="sc-account-rows__row__label"][text()="RTF"]/following-sibling::div[@class="sc-account-rows__row__price--container"]//div[@class="sc-account-rows__row__price"]

It's a bit more complicated because the price div is nested inside the price--container div.

To get the text, use document.evaluate:

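function getPriceByLabel(label, contextNode = document) {
  // Pass a closer ancestor (e.g. the <ul> holding the rows, or the iframe's document)
  // to speed up the search. The leading "." makes the path relative to contextNode;
  // a path starting with "//" always searches the whole document.
  const xpath = `.//div[@class="sc-account-rows__row__label"][text()="${label}"]/following-sibling::div[@class="sc-account-rows__row__price--container"]//div[@class="sc-account-rows__row__price"]`;
  // Evaluate on the document that owns contextNode (safer when it is inside the iframe)
  const ownerDoc = contextNode.ownerDocument || contextNode;
  const priceResults = ownerDoc.evaluate(xpath, contextNode, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null);
  const priceElement = priceResults.singleNodeValue;
  // singleNodeValue is null when no row matches, so guard before reading textContent
  return priceElement ? priceElement.textContent : null;
}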
console.log(getPriceByLabel("RTF"));
console.log(getPriceByLabel("some text"));

<li class="sc-account-rows__row">
  <div class="sc-account-rows__row__label">RTF</div>
  <div class="sc-account-rows__row__price--container">
    <div class="sc-account-rows__row__price">-$ 1.485</div>
  </div>
</li>

<li class="sc-account-rows__row">
  <div class="sc-account-rows__row__label">some text</div>
  <div class="sc-account-rows__row__price--container">
    <div class="sc-account-rows__row__price">-$ 2.418</div>
  </div>
</li>

I put it in a function that takes a label and returns the price.
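One caveat with interpolating `label` into the XPath: if the label text contains a double quote, the string literal in the expression breaks. XPath 1.0 string literals have no escape sequence, so a common workaround is to switch quote style or build the string with `concat()`. A sketch of such a helper (the name `toXPathLiteral` is hypothetical):

```javascript
// Hypothetical helper: turns any JS string into a valid XPath 1.0 string literal.
function toXPathLiteral(value) {
  if (!value.includes('"')) return `"${value}"`;
  if (!value.includes("'")) return `'${value}'`;
  // Contains both quote kinds: split on " and rejoin the pieces with concat()
  const parts = value.split('"').map((part) => `"${part}"`);
  return `concat(${parts.join(`, '"', `)})`;
}
```

In `getPriceByLabel` the predicate would then read `[text()=${toXPathLiteral(label)}]` instead of `[text()="${label}"]`.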