How to read text inside a tag
If you scroll down, you will see this.
I scrolled down the page and copied the XPath for it. This is the XPath:
//div[@id="js-hook-description"]//p/text
Here is the code:
const results = xpathT.fromPageSource(data).findElements(rest);
//console.log("The href value is:", results[0].getAttribute("href"));
console.log(`Your full text is "${results[0].getText()}"`);
if (results.length > 0) {
    let _results = [];
    if (path.includes("href", 0)) {
        for (let r of results) {
            _results.push(r.getAttribute("href"));
        }
    }
    if (path.includes("text", 0)) {
        //console.log("inside");
        //console.log(results);
        for (let r of results) {
            console.log(r.getText());
            _results.push(r.getText());
        }
    }
}
When I simply print the result, it gives me this:
Your full text is "<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">LAMBORGHİNİ GALLARDO LP560-4</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">2009 MODEL - 38.000 KM</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">DOĞUŞ OTO <font color="#ff0000">BAYİİ</font> ÇIKIŞLI</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">AİRMATİC (LİFT)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">SERAMİK FREN</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">GERİ GÖRÜŞ KAMERASI</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">PADDLESHİFT (F1)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">2 BÖLGE KLİMA</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">DERİ KOLTUK</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">Bİ-ZENON FAR</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">YAĞMUR SENSÖRÜ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">CD-USB-AUX-MP3</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">?</font></b></span></p>,<p style="text-align: 
center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">BOYA - HATA - TRAMER - HASAR KAYDI </font><font color="#ff0000"><font size="4"> </font><font size="5">YOKTUR</font></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5">ARACIMIZIN TAMPONLARI DAHİL <font color="#ff0000">BOYASIZ</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">YEDEK ANAHTARI <font color="#ff0000">MEVCUTTUR</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b>?</b></span><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5">DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5" color="#ff0000"><br/></font></b></span></p>,<p style="text-align: cente...
But when I call .getText(), it returns undefined. What could be a possible solution?
You can use page.evaluate to get the innerText property of any DOM element. If you need the text paragraph by paragraph, use the proper CSS selector for the <p> elements; in this case it is #js-hook-description > div > p. The matching elements can be collected with the page.$$ method (it runs document.querySelectorAll() in the page context). You can then iterate over the elements (see the for..of and Array.map variants below), retrieve innerText on each iteration, and apply String.trim() to strip line breaks (e.g. \n) from the paragraphs.
// full text content into one string
const fullText = await page.evaluate(el => el.innerText, await page.$('#js-hook-description'))
console.log(fullText)
// each paragraph into an array element I.
const textArray = []
const paragraphs = await page.$$('#js-hook-description > div > p')
for (const p of paragraphs) {
const actualPara = await page.evaluate(el => el.innerText.trim(), p)
textArray.push(actualPara)
}
console.log(JSON.stringify(textArray))
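One thing to watch for in the full-text snippet: page.$ resolves to null when nothing matches the selector, and reading el.innerText inside page.evaluate would then throw. A minimal sketch of a guard, assuming page is a Puppeteer Page; the helper name getInnerText is hypothetical and not part of Puppeteer:

```javascript
// Hypothetical helper (not a Puppeteer API): returns the element's innerText,
// or null when the selector matches nothing on the page.
async function getInnerText(page, selector) {
  const handle = await page.$(selector) // resolves to null if there is no match
  if (handle === null) {
    return null
  }
  // The callback runs in the page context, as in the snippet above
  return page.evaluate(el => el.innerText, handle)
}
```

It could be called as const fullText = await getInnerText(page, '#js-hook-description'), with a null check on the result before printing it.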
Another solution is to use page.$$eval with Array.map:
// each paragraph into an array element II.
const alternativeSolution = await page.$$eval('#js-hook-description > div > p', paragraphs => paragraphs.map(p => p.innerText.trim()))
console.log(JSON.stringify(alternativeSolution))
Full text output:
LAMBORGHİNİ GALLARDO LP560-4 2009 MODEL - 38.000 KM DOĞUŞ OTO BAYİİ ÇIKIŞLI AİRMATİC (LİFT) SERAMİK FREN GERİ GÖRÜŞ KAMERASI PADDLESHİFT (F1) 2 BÖLGE KLİMA DERİ KOLTUK Bİ-ZENON FAR YAĞMUR SENSÖRÜ CD-USB-AUX-MP3 ? BOYA - HATA - TRAMER - HASAR KAYDI YOKTUR ARACIMIZIN TAMPONLARI DAHİL BOYASIZ YEDEK ANAHTARI MEVCUTTUR ? DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ 0533 239 22 77
Array output, one paragraph per element:
["LAMBORGHİNİ GALLARDO LP560-4","2009 MODEL - 38.000 KM","DOĞUŞ OTO BAYİİ ÇIKIŞLI","","AİRMATİC (LİFT)","SERAMİK FREN","GERİ GÖRÜŞ KAMERASI","PADDLESHİFT (F1)","2 BÖLGE KLİMA","DERİ KOLTUK","Bİ-ZENON FAR","YAĞMUR SENSÖRÜ","CD-USB-AUX-MP3","","?","BOYA - HATA - TRAMER - HASAR KAYDI YOKTUR","","ARACIMIZIN TAMPONLARI DAHİL BOYASIZ","","YEDEK ANAHTARI MEVCUTTUR","?","DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ","","0533 239 22 77"]
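The array output contains empty strings for the paragraphs that hold only a <br/>. If those entries are not wanted, they can be filtered out after extraction. A minimal sketch in plain JavaScript; dropEmpty is a hypothetical helper name, and the sample array is a shortened copy of the output above:

```javascript
// Hypothetical helper: trims each entry and removes the ones that end up empty.
function dropEmpty(paragraphs) {
  return paragraphs.map(text => text.trim()).filter(text => text.length > 0)
}

// Shortened sample taken from the array output above
const sample = ["DOĞUŞ OTO BAYİİ ÇIKIŞLI", "", "AİRMATİC (LİFT)", "SERAMİK FREN"]
console.log(JSON.stringify(dropEmpty(sample)))
// prints ["DOĞUŞ OTO BAYİİ ÇIKIŞLI","AİRMATİC (LİFT)","SERAMİK FREN"]
```

The same filter step could also be chained directly onto the Array.map call inside the page.$$eval callback, so the empty entries never leave the page context.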