How to return the whole line where a string match is found in a webpage using Python requests
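I am trying to scrape the lines of a webpage that match the strings I search for. I tried a few approaches, but they read and display everything. Here is part of the snippet.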
import requests
url = "https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053#code"
queries = ["Website", "Telegram", "Submitted"]
r = requests.get(url)
for q in queries:
    q = q.lower()
    if q in r.text.lower():
        print(q, 'Found')
    else:
        print(q, 'Not Found')
Current output:
website Found
telegram Found
submitted Found
Desired output:
Submitted Found - *Submitted for verification at BscScan.com on 2021-08-08
Website Found - *Website: www.shibuttinu.com
Telegram Found - *Telegram: https://t.me/Shibuttinu
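requests is returning an HTML page, which you have to parse with an HTML parser. One problem is that your target output is stuck in the middle of a long string, so after parsing you still need some string manipulation to extract it.
You can parse the HTML with either beautifulsoup using CSS selectors or lxml using XPath.
First, using lxml: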
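import lxml.html as lh
# r and queries come from the snippet in the question
doc = lh.fromstring(r.text)
# the <pre> element that holds the source code
loc = doc.xpath('//pre[@class="js-sourcecopyarea editor"]')[0]
# take the first text node and split it on '*'
targets = list(loc.itertext())[0].split('*')
for target in targets:
    for query in queries:
        if query in target:
            print(target)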
With beautifulsoup:
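from bs4 import BeautifulSoup as bs
# r and queries come from the snippet in the question
soup = bs(r.text, 'lxml')
# the <pre> element that holds the source code
pre = soup.select_one('pre.js-sourcecopyarea.editor')
# take the first text string and split it on '*'
ss = (list(pre.stripped_strings)[0]).split('*')
for s in ss:
    for query in queries:
        if query in s:
            print(s)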
Output in both cases:
Submitted for verification at BscScan.com on 2021-08-08
Website: www.shibuttinu.com
Telegram: https://t.me/Shibuttinu
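To get the exact format asked for in the question (the query, "Found", then the matching line), the split pieces can be matched case-insensitively and printed per query. This is only a minimal sketch, assuming the comment text has already been extracted as a string as in the snippets above. The helper `find_matches` and the `sample` string are hypothetical; `sample` is modeled on the output shown above rather than fetched from the page.

```python
def find_matches(text, queries, sep="*"):
    """Map each query to the first sep-delimited chunk containing it (case-insensitive)."""
    chunks = [c.strip() for c in text.split(sep) if c.strip()]
    results = {}
    for query in queries:
        # None when no chunk contains the query
        results[query] = next(
            (c for c in chunks if query.lower() in c.lower()), None
        )
    return results


# Hypothetical sample, modeled on the comment header shown in the output above.
sample = (
    "/**\n"
    " *Submitted for verification at BscScan.com on 2021-08-08\n"
    "*/\n"
    "/*\n"
    " *Website: www.shibuttinu.com\n"
    " *Telegram: https://t.me/Shibuttinu\n"
    "*/"
)

queries = ["Website", "Telegram", "Submitted"]
for query, line in find_matches(sample, queries).items():
    if line is None:
        print(query, "Not Found")
    else:
        print(f"{query} Found - *{line}")
```

With the real page, `sample` would be replaced by the string taken from the `pre` element.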
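You are only printing q, which is your query. You also want to print the text that matched.
In short, you should try something like print(q, 'Found', r), where r is the piece of the response being tested, as in the loop below: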
import requests
url = "https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053#code"
queries = ["Website", "Telegram", "Submitted"]
req = requests.get(url).text
# iterate over lines; looping over the string itself would yield single characters
for r in req.splitlines():
    for q in queries:
        if q.lower() in r.lower():
            print(q, 'Found in', r)
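Finally, you will not find any results on this website this way, because the text you are looking for is not inside a text tag. You may want to filter the response for div elements with class="ace_line_group".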
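Filtering by element and class could look roughly like the sketch below. It swaps in the standard library's `html.parser` so that it runs without extra packages, and it assumes the fetched HTML actually contains such div elements; editor markup like this may be generated by JavaScript in the browser, in which case it would not appear in the response from requests. The class `LineGroupParser` and the `sample_html` string are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class LineGroupParser(HTMLParser):
    """Collect the text of every <div> whose class list includes target_class."""

    def __init__(self, target_class="ace_line_group"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # <div> nesting depth inside the current matching div
        self.buffer = []    # text pieces of the div being collected
        self.lines = []     # finished text, one entry per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested div inside a matching div
        elif self.target_class in classes:
            self.depth = 1           # entering a matching div
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:      # leaving the matching div
                self.lines.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


# Hypothetical markup, not copied from the real page.
sample_html = """
<div class="ace_line_group"><div class="ace_line"> *Website: www.shibuttinu.com</div></div>
<div class="ace_line_group"><div class="ace_line"> *Telegram: https://t.me/Shibuttinu</div></div>
<div class="other">Website: ignored</div>
"""

parser = LineGroupParser()
parser.feed(sample_html)
parser.close()

queries = ["Website", "Telegram", "Submitted"]
for line in parser.lines:
    for query in queries:
        if query.lower() in line.lower():
            print(query, "Found in", line)
```

Against the real page, `sample_html` would be replaced by `requests.get(url).text`; if `parser.lines` comes back empty, the elements are probably not in the static HTML.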