How to split lines after web scraping?
I'm having a problem scraping this website: https://tw.dictionary.search.yahoo.com/search;_ylt=AwrtXGvbWIJibWYAFCp9rolQ;_ylc=X1MDMTM1MTIwMDM4MQRfcgMyBGZyA3NmcARmcjIDc2ItdG9wBGdwcmlkAwRuX3JzbHQDMARuX3N1Z2cDMARvcmlnaW4DdHcuZGljdGlvbmFyeS5zZWFyY2gueWFob28uY29tBHBvcwMwBHBxc3RyAwRwcXN0cmwDMARxc3RybAM0BHF1ZXJ5A3RhcGUEdF9zdG1wAzE2NTI3MDk5NTM-?p=take&fr2=sb-top&fr=sfp . It's a web dictionary provided by Yahoo. What I'm trying to do is take the word you want to translate as input and print the results as output.
import requests
from bs4 import BeautifulSoup
from urllib import parse

def searchdic():
    global d
    # Build the search URL from its fixed parts plus the query stored in d
    a = "https://tw.dictionary.search.yahoo.com/search;_ylt=AwrtXGvbWIJibWYAFCp9rolQ"
    b = ";_ylc=X1MDMTM1MTIwMDM4MQRfcgMyBGZyA3NmcARmcjIDc2ItdG9wBGdwcmlkAwRuX3JzbHQDMARuX3N1Z2cDMARvcmlnaW4DdHcuZGljdGlvbmFyeS5zZWFyY2gueWFob28uY29tBHBvcwMwBHBxc3RyAwRwcXN0cmwDMARxc3RybAM0BHF1ZXJ5A3RhcGUEdF9zdG1wAzE2NTI3MDk5NTM-?"
    c = "p="
    e = "&fr2=sb-top&fr=sfp"
    search = a+b+c+d+e
    print(search)
    resp = requests.get(search)
    soup = BeautifulSoup(resp.text, 'html.parser')
    #print(soup.find('div','compList mb-25 p-rel'))
    if soup.find('div','compList mb-25 p-rel') is None:
        print("Invalid query!")
    else:
        print(soup.find('div','compList mb-25 p-rel').text)
    #divs = soup.find_all('div', 'compList mb-25 p-rel')
    #for div in divs:
    #    print(f"{[s for s in div.stripped_strings]}""\n")

def changechinesetourl():
    global d
    # Percent-encode the Chinese query so it can be placed in the URL
    d = parse.quote(d)
    searchdic()

def is_contains_chinese():
    global d
    for _char in d:
        if '\u4e00' <= _char <= '\u9fa5':
            return True
    return False

d = input("What do you want to translate: ")
# Use the function's return value; a bare "if True:" would always take the first branch
if is_contains_chinese():
    changechinesetourl()
else:
    searchdic()
Here's what I have written. If you type "take", my output looks like this:
vt. 拿,取;握,抱;拿走,取走;夺取,占领;抓,捕;吸引 vi. (染料)被吸收,染上;依法获得财产 n. 一次拍摄的电影(电视)镜头[C];捕获量;收获量;收入[S1]
What I want is for it to be separated like this:
vt. 拿,取;握,抱;拿走,取走;夺取,占领;抓,捕;吸引
vi. (染料)被吸收,染上;依法获得财产
n. 一次拍摄的电影(电视)镜头[C];捕获量;收获量;收入[S1]
I've tried to use
#divs = soup.find_all('div', 'compList mb-25 p-rel')
#for div in divs:
#    print(f"{[s for s in div.stripped_strings]}""\n")
but the result is the same, only with [ at the beginning and ] at the end.
I'm not sure whether this is because the original page's HTML doesn't contain line breaks.
This is part of the original page source:
<div class="compList mb-25 p-rel" ><ul ><li class="lh-22 mh-22 mt-12 mb-12 mr-25"><div class=" pos_button fz-14 fl-l mr-12">vt.</div> <div class=" fz-16 fl-l dictionaryExplanation">拿,取;握,抱;拿走,取走;奪取,佔領;抓,捕;吸引</div> </li><li class="lh-22 mh-22 mt-12 mb-12 mr-25"><div class=" pos_button fz-14 fl-l mr-12">vi.</div> <div class=" fz-16 fl-l dictionaryExplanation">(染料)被吸收,染上;依法獲得財產</div> </li><li class="lh-22 mh-22 mt-12 mb-12 mr-25 last"><div class=" pos_button fz-14 fl-l mr-12">n.</div> <div class=" fz-16 fl-l dictionaryExplanation">一次拍攝的電影(電視)鏡頭[C];捕獲量;收穫量;收入[S1]</div>
To format the text the way you want, I had to do this:
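divs = soup.find_all('div', 'compList mb-25 p-rel')
for div in divs:
    lis = div.find_all('li')
    for li in lis:
        # Each <li> holds one part of speech and its definitions
        print(li.text.replace('\n', ''))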
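For anyone who wants to try this without hitting the live site, here is a minimal offline sketch of the same idea, run against the markup quoted in the question. The names `html` and `entries` are made up for this illustration, and it assumes the real page keeps the same `compList` / `li` structure, which may change. The one-line output happens because the source HTML has no newlines between the `<li>` items, so calling `.text` on the outer div just concatenates everything; going through each `<li>` separately gives one entry per line.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (hypothetical stand-in for resp.text).
html = (
    '<div class="compList mb-25 p-rel" ><ul >'
    '<li class="lh-22 mh-22 mt-12 mb-12 mr-25">'
    '<div class=" pos_button fz-14 fl-l mr-12">vt.</div> '
    '<div class=" fz-16 fl-l dictionaryExplanation">拿,取;握,抱</div> </li>'
    '<li class="lh-22 mh-22 mt-12 mb-12 mr-25">'
    '<div class=" pos_button fz-14 fl-l mr-12">vi.</div> '
    '<div class=" fz-16 fl-l dictionaryExplanation">(染料)被吸收,染上</div> </li>'
    '<li class="lh-22 mh-22 mt-12 mb-12 mr-25 last">'
    '<div class=" pos_button fz-14 fl-l mr-12">n.</div> '
    '<div class=" fz-16 fl-l dictionaryExplanation">捕獲量;收穫量</div> </li>'
    '</ul></div>'
)


def entries(markup):
    """Return one string per <li>, e.g. 'vt. <definitions>' (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    lines = []
    # class_='compList' matches any div whose class list contains compList
    for div in soup.find_all('div', class_='compList'):
        for li in div.find_all('li'):
            # Strip each text fragment, drop the empty ones, join with one space
            lines.append(li.get_text(' ', strip=True))
    return lines


for line in entries(html):
    print(line)
```

Using `get_text(' ', strip=True)` instead of `.text.replace('\n', '')` also trims the stray spaces between the inner divs, so each line comes out as the part of speech, one space, then the definitions.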