Searching Bing with a script causes an encoding issue
To get the number of search results for each word in my wordlist, I wrote the following:
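```python
import re

import requests
from bs4 import BeautifulSoup

# `regex` is defined elsewhere in my script (not shown here)

with open(r"C:\wordslist.txt") as f:
    lines = f.readlines()

def bingSearch(word):
    # Search Bing for the exact phrase (word wrapped in double quotes)
    r = requests.get('http://www.bing.com/search',
                     params={'q': '"' + word + '"'})
    soup = BeautifulSoup(r.text, "html.parser")
    # The result count is shown in <span class="sb_count">
    return soup.find('span', {'class': 'sb_count'})

matches = [re.search(regex, line).groups() for line in lines]
for match in matches:
    searchWord = match[0]
    found = bingSearch(searchWord)
    print(found.text)
```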
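This works well and I get accurate results, except for words that contain special characters, such as "número".

If I call bingSearch("número"), I get accurate results.
If I call bingSearch(match[0]) (where printing match[0] gives "número"), I get inaccurate results.

I have tried str(match[0]) and match[0].encode(encoding="UTF-8"), without success.

Any ideas?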
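One way two words that look alike can produce different searches: if the file's bytes were decoded with the wrong codec, the resulting string is different, and so is the query string sent to Bing. The sketch below uses the standard library's urllib.parse.urlencode as an approximation of how requests percent-encodes `params` (an assumption about requests' internals, not a statement of its exact code path). The helper `bing_query` is hypothetical, and cp1252 is only assumed as the mis-decoding codec for illustration.

```python
from urllib.parse import urlencode

def bing_query(word):
    # Hypothetical helper: build the same exact-phrase query that bingSearch passes as params
    return urlencode({'q': '"' + word + '"'})

correct = "número"
# Simulate UTF-8 bytes being mis-read as cp1252 (assumed codec, for illustration)
garbled = correct.encode("utf-8").decode("cp1252")

print(bing_query(correct))  # q=%22n%C3%BAmero%22
print(bing_query(garbled))  # q=%22n%C3%83%C2%BAmero%22
```

Because the damage happens when the bytes are decoded, calling `.encode()` or `str()` on the already-decoded string cannot undo it.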
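Try specifying the encoding explicitly when opening the file; that should make a difference:

```python
with open(r"C:\wordslist.txt", encoding="utf-8") as f:
    lines = f.readlines()
```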
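Without an `encoding` argument, `open()` uses the platform's locale encoding, which on many Windows setups is not UTF-8. A minimal sketch of the effect, assuming the word list was saved as UTF-8 and assuming cp1252 as the mismatched default (the temporary file stands in for `C:\wordslist.txt`):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wordslist.txt")
    # Save the word list as UTF-8, as many editors do
    with open(path, "w", encoding="utf-8") as f:
        f.write("número\n")

    # Read it back with a mismatched codec (cp1252 assumed as the default here)
    with open(path, encoding="cp1252") as f:
        wrong = f.read().strip()

    # Read it back with the codec the file was actually saved in
    with open(path, encoding="utf-8") as f:
        right = f.read().strip()

print(wrong)  # nÃºmero
print(right)  # número
```

Only the explicitly decoded `right` matches the literal "número", so only it yields the same Bing query as `bingSearch("número")`.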