How to get unique search results?
I'm using this module (https://github.com/thibauts/duckduckgo) to scrape DuckDuckGo search results:
>>> import duckduckgo
>>> for links in duckduckgo.search('Yellow Chris Martin', max_results=20):
...     print(links)
In the output I get, every link seems to be repeated 4 times.
Output:
http://www.youtube.com/watch?v=ZTEKsbLl64w
http://www.youtube.com/watch?v=ZTEKsbLl64w
http://www.youtube.com/watch?v=ZTEKsbLl64w
http://www.youtube.com/watch?v=ZTEKsbLl64w
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
http://www.youtube.com/watch?v=1MwjX4dG72s
http://www.youtube.com/watch?v=1MwjX4dG72s
http://www.youtube.com/watch?v=1MwjX4dG72s
http://www.youtube.com/watch?v=1MwjX4dG72s
How can I fix this and get the same results I would get when using the search engine directly?
You can pass the search results to set() to drop the duplicates, and keep increasing max_results until enough unique links remain:
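import duckduckgo

some_val = 10  # how many unique links you want (example value)
count = 10
# Ask for more raw results until at least some_val of them are unique
while len(set(duckduckgo.search('Yellow Chris Martin', max_results=count))) < some_val:
    count = count + 1
for links in set(duckduckgo.search('Yellow Chris Martin', max_results=count)):
    print(links)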
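Note that set() throws away the order in which the search engine ranked the links, and the while loop re-runs the whole query on every iteration (and never terminates if fewer than some_val unique links exist). A minimal sketch of an alternative, assuming only that duckduckgo.search() returns an iterable of link strings as shown in the question: walk the results once, keep the first occurrence of each link in its original position, and stop once you have as many as you want. The helper name first_unique and the max_results=80 figure (4x the wanted count, based on the 4x repetition in the output above) are hypothetical choices, not part of the module.

```python
def first_unique(links, limit):
    """Return up to `limit` distinct links, keeping the order they first appear in.

    `links` may be any iterable of strings, such as the generator returned by
    duckduckgo.search(). Iteration stops as soon as `limit` distinct links
    have been collected, so no further results are pulled from a generator.
    """
    if limit <= 0:
        return []
    seen = set()
    unique = []
    for link in links:
        if link not in seen:  # first time this link shows up
            seen.add(link)
            unique.append(link)
            if len(unique) == limit:
                break
    return unique


# Hypothetical usage with the module from the question (needs network access);
# max_results=80 is an assumed over-fetch to compensate for the repeats:
#   import duckduckgo
#   for link in first_unique(duckduckgo.search('Yellow Chris Martin', max_results=80), 20):
#       print(link)

# Offline check using the repeated output shown in the question.
sample = (
    ['http://www.youtube.com/watch?v=ZTEKsbLl64w'] * 4
    + ['https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)'] * 4
    + ['http://www.youtube.com/watch?v=1MwjX4dG72s'] * 4
)
print(first_unique(sample, 20))
```

Unlike the set() version, this iterates the results only once and returns the links in ranked order; if fewer than `limit` distinct links come back, it simply returns the ones it found instead of looping forever.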