Python Elasticsearch: using responses from search_exists
I'm trying to take a list of URLs from a text file and check whether each one is already stored in elasticsearch. Here is the code:
import fileinput
import sys
import urllib2
import os
from urlparse import urlparse
from elasticsearch import Elasticsearch
es = Elasticsearch()
for line_number, line in enumerate(fileinput.input('bangersandmash_items.csv', inplace=1)):
    if len(line) > 4:
        sys.stdout.write(line)
#open file to load URLs
with open('bangersandmash_items.csv') as urls:
    for line in urls:
        #strip out http:// as this seems to cause elasticsearch to return no results
        url = line.rstrip()
        prefix = 'http://'
        if url.startswith(prefix):
            url = url[len(prefix):]
        #query elasticsearch to see if url already exists in library's 'link' field
        response = es.search_exists(index="websearch", doc_type="site", body={"query": {"match_phrase": {"link": url}}}, ignore=[400, 404])
        print url
        print response
        #Is url in library?
        if response == "{u'exists': true}":
            print url
            print "bingo!"
        else:
            print url
            print "nuthin."
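It prints the url as formatted by lines 19-22, but it doesn't seem to handle the error codes. Lines 25 and 26 print the URL and the response from elasticsearch. Lines 28-33 don't seem to act on that information correctly. Any ideas about what I'm doing wrong here?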
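Figured it out. I had to adjust the if/else statement so that the elasticsearch response is read as a dictionary, with the value of its 'exists' key converted to a string: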
state = str(response['exists'])
if state == 'True':
    print url
    print "bingo!"
[etc].
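For anyone hitting the same thing: the original test failed because the response is a dict, and a dict never compares equal to a string, whatever the string contains. Here is a minimal sketch of that, using a hand-written stand-in for the response so it runs without a cluster. The helper name url_exists and the shape of the error body are my own assumptions for illustration, not anything the client library provides.

```python
# Stand-in for the dict printed in the question; the real one came from
# es.search_exists(...). The shape is assumed from the printed output.
response = {u'exists': True}

# A dict is never equal to a string, so the original condition is always False.
print(response == "{u'exists': true}")


def url_exists(response):
    """Hypothetical helper: True only when the response reports a match."""
    # .get() avoids a KeyError if an ignored 400/404 body has no 'exists' key
    # (that body shape is an assumption, not confirmed by the post).
    return bool(response.get('exists', False))


print(url_exists(response))
print(url_exists({u'status': 404}))
```

Testing the boolean directly also makes the str(...) == 'True' round trip unnecessary, though the version above works too.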