Parsing Google Custom Search API for Elasticsearch Documents
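After retrieving results from the Google Custom Search API and writing them to JSON, I want to parse that JSON into valid Elasticsearch documents. You can configure a parent-child relationship for nested results. However, this relationship does not seem to be inferred from the data structure itself. I tried loading the data automatically, but got no results.
Below is some sample input, which leaves out things like ids or the index. I am trying to focus on creating the correct data structure. I have tried modifying graph algorithms such as depth-first search, but I keep running into problems with the different data structures.
Here is some sample input: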
# mock data structure
google = {"content": "foo",
          "results": {"result_one": {"persona": "phone",
                                     "personb": "phone",
                                     "personc": "phone"
                                     },
                      "result_two": ["thing1",
                                     "thing2",
                                     "thing3"
                                     ],
                      "result_three": "none"
                      },
          "query": ["Taylor Swift", "Bob Dole", "Rocketman"]
          }
# correctly formatted documents for _source of elasticsearch entry
correct_documents = [
    {"content": "foo"},
    {"results": ["result_one", "result_two", "result_three"]},
    {"result_one": ["persona", "personb", "personc"]},
    {"persona": "phone"},
    {"personb": "phone"},
    {"personc": "phone"},
    {"result_two": ["thing1", "thing2", "thing3"]},
    {"result_three": "none"},
    {"query": ["Taylor Swift", "Bob Dole", "Rocketman"]}
]
Here is my current approach, still a work in progress:
def recursive_dfs(graph, start, path=[]):
    '''recursive depth first search from start'''
    path = path + [start]
    for node in graph[start]:
        if node not in path:
            path = recursive_dfs(graph, node, path)
    return path


def branching(google):
    """ Get branches as a starting point for dfs"""
    keys = list(google.keys())  # dict views cannot be indexed on Python 3
    branch = 0
    while branch < len(google):
        # isinstance() is needed here: "value is dict" is never true for an instance
        if isinstance(google[keys[branch]], dict):
            # recursive_dfs(google, google[keys[branch]])
            pass
        else:
            print("branch {}: result {}\n".format(branch, google[keys[branch]]))
        branch += 1


branching(google)
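As you can see, recursive_dfs() still needs to be modified to handle string and list data structures.
I will keep working on this, but if you have any thoughts, suggestions, or solutions, I would greatly appreciate them. Thanks for your time.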
Here is a possible answer to your question.
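def myfunk(inHole, outHole):
    """Append one single-key document per key of the nested dict inHole to outHole."""
    for keys in inHole.keys():
        is_list = isinstance(inHole[keys], list)
        is_dict = isinstance(inHole[keys], dict)
        if is_list:
            # lists are stored as-is under their key
            element = inHole[keys]
            new_element = {keys: element}
            outHole.append(new_element)
        if is_dict:
            # a nested dict is stored as the list of its child keys, then recursed into
            element = list(inHole[keys].keys())
            new_element = {keys: element}
            outHole.append(new_element)
            myfunk(inHole[keys], outHole)
        if not (is_list or is_dict):
            # scalar values are stored unchanged
            new_element = {keys: inHole[keys]}
            outHole.append(new_element)
    # list.sort() sorts in place and returns None, and dicts cannot be ordered
    # on Python 3, so return the list itself in traversal order
    return outHole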
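To check the idea against the sample input from the question, here is a minimal self-contained sketch of the same recursive flattening, written as a generator. The name flatten_documents is hypothetical and not part of the answer above, and the sketch assumes Python 3.7+, where dicts keep insertion order, so the documents come out in traversal order without any sorting.

```python
def flatten_documents(node):
    """Yield one single-key document per key of a nested dict, depth first."""
    for key, value in node.items():
        if isinstance(value, dict):
            # A nested dict becomes a document listing its child keys,
            # followed by the documents produced from those children.
            yield {key: list(value)}
            yield from flatten_documents(value)
        else:
            # Lists and scalar values are emitted unchanged.
            yield {key: value}


# Sample input copied from the question.
google = {
    "content": "foo",
    "results": {
        "result_one": {"persona": "phone", "personb": "phone", "personc": "phone"},
        "result_two": ["thing1", "thing2", "thing3"],
        "result_three": "none",
    },
    "query": ["Taylor Swift", "Bob Dole", "Rocketman"],
}

documents = list(flatten_documents(google))
for doc in documents:
    print(doc)
```

Each yielded dict is meant to serve as the _source of one Elasticsearch entry; ids, the index name, and any parent-child mapping are still left out here, as in the question.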