Export Python-Scopus API results into CSV
I'm new to Python, so I'm not sure whether this can be done, but I hope it can!
I accessed the Scopus API and managed to run a search query, which gave the following results in a pandas dataframe:
search-results
entry [{'@_fa': 'true', 'affiliation': [{'@_fa': 'tr...
link [{'@_fa': 'true', '@ref': 'self', '@type': 'ap...
opensearch:Query {'@role': 'request', '@searchTerms': 'AFFIL(un...
opensearch:itemsPerPage 200
opensearch:startIndex 0
opensearch:totalResults 106652
If possible, I would like to export all 106652 results into a CSV file so I can analyze them. Is that possible?
First you need to get all the results (see the question comments).
The data you need (the search results) is in the "entry" list.
You can extract that list and append it to a supporting list, iterating until you have all the results. Here I loop and, on each round, subtract the number of downloaded items (the count) from the total number of results.
import json
import os
import requests

# url, query, view and MY_API_KEY must be defined beforehand
found_items_num = 1
start_item = 0
items_per_query = 25
max_items = 2000
JSON = []

print('GET data from Search API...')
while found_items_num > 0:
    resp = requests.get(url,
                        headers={'Accept': 'application/json', 'X-ELS-APIKey': MY_API_KEY},
                        params={'query': query, 'view': view, 'count': items_per_query,
                                'start': start_item})
    print('Current query url:\n\t{}\n'.format(resp.url))
    if resp.status_code != 200:
        # error
        raise Exception('ScopusSearchApi status {0}, JSON dump:\n{1}\n'.format(resp.status_code, resp.json()))
    # we set found_items_num=1 at initialization, on the first call it has to be set to the actual value
    if found_items_num == 1:
        found_items_num = int(resp.json().get('search-results').get('opensearch:totalResults'))
        print('GET returned {} articles.'.format(found_items_num))
    if found_items_num > 0:
        # write fetched JSON data for this page to its own file
        out_file = str(start_item) + '.json'
        with open(out_file, 'w') as f:
            json.dump(resp.json(), f, indent=4)
        # check if the number of results exceeds the given limit
        if found_items_num > max_items:
            print('WARNING: too many results, truncating to {}'.format(max_items))
            found_items_num = max_items
        # check if the page returned some results
        if 'entry' in resp.json().get('search-results', {}):
            # combine entries to make a single JSON
            JSON += resp.json()['search-results']['entry']
    # set counters for the next cycle
    found_items_num -= items_per_query
    start_item += items_per_query
    print('Still {} results to be downloaded'.format(found_items_num if found_items_num > 0 else 0))
# end while - finished downloading JSON data
Then, outside the while loop, you can save the complete file like this...
out_file = 'articles.json'
with open(out_file, 'w') as f:
    json.dump(JSON, f, indent=4)
Or you can convert the JSON data to CSV by following this guide I found online (untested; you can search for 'json to csv python' and find many guides).
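One way to do the JSON-to-CSV step, as a minimal sketch: since the combined `JSON` list built above holds flat-ish dicts, `pandas.json_normalize` can flatten them into a table and write it out. The sample records below are hypothetical stand-ins; real Scopus entries have many more fields.

```python
import pandas as pd

# Hypothetical sample of Scopus "entry" records (real ones have many more fields).
entries = [
    {"dc:title": "Article A", "prism:doi": "10.1000/a", "prism:coverDate": "2020-01-01"},
    {"dc:title": "Article B", "prism:doi": "10.1000/b", "prism:coverDate": "2021-06-15"},
]

# Flatten the list of dicts into a DataFrame and export it as CSV.
df = pd.json_normalize(entries)
df.to_csv("articles.csv", index=False)
```

In your case you would pass the `JSON` list instead of `entries`; nested fields such as `affiliation` get expanded into dotted column names by `json_normalize`.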