NeuroMorpho.org - getting results from multiple API pages

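Apologies in advance: this is my first post and I am completely new to Python. I want to use the NeuroMorpho API (http://neuromorpho.org/apiReference.html) to find and fetch information about certain neurons, using filters added to the query string.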

I used the following code:

import requests
import json
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

response = requests.get("http://neuromorpho.org/api")
response

query = (
    "http://neuromorpho.org/api/neuron/select?q=species:rat&fq=brain_region:hippocampus, CA1&fq=experiment_condition:Control&fq=cell_type:Pyramidal, principal cell"
)

response = requests.get(query)
json_data = response.json()
rat_data = json_data
rat_data

I got a large amount of data back, and at the very end it says the following:

'page': {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 0}}

Then I wanted to build a dictionary from that data, and used the following code:

df_dict = {}
df_dict['NeuronID'] = []
df_dict['Archive'] = []
df_dict['Strain'] = []
df_dict['Cell'] = []
df_dict['Region'] = []
for i in rat_data['_embedded']['neuronResources']:
    df_dict['NeuronID'].append(str(i['neuron_id']))
    df_dict['Archive'].append(str(i['archive']))
    df_dict['Strain'].append(str(i['strain']))
    df_dict['Cell'].append(str(i['cell_type']))
    df_dict['Region'].append(str(i['brain_region']))

rat_df = DataFrame(df_dict)
print(rat_df)

Finally, when I check the length of the resulting DataFrame:

len(rat_df)

The output is 50.

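So I eventually figured out that the program only pulls the first 50 neurons, from the first page (page 0). According to the output above, there are 23 pages in total. How can I get all of these results into one dictionary or class, i.e. is there a way to go through all of the pages? I tried several loop variants, but without success.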

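I am sorry if this is a trivial question or if I made some mistake, but I have been trying everything I could think of for the past few days and have not gotten anywhere.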

Disclaimer: I am not an expert on HTTP or the Requests library, and I have not used neuromorpho.org before, so take this with a grain of salt.

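You can use the first request to query the number of pages and then loop over the individual pages. Inside the loop, you have to include the requested page as a parameter of the HTTP GET request, e.g. ?page=42&..., like this: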

import requests
import pandas as pd

url = 'http://neuromorpho.org/api/neuron/select'
params = {
        'page' : 0,
        'q' : 'species:rat',
        'fq' : [
            'brain_region:hippocampus,CA1',
            'experiment_condition:Control',
            'cell_type:Pyramidal,principal cell' ] }

# The first request is only used to read the total number of pages.
totalPages = requests.get(url, params=params).json()['page']['totalPages']

df_dict = {
        'NeuronID' : list(),
        'Archive' : list(),
        'Strain' :  list(),
        'Cell' : list(),
        'Region' : list() }

for pageNum in range(totalPages):
    params['page'] = pageNum    # request the next page
    response = requests.get(url, params=params)
    print('Querying page {} -> status code: {}'.format(
        pageNum, response.status_code))
    if (response.status_code == 200):    #only parse successful requests
        data = response.json()
        # Iterate over the objects actually returned, not the reported size.
        for row in data['_embedded']['neuronResources']:
            df_dict['NeuronID'].append(str(row['neuron_id']))
            df_dict['Archive'].append(str(row['archive']))
            df_dict['Strain'].append(str(row['strain']))
            df_dict['Cell'].append(str(row['cell_type']))
            df_dict['Region'].append(str(row['brain_region']))

rat_df = pd.DataFrame(df_dict)
print(rat_df)

In the console output you can see the resulting DataFrame and how the requested page number changes:

Querying page 0 -> status code: 200
Querying page 1 -> status code: 200
Querying page 2 -> status code: 200
Querying page 3 -> status code: 200
Querying page 4 -> status code: 200
Querying page 5 -> status code: 200
Querying page 6 -> status code: 200
Querying page 7 -> status code: 200
Querying page 8 -> status code: 200
Querying page 9 -> status code: 200
Querying page 10 -> status code: 200
Querying page 11 -> status code: 200
Querying page 12 -> status code: 200
Querying page 13 -> status code: 200
Querying page 14 -> status code: 200
Querying page 15 -> status code: 200
Querying page 16 -> status code: 200
Querying page 17 -> status code: 200
Querying page 18 -> status code: 200
Querying page 19 -> status code: 200
Querying page 20 -> status code: 200
Querying page 21 -> status code: 200
Querying page 22 -> status code: 200
     NeuronID    Archive          Strain                             Cell                          Region
0         100     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
1         101     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
2        1016     Ascoli  Sprague-Dawley  ['pyramidal', 'principal cell']                 ['hippocampus']
3        1019     Ascoli  Sprague-Dawley  ['pyramidal', 'principal cell']                 ['hippocampus']
4         102     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
...       ...        ...             ...                              ...                             ...
1110    99614  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1111    99615  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1112    99616  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1113    99617  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1114    99618  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']

[1115 rows x 5 columns]
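
If you want to try the paging logic without hitting the server, one option is to pass the page fetcher in as a function. The following is only a sketch: collect_neurons and fake_fetch are hypothetical names, and fake_fetch merely imitates the response layout shown in the question (120 made-up records in pages of 50), it is not real NeuroMorpho data.

```python
def collect_neurons(fetch_page):
    """Gather the records from every page.

    fetch_page(page_number) must return the decoded JSON dict of that page.
    """
    first = fetch_page(0)
    total_pages = first['page']['totalPages']
    rows = []
    for page_num in range(total_pages):
        # Reuse the first response instead of requesting page 0 twice.
        data = first if page_num == 0 else fetch_page(page_num)
        # Take whatever objects the page really contains.
        rows.extend(data.get('_embedded', {}).get('neuronResources', []))
    return rows


def fake_fetch(page_num, total=120, size=50):
    """Hypothetical stand-in for the API: 120 records in pages of 50."""
    start = page_num * size
    ids = range(start, min(start + size, total))
    return {
        '_embedded': {'neuronResources': [{'neuron_id': i} for i in ids]},
        'page': {'size': size, 'totalElements': total,
                 'totalPages': -(-total // size),  # ceiling division
                 'number': page_num},
    }


rows = collect_neurons(fake_fetch)
print(len(rows))
```

With the real API, the fetcher could be something like lambda n: requests.get(url, params={**params, 'page': n}).json(), using the url and params from the code above.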

Update #1:

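I changed the code I posted by adding a modified version of your code that parses the responses inside the loop. I think there is a small bug in the neuromorpho.org API, because its response for the last page (number 22) reports size: 50 while the JSON response contains only 15 objects (indices 0-14). You can avoid the problem by iterating over the JSON objects and ignoring the reported size.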
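
For what it is worth, the 15 objects on the last page match the page metadata from the question if size is read as the page size rather than the number of items on the current page. That reading is my assumption and not something I have confirmed in the NeuroMorpho documentation. A quick check of the arithmetic:

```python
import math

# Page metadata as printed in the question.
page = {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 0}

# Number of pages needed for 1115 elements at 50 per page.
expected_pages = math.ceil(page['totalElements'] / page['size'])

# Elements left over for the last page once the first 22 pages are full.
last_page_count = page['totalElements'] - (page['totalPages'] - 1) * page['size']

print(expected_pages, last_page_count)  # 23 15
```

Either way, looping over the returned objects is the safer choice.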

Update #2:

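I realized that the GET parameters do not have to be encoded into the URL by hand: Requests does that for you when you pass them as a dict (I updated the code accordingly).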
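
To get a feeling for what the encoded query string looks like, here is a small sketch that uses only the standard library. It uses urllib.parse.urlencode with doseq=True as an approximation of what Requests produces for a dict with a list value; that the two agree in every detail is an assumption on my side.

```python
from urllib.parse import urlencode, parse_qs

params = {
    'page': 0,
    'q': 'species:rat',
    'fq': [
        'brain_region:hippocampus,CA1',
        'experiment_condition:Control',
        'cell_type:Pyramidal,principal cell'],
}

# doseq=True turns the list under 'fq' into one fq=... pair per element.
query_string = urlencode(params, doseq=True)
print(query_string)

# Decoding the string again gives back the three original filters.
decoded = parse_qs(query_string)
print(decoded['fq'])
```

The point is that the repeated fq filters, the commas and the space in 'principal cell' are escaped automatically, so there is no need to build the URL by string concatenation.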

Hope that helps!