How can I retrieve the complete set of records for a dataset through a Foundry API call?
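I want to use a dataset in another Python application outside of Foundry, but when I use requests I only get the first 300 rows of records.
The API endpoint my requests call hits is the Contour dataset-preview endpoint.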
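There are different ways to query datasets in Foundry, depending on the dataset size and the use case.
Probably the easiest one to start with is the data proxy SQL query, because you don't have to worry about the underlying file format of the dataset.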
from typing import Tuple

import requests
import pandas as pd


def query_foundry_sql(query, token, branch='master',
                      base_url='https://foundry-instance.com') -> Tuple[list, list]:
    """
    Queries the dataproxy query API with spark SQL.
    Example: query_foundry_sql("SELECT * FROM `/path/to/dataset` Limit 5000", "ey...")
    Args:
        query: the sql query
        token: the bearer token used for authentication
        branch: the branch of the dataset / query
        base_url: the base URL of the Foundry instance
    Returns: (columns, data) tuple. data contains the data matrix, columns the list of columns
    Can be converted to a pandas Dataframe:
        pd.DataFrame(data, columns=columns)
    """
    response = requests.post(f"{base_url}/foundry-data-proxy/api/dataproxy/queryWithFallbacks",
                             headers={'Authorization': f'Bearer {token}'},
                             params={'fallbackBranchIds': [branch]},
                             json={'query': query})
    # Fail loudly on HTTP errors (for example an expired token) instead of parsing an error body.
    response.raise_for_status()
    payload = response.json()
    # Column names come from the schema that is returned alongside the rows.
    columns = [e['name'] for e in payload['foundrySchema']['fieldSchemaList']]
    return columns, payload['rows']


columns, data = query_foundry_sql("SELECT * FROM `/Global/Foundry Operations/Foundry Support/iris` Limit 5000",
                                  "ey...",
                                  base_url="https://foundry-instance.com")
df = pd.DataFrame(data=data, columns=columns)
df.head(5)
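If you want to check the parsing step without calling Foundry, the response handling can be sketched on its own. This is a minimal sketch that assumes the response has the shape the function above reads (`foundrySchema.fieldSchemaList[*].name` plus `rows`); the helper name `parse_query_response` and the sample payload are made up for illustration and are not part of any Foundry API.

```python
def parse_query_response(payload):
    """Turn a query response payload into a list of {column: value} records.

    Assumes the payload shape read by the answer's code:
    payload['foundrySchema']['fieldSchemaList'] holds dicts with a 'name' key,
    and payload['rows'] is a list of rows, each a list of values.
    """
    columns = [field["name"] for field in payload["foundrySchema"]["fieldSchemaList"]]
    rows = payload["rows"]
    # Guard against rows that do not line up with the schema.
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(columns)}"
            )
    return [dict(zip(columns, row)) for row in rows]


# Hypothetical payload, hand-written to mimic the assumed response shape.
sample_payload = {
    "foundrySchema": {
        "fieldSchemaList": [{"name": "sepal_length"}, {"name": "species"}]
    },
    "rows": [[5.1, "setosa"], [7.0, "versicolor"]],
}

records = parse_query_response(sample_payload)
```

The resulting list of dicts can also be passed straight to `pd.DataFrame(records)` if you prefer that over building the frame from separate `data` and `columns` lists.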