How do I create a connection to Object Storage on Bluemix in a Data Science Experience Project?
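I am trying to set up a connection to a Bluemix Object Storage instance other than the default one that was created with the project. This is a problem because:
1) When I go to add a new connection, the Object Storage instance I want to use is not listed under data services.
2) When I go to add a SoftLayer Object Storage, the credentials I am asked for are (Login URL, Access Key, and Secret Key), but the credentials for my instance are ("auth_url", "project", "projectId", "region", "userId", "username", "password", "domainId", "domainName", "role").
3) I have a perfectly good placeholder Object Storage instance, but I want to replace it with a different one.
Please help me access data in a Bluemix Object Storage instance other than the one attached to the project by default.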
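You can take the function generated by the insert to code feature and plug in the credentials from your other Object Storage instance. For example: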
from io import StringIO
import requests
import json
import pandas as pd

# @hidden_cell
# This function accesses a file in your Object Storage. The definition contains your credentials.
# You might want to remove those credentials before you share your notebook.
def get_object_storage_file_with_credentials(container, filename):
    """This function returns a StringIO object containing
    the file content from Bluemix Object Storage."""
    # Request a token from the identity service
    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': 'admin_xxxx', 'domain': {'id': 'xxxxxxxxxxx'},
            'password': 'xxxxxxxxxx'}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    # Find the public object-store endpoint for the 'dallas' region
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == 'dallas':
                    url2 = ''.join([e2['url'], '/', container, '/', filename])
    # Fetch the object using the token returned in the response headers
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    return StringIO(resp2.text)
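Here, replace the values of user name, domain id, and password with the ones from the credentials of your other Bluemix Object Storage instance. After that, you can access a file from a container in that Object Storage like this: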
cars_df = pd.read_csv(get_object_storage_file_with_credentials('<containerName>', '<filename>.csv'))
cars_df.head()
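The generated function hardcodes the 'dallas' region, and if no endpoint in the catalog matches, url2 is never assigned and the final request fails with a NameError. As a minimal sketch, the lookup can be isolated in a small helper that returns None when nothing matches. The helper name find_object_store_url and the sample catalog below are hypothetical; the catalog only mirrors the keys the function above reads ('type', 'endpoints', 'interface', 'region', 'url'), and its URLs are placeholders.

```python
def find_object_store_url(catalog, region, interface="public"):
    """Return the base URL of the first matching object-store endpoint, or None."""
    for service in catalog:
        if service.get("type") != "object-store":
            continue
        for endpoint in service.get("endpoints", []):
            if (endpoint.get("interface") == interface
                    and endpoint.get("region") == region):
                return endpoint["url"]
    return None


# Hypothetical catalog shaped like resp1_body['token']['catalog'] above.
sample_catalog = [
    {"type": "identity", "endpoints": [
        {"interface": "public", "region": "dallas",
         "url": "https://identity.example.com"}]},
    {"type": "object-store", "endpoints": [
        {"interface": "internal", "region": "dallas",
         "url": "https://internal.example.com/v1/AUTH_x"},
        {"interface": "public", "region": "london",
         "url": "https://lon.example.com/v1/AUTH_x"},
        {"interface": "public", "region": "dallas",
         "url": "https://dal.example.com/v1/AUTH_x"},
    ]},
]

base_url = find_object_store_url(sample_catalog, region="dallas")
print(base_url)
```

If your other instance lives in a different region, passing that region's name (the "region" field of its credentials) instead of 'dallas' should select the right endpoint.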
In R:
## Credentials and libraries to write to object storage
## Install necessary library
## install_github() comes from the devtools package
devtools::install_github('IBMDataScience/objectStoreR')
library('objectStoreR')
## Provide Credentials (fill in with your details from Bluemix)
credentials <- list(auth_url = "https://identity.open.softlayer.com",
                    project = "object_storage_d7a568f8_ac53_4bc4_8834_f0e9962068f9",
                    project_id = "e0c826f12030487493z2df3957621744",
                    region = "dallas",
                    user_id = "694102a676ef4252u19492c45fbebc4b",
                    domain_id = "47ea410d2b51478d9f119fade708fbefe4",
                    domain_name = "1004827",
                    username = "admin_9c5c874ed726b5a41c7bb4f8b55f45e3e2c35778",
                    password = "Tj^d9rZoDhy5eb]U",
                    container = "mycontainer",
                    filename = "myfile.csv")
To write a file out to Object Storage:
## A status of 201 indicates success
write.csv(outputDF,'myOutputFile.csv', row.names = F)
status <- objectStore.put(credentials,'myOutputFile.csv')
paste("Status for final output CSV:", status, sep = " ")
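Similarly, to save a model object (note that you must either change the file name in the credentials list or create a second credentials variable):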
saveRDS(object = finalMod, file = "myModel.rds")
status <- objectStore.put(credentials, "myModel.rds")
paste("Status for model object:", status, sep = " ")
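Hope this helps!
In addition to what @Sumit Goyal answered:
You need to download the file to the local GPFS in order to use APIs or libraries that do not support reading from Swift Object Storage, in other words those that only support reading from the local storage/file system.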
objStorCred = {
    "auth_url": "https://identity.open.softlayer.com",
    "project": "object_storage_XXXXX",
    "projectId": "XXXXX5a3",
    "region": "dallas",
    "userId": "XXXXXX98a15e0",
    "username": "admin_fXXXXX9",
    "password": "XXXXX",
    "domainId": "aXXXX5a",
    "domainName": "XXXX",
    "role": "admin"
}
from io import StringIO
import requests
import json
import pandas as pd

# @hidden_cell
# This function accesses a file in your Object Storage. It reads your credentials from objStorCred.
# You might want to remove those credentials before you share your notebook.
def get_object_storage_file(container, filename):
    """This function returns the requests response object for
    the file stored in Bluemix Object Storage."""
    # Request a token from the identity service
    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': objStorCred['username'], 'domain': {'id': objStorCred['domainId']},
            'password': objStorCred['password']}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    # Find the public object-store endpoint for the 'dallas' region
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == 'dallas':
                    url2 = ''.join([e2['url'], '/', container, '/', filename])
    # Fetch the object using the token returned in the response headers
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    return resp2
Note that get_object_storage_file returns the response object instead of a StringIO object.
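Since the goal is to switch between Object Storage instances, it may help to build the token request body from whichever credentials dictionary you pass in, and to fail early when a field is missing. The following is only a sketch: build_auth_payload and REQUIRED_KEYS are hypothetical names, the key names follow the objStorCred dictionary above, and the credential values are made-up placeholders.

```python
import json

# Keys from the Bluemix credentials that the token request actually uses.
REQUIRED_KEYS = ("username", "domainId", "password")


def build_auth_payload(cred):
    """Build the password-based token request body from a credentials dict."""
    missing = [key for key in REQUIRED_KEYS if not cred.get(key)]
    if missing:
        raise ValueError("credentials are missing: " + ", ".join(missing))
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": cred["username"],
                        "domain": {"id": cred["domainId"]},
                        "password": cred["password"],
                    }
                },
            }
        }
    }


# Placeholder credentials for illustration only.
example_cred = {
    "username": "admin_example",
    "domainId": "domain_example",
    "password": "secret_example",
}
body = json.dumps(build_auth_payload(example_cred))
print(body)
```

The resulting string is what the functions in this thread pass as data= to requests.post.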
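Now you can use the local storage as an intermediate place to keep the .mat file.
Call get_object_storage_file and write the content of the response to a local file:
r = get_object_storage_file("containerr1", "example.mat")
with open('example.mat', 'wb') as file:
    file.write(r.content)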
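Writing r.content without checking the status means that a failed request (for example a 404 for a wrong container name) would silently save the error body as example.mat. A minimal sketch of a guard is shown here. save_response is a hypothetical helper, and the two SimpleNamespace objects only stand in for requests responses (they carry the same status_code and content attributes) so that the example runs without network access.

```python
import os
import tempfile
from types import SimpleNamespace


def save_response(resp, path):
    """Write resp.content to path, refusing anything but an HTTP 200."""
    if resp.status_code != 200:
        raise IOError("download failed with HTTP status %s" % resp.status_code)
    with open(path, "wb") as fh:
        fh.write(resp.content)
    return len(resp.content)  # number of bytes written


# Stand-ins for the response returned by get_object_storage_file.
ok = SimpleNamespace(status_code=200, content=b"\x00\x01binary payload")
missing = SimpleNamespace(status_code=404, content=b"<html>Not Found</html>")

target = os.path.join(tempfile.mkdtemp(), "example.mat")
print(save_response(ok, target))
```

With a real response you would call save_response(r, 'example.mat') in place of the open/write pair.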
Now read the file using h5py.
You may need to install h5py first with pip install h5py.
import h5py
f = h5py.File('example.mat', 'r')
list(f.keys())
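One caveat: h5py reads HDF5 files, so it can only open .mat files saved in the MATLAB v7.3 format, which is HDF5-based. Older .mat files are usually read with scipy.io.loadmat instead. The sketch below guesses the format from the first bytes of the file. It assumes that v7.3 files carry the HDF5 signature after a 512-byte header block and that v5-style files start with the text "MATLAB 5.0 MAT-file"; mat_file_kind is a hypothetical helper, and the two files it inspects are synthetic stand-ins, not real MATLAB output.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def mat_file_kind(path):
    """Guess whether a .mat file is HDF5-based ('hdf5'), v5-style ('v5'), or 'unknown'."""
    with open(path, "rb") as fh:
        head = fh.read(520)
    # Assumption: MATLAB v7.3 places the HDF5 signature at byte offset 512.
    if head[512:520] == HDF5_SIGNATURE or head[:8] == HDF5_SIGNATURE:
        return "hdf5"
    if head.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5"
    return "unknown"


# Synthetic files that only imitate the headers described above.
workdir = tempfile.mkdtemp()
v73_path = os.path.join(workdir, "new_style.mat")
v5_path = os.path.join(workdir, "old_style.mat")
with open(v73_path, "wb") as fh:
    fh.write(b"MATLAB 7.3 MAT-file".ljust(512, b" ") + HDF5_SIGNATURE)
with open(v5_path, "wb") as fh:
    fh.write(b"MATLAB 5.0 MAT-file".ljust(128, b" "))

for path in (v73_path, v5_path):
    print(os.path.basename(path), mat_file_kind(path))
```

If the downloaded file is reported as 'v5', h5py.File will likely fail to open it, and scipy.io.loadmat is the usual alternative.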
Thanks,
Charles.
I strongly recommend taking a look at https://github.com/ibm-cds-labs/ibmos2spark (available for Python, R, and Scala).
For Python with SoftLayer credentials, the code is specifically:
slos = ibmos2spark.softlayer(sc, configuration_name, auth_url, tenant, username, password)
data = sc.textFile(slos.url(container_name, object_name))
(taken from https://github.com/ibm-cds-labs/ibmos2spark/tree/master/python#softlayer)
The next question is how to load the .mat file. It seems this can be worked around by following "Read .mat files in Python" and first using sc.binaryFiles() to get the files into memory.