How do I create a connection to Object Storage on Bluemix in a Data Science Experience Project?

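I am trying to set up a connection to Bluemix Object Storage for a project other than the default one created along with the project. This is a problem because: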

1) When I go to add a new connection, the Object Storage instance I want to use is not listed under Data Services.

2) When I go to add a SoftLayer Object Storage connection, I am asked for (Login URL, Access Key, Secret Key), but my instance's credentials consist of (auth_url, project, projectId, region, userId, username, password, domainId, domainName, role).

3) I have a working placeholder interface to Object Storage, but I want to replace it with the other instance.

Please help me access data in a different Bluemix Object Storage instance, rather than the one attached to the project by default.

You can use the function generated by the Insert to code feature and put in the credentials from your other Object Storage instance. For example:

from io import StringIO
import requests
import json
import pandas as pd

# @hidden_cell
# This function accesses a file in your Object Storage. The definition contains your credentials.
# You might want to remove those credentials before you share your notebook.
def get_object_storage_file_with_credentials(container, filename):
    """This function returns a StringIO object containing
    the file content from Bluemix Object Storage."""

    # Step 1: authenticate against the Keystone v3 identity service to obtain a token
    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': 'admin_xxxx', 'domain': {'id': 'xxxxxxxxxxx'},
            'password': 'xxxxxxxxxx'}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1.raise_for_status()
    resp1_body = resp1.json()
    # Step 2: find the public object-store endpoint for the 'dallas' region in the token catalog
    url2 = None
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == 'dallas':
                    url2 = ''.join([e2['url'], '/', container, '/', filename])
    if url2 is None:
        raise ValueError("No public object-store endpoint found for region 'dallas'")
    # Step 3: download the object, authenticating with the token from the x-subject-token header
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    resp2.raise_for_status()
    return StringIO(resp2.text)

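Here, replace the values of the user name, domain id, and password with the username, domainId, and password from your other Bluemix Object Storage credentials. If that instance's region is not dallas, change the region in the endpoint check as well. After that, you can access a file in a container of that Object Storage simply with: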

cars_df = pd.read_csv(get_object_storage_file_with_credentials('<containerName>', '<filename>.csv'))
cars_df.head()
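Instead of pasting the username, domainId, and password into the function body by hand, you could derive the request pieces directly from the other instance's credentials JSON. This is a minimal sketch under assumptions: the helper names `keystone_auth_payload` and `find_object_url` are hypothetical, and the catalog layout (entries with `type`, `endpoints`, `interface`, `region`, `url`) is assumed to be the one the function above already relies on. It uses the credentials' own `region` rather than the hard-coded `'dallas'`. The endpoint URL in the usage part is made up.

```python
import json

def keystone_auth_payload(creds):
    """Build the Keystone v3 password-auth body used by the Insert to code function,
    taking the values from an Object Storage credentials dict (hypothetical helper)."""
    return {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': creds['username'],
                                  'domain': {'id': creds['domainId']},
                                  'password': creds['password']}}}}}

def find_object_url(catalog, region, container, filename):
    """Return the public object-store URL for container/filename in `region`,
    or raise ValueError if the token catalog has no such endpoint (hypothetical helper)."""
    for entry in catalog:
        if entry['type'] == 'object-store':
            for ep in entry['endpoints']:
                if ep['interface'] == 'public' and ep['region'] == region:
                    return '/'.join([ep['url'], container, filename])
    raise ValueError('no public object-store endpoint for region %r' % region)

# Usage with placeholder values (not real credentials or endpoints):
creds = {'auth_url': 'https://identity.open.softlayer.com', 'username': 'admin_xxxx',
         'domainId': 'xxxxxxxxxxx', 'password': 'xxxxxxxxxx', 'region': 'dallas'}
body = json.dumps(keystone_auth_payload(creds))  # POST this to auth_url + '/v3/auth/tokens'
catalog = [{'type': 'object-store',
            'endpoints': [{'interface': 'public', 'region': 'dallas',
                           'url': 'https://dal.objectstorage.example/v1/AUTH_xxxx'}]}]
print(find_object_url(catalog, creds['region'], 'mycontainer', 'myfile.csv'))
```

In a real notebook, `catalog` would be `resp1.json()['token']['catalog']` from the authentication response.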

In R:

## Credentials and libraries to write to object storage

## Install necessary library (install_github comes from the devtools package)
library(devtools)
install_github('IBMDataScience/objectStoreR')
library('objectStoreR')

## Provide Credentials (fill in with your details from Bluemix)
credentials <-list(auth_url = "https://identity.open.softlayer.com",
         project = "object_storage_d7a568f8_ac53_4bc4_8834_f0e9962068f9",
         project_id = "e0c826f12030487493z2df3957621744",
         region = "dallas",
         user_id = "694102a676ef4252u19492c45fbebc4b",
         domain_id = "47ea410d2b51478d9f119fade708fbefe4",
         domain_name =  "1004827",
         username = "admin_9c5c874ed726b5a41c7bb4f8b55f45e3e2c35778",
         password = "Tj^d9rZoDhy5eb]U",
         container = "mycontainer", 
         filename = "myfile.csv")

To write a file out to Object Storage:

## A status of '201' means the upload succeeded (outputDF is your own data frame)
write.csv(outputDF,'myOutputFile.csv', row.names = F)
status <- objectStore.put(credentials,'myOutputFile.csv')
paste("Status for final output CSV:", status, sep = " ")
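The R helper hides the HTTP call. For comparison, here is a minimal Python sketch of an equivalent upload, assuming the object URL and token are obtained as in the Python function above (the token from the `x-subject-token` header, the URL from the public object-store endpoint). In the OpenStack Swift API an object is created with a `PUT` to `<endpoint>/<container>/<object>`, and a successful create returns `201 Created`, which is the status the R code checks for. The helper `build_put_request` is hypothetical, and the request is only built here, not sent.

```python
import urllib.request

def build_put_request(object_url, token, payload, content_type='text/csv'):
    """Build (but do not send) a Swift object PUT request (hypothetical helper)."""
    return urllib.request.Request(
        object_url,
        data=payload,          # raw bytes of the file to upload
        method='PUT',          # Swift creates or overwrites the object on PUT
        headers={'X-Auth-Token': token, 'Content-Type': content_type},
    )

csv_bytes = b'a,b\n1,2\n'
req = build_put_request(
    'https://dal.objectstorage.example/v1/AUTH_xxxx/mycontainer/myOutputFile.csv',  # placeholder URL
    'token-from-x-subject-token',                                                 # placeholder token
    csv_bytes,
)

# To actually upload (requires a valid token and network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # 201 means the object was created
```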

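Similarly, to save a model object (note that you must either change the filename in the credentials list or create a second credentials variable):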

saveRDS(object = finalMod, file = "myModel.rds")
status <- objectStore.put(credentials, "myModel.rds")
paste("Status for model object:", status, sep = " ")

Hope this helps!

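In addition to what @Sumit Goyal answered: to use an API or library that cannot read from Swift object storage, in other words one that only reads from the local storage/file system, you need to download the file to the local GPFS first.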

objStorCred = {
    "auth_url": "https://identity.open.softlayer.com",
    "project": "object_storage_XXXXX",
    "projectId": "XXXXX5a3",
    "region": "dallas",
    "userId": "XXXXXX98a15e0",
    "username": "admin_fXXXXX9",
    "password": "XXXXX",
    "domainId": "aXXXX5a",
    "domainName": "XXXX",
    "role": "admin"
}

import requests
import json

# @hidden_cell
# This function accesses a file in your Object Storage. The definition contains your credentials.
# You might want to remove those credentials before you share your notebook.
def get_object_storage_file(container, filename):
    """This function returns the requests Response object for a file in Bluemix Object Storage."""

    # Authenticate with Keystone v3 using the values from objStorCred
    url1 = ''.join([objStorCred['auth_url'], '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': objStorCred['username'], 'domain': {'id': objStorCred['domainId']},
            'password': objStorCred['password']}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1.raise_for_status()
    resp1_body = resp1.json()
    # Pick the public object-store endpoint for the region given in objStorCred
    url2 = None
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == objStorCred['region']:
                    url2 = ''.join([e2['url'], '/', container, '/', filename])
    if url2 is None:
        raise ValueError('No public object-store endpoint found for this region')
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    resp2.raise_for_status()
    return resp2

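Note that this version returns the response object rather than a StringIO object, because a .mat file is binary and has to be written out from the raw bytes in r.content.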

Now you can use intermediate local storage to hold the .mat file: call the function, then write the response content to a local file.

r = get_object_storage_file("containerr1", "example.mat")

with open('example.mat', 'wb') as out:
    out.write(r.content)
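For large files you may not want to hold the whole object in memory before writing it. A minimal sketch, assuming you change the GET in `get_object_storage_file` to `requests.get(url=url2, headers=headers2, stream=True)`; `save_response_to_file` is a hypothetical helper, while `iter_content` is the standard `requests` method for reading a response body in chunks.

```python
def save_response_to_file(resp, path, chunk_size=1 << 20):
    """Write a requests-style response body to a local file chunk by chunk
    and return the number of bytes written (hypothetical helper)."""
    written = 0
    with open(path, 'wb') as out:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                out.write(chunk)
                written += len(chunk)
    return written

# Usage (needs network access and a response opened with stream=True):
# r = get_object_storage_file("containerr1", "example.mat")
# save_response_to_file(r, 'example.mat')
```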

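Now read the file with h5py. You may need to install it first with pip install h5py. Note that h5py can only open .mat files saved in MATLAB's v7.3 format, which is based on HDF5; older .mat versions can be read with scipy.io.loadmat instead.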

import h5py

f = h5py.File('example.mat', 'r')
print(list(f.keys()))
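Because h5py only handles the HDF5-based v7.3 format, it can help to check the file before choosing h5py or `scipy.io.loadmat`. This is a sketch under stated assumptions: the helper `is_hdf5_file` is hypothetical, and it relies on the HDF5 rule that the superblock (starting with an 8-byte signature) sits at offset 0, 512, 1024, 2048, and so on. MATLAB v7.3 files put a 512-byte text header first, so the signature normally appears at offset 512.

```python
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'

def is_hdf5_file(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at offset 0 or at 512, 1024, 2048, ...
    up to max_offset (hypothetical helper)."""
    with open(path, 'rb') as fh:
        offset = 0
        while offset <= max_offset:
            fh.seek(offset)
            sig = fh.read(8)
            if len(sig) < 8:        # ran past the end of the file
                return False
            if sig == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False

# Usage:
# if is_hdf5_file('example.mat'):
#     f = h5py.File('example.mat', 'r')
# else:
#     from scipy.io import loadmat
#     data = loadmat('example.mat')
```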

Thanks, Charles.

I strongly recommend taking a look at https://github.com/ibm-cds-labs/ibmos2spark (available for Python, R, and Scala).

For Python with SoftLayer credentials, this is the specific code:

import ibmos2spark

slos = ibmos2spark.softlayer(sc, configuration_name, auth_url, tenant, username, password)
data = sc.textFile(slos.url(container_name, object_name))

(taken from https://github.com/ibm-cds-labs/ibmos2spark/tree/master/python#softlayer)

The next question is how to load the .mat files. This appears to be covered by the question "Read .mat files in Python", after first loading the files into memory with sc.binaryFiles().