ibm_boto3 compatibility issue with scikit-learn on Mac OS

Question

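I have a Python 3.6 application that uses scikit-learn and is deployed to IBM Cloud (Cloud Foundry). It works fine. My local development environment is Mac OS High Sierra.

Recently I added IBM Cloud Object Storage functionality (ibm_boto3) to the application. The COS functionality itself works fine: I can upload, download, list, and delete objects using the ibm_boto3 library.

Strangely, the part of the application that uses scikit-learn now freezes.

If I comment out the ibm_boto3 import statement (and the corresponding code), the scikit-learn code works fine.

More puzzling still, the problem only occurs on the local development machine running OS X. When the application is deployed to IBM Cloud it works fine, with scikit-learn and ibm_boto3 running side by side.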

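Our only hypothesis at this point is that the ibm_boto3 library somehow surfaces a known issue in scikit-learn (see this: the parallel version of the K-Means algorithm is broken when numpy uses the Accelerate framework on OS X). Note that we only ran into this problem after adding ibm_boto3 to the project.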

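However, we need to be able to test on localhost before deploying to IBM Cloud. Are there any known compatibility issues between ibm_boto3 and scikit-learn on Mac OS?

Any suggestions on how we can avoid this on the development machine?

Cheers.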

Answer 1

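There are no known compatibility issues so far. :)

At some point there were issues with the vanilla SSL libraries that ship with OS X, but if you are able to read and write data, that is not the problem.

Are you using HMAC credentials? If so, I'm curious whether the behavior continues if you use the original boto3 library instead of the IBM fork.

Here is a simple example showing how to use pandas with the original boto3: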
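One way to run that comparison without the main process hanging is to execute each variant in a fresh interpreter with a timeout. The following is only a rough sketch: the helper name `probe` and the timeout value are made up for illustration, and it uses nothing beyond the standard library.

```python
import subprocess
import sys


def probe(snippet, timeout_s=60):
    """Run `snippet` in a fresh interpreter and classify the outcome.

    Returns "ok" on a clean exit, "error" on a non-zero exit code,
    and "timeout" if the child did not finish within `timeout_s` seconds.
    """
    try:
        # subprocess.run kills the child before raising TimeoutExpired.
        result = subprocess.run([sys.executable, "-c", snippet], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # Hypothetical workload: replace it with the code path that freezes.
    workload = (
        "import numpy as np\n"
        "from sklearn.cluster import KMeans\n"
        "KMeans(n_clusters=2).fit(np.random.rand(100, 2))\n"
    )
    # Same workload with no extra import, with boto3, and with the IBM fork.
    for prefix in ("", "import boto3\n", "import ibm_boto3\n"):
        label = prefix.strip() or "no extra import"
        print(label, "->", probe(prefix + workload))
```

If only the `import ibm_boto3` variant reports "timeout", that points at the fork rather than at boto3 in general. An "error" result usually just means a package is missing in that environment.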

import boto3  # package used to connect to IBM COS using the S3 API
import io  # python package used to stream data
import pandas as pd  # lightweight data analysis package

access_key = '<access key>'
secret_key = '<secret key>'
pub_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
pvt_endpoint = 'https://s3-api.us-geo.objectstorage.service.networklayer.com'
bucket = 'demo'  # the bucket holding the objects being worked on.
object_key = 'demo-data'  # the name of the data object being analyzed.
result_key = 'demo-data-results'  # the name of the output data object.


# First, we need to open a session and create a client that can connect to IBM COS.
# This client needs to know where to connect, the credentials to use,
# and what signature protocol to use for authentication. The endpoint
# can be specified to be public or private.
cos = boto3.client('s3', endpoint_url=pub_endpoint,
                   aws_access_key_id=access_key,
                   aws_secret_access_key=secret_key,
                   region_name='us',
                   config=boto3.session.Config(signature_version='s3v4'))

# Since we've already uploaded the dataset to be worked on into cloud storage,
# now we just need to identify which object we want to use. get_object returns
# a dict holding the response metadata and a streaming 'Body'.
obj = cos.get_object(Bucket=bucket, Key=object_key)

# Now, because this is all REST API based, the actual contents of the file are
# transported in the request body, so we need to identify where to find the
# data stream containing the actual CSV file we want to analyze.
data = obj['Body'].read()

# Now we can read that data stream into a pandas dataframe.
df = pd.read_csv(io.BytesIO(data))

# This is just a trivial example, but we'll take that dataframe and just
# create a JSON document that contains the mean values for each column.
output = df.mean(axis=0, numeric_only=True).to_json()

# Now we can write that JSON file to COS as a new object in the same bucket.
cos.put_object(Bucket=bucket, Key=result_key, Body=output)
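
If the freeze does turn out to be the known OS X issue mentioned in the question (parallel scikit-learn code hanging when numpy uses Accelerate), one thing that may be worth trying locally is switching multiprocessing to the "forkserver" start method before anything else runs. This is an assumption about the cause, not a confirmed fix for ibm_boto3, and the helper name `prefer_forkserver` is made up for illustration.

```python
import multiprocessing


def prefer_forkserver():
    """Switch multiprocessing to 'forkserver' when the platform offers it.

    Returns the start method that is in effect afterwards.
    """
    if "forkserver" in multiprocessing.get_all_start_methods():
        # force=True allows the call even if a start method was already set.
        multiprocessing.set_start_method("forkserver", force=True)
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    print("start method:", prefer_forkserver())
    # Import scikit-learn and ibm_boto3 and start the application after this point.
```

The call belongs at the very top of the main script, inside the `if __name__ == "__main__":` guard, so that worker processes do not repeat it.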
