IllegalArgumentException,从 Spark (Scala) 将 ML 模型写入 s3 时错误的 FS
IllegalArgumentException, Wrong FS when writing ML model to s3 from Spark (Scala)
我创建了一个模型:
val model = pipeline.fit(commentLower)
我正在尝试将其写入 s3:
sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "MYACCESSKEY")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "MYSECRETKEY")
model.write.overwrite().save("s3n://sparkstore/model")
但我收到此错误:
Name: java.lang.IllegalArgumentException
Message: Wrong FS: s3n://sparkstore/model, expected: file:///
StackTrace: org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
我也尝试使用我的内联访问密钥:
model.write.overwrite().save("s3n://MYACCESSKEY:MYSECRETKEY@/sparkstore/model")
如何从 Spark 向 s3 写入模型(或与此相关的任何文件)?
我没有要测试的 S3 连接。
但这是我的想法,你应该使用:-
val hconf=sc.hadoopConfiguration
hconf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hconf.set("fs.s3.awsAccessKeyId", "MYACCESSKEY")
hconf.set("fs.s3.awsSecretAccessKey", "MYSECRETKEY")
当我这样做时 df.write.save("s3://sparkstore/model")
我得到 Name: org.apache.hadoop.fs.s3.S3Exception
Message: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/model' - ResponseCode=403, ResponseMessage=Forbidden
StackTrace: org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:229)
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
s
这让我相信它确实识别了 s3 fs 的 s3 协议。
但是很明显,它没有通过身份验证。
希望它能解决您的问题。
谢谢,
查尔斯.
这不是我想要做的,但我发现了一个有类似问题的类似线程:
How to save models from ML Pipeline to S3 or HDFS?
这就是我最后做的事情:
sc.parallelize(Seq(model), 1).saveAsObjectFile("swift://RossL.keystone/model")
val modelx = sc.objectFile[PipelineModel]("swift://RossL.keystone/model").first()
我创建了一个模型:
val model = pipeline.fit(commentLower)
我正在尝试将其写入 s3:
sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "MYACCESSKEY")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "MYSECRETKEY")
model.write.overwrite().save("s3n://sparkstore/model")
但我收到此错误:
Name: java.lang.IllegalArgumentException
Message: Wrong FS: s3n://sparkstore/model, expected: file:///
StackTrace: org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
我也尝试使用我的内联访问密钥:
model.write.overwrite().save("s3n://MYACCESSKEY:MYSECRETKEY@/sparkstore/model")
如何从 Spark 向 s3 写入模型(或与此相关的任何文件)?
我没有要测试的 S3 连接。 但这是我的想法,你应该使用:-
val hconf=sc.hadoopConfiguration
hconf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hconf.set("fs.s3.awsAccessKeyId", "MYACCESSKEY")
hconf.set("fs.s3.awsSecretAccessKey", "MYSECRETKEY")
当我这样做时 df.write.save("s3://sparkstore/model")
我得到 Name: org.apache.hadoop.fs.s3.S3Exception
Message: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/model' - ResponseCode=403, ResponseMessage=Forbidden
StackTrace: org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:229)
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
s
这让我相信它确实识别了 s3 fs 的 s3 协议。 但是很明显,它没有通过身份验证。
希望它能解决您的问题。
谢谢, 查尔斯.
这不是我想要做的,但我发现了一个有类似问题的类似线程:
How to save models from ML Pipeline to S3 or HDFS?
这就是我最后做的事情:
sc.parallelize(Seq(model), 1).saveAsObjectFile("swift://RossL.keystone/model")
val modelx = sc.objectFile[PipelineModel]("swift://RossL.keystone/model").first()