Deploy PredictionIO with a Spark standalone cluster

I am using the official Recommendation template as a test. I completed the following steps successfully:

  1. Installed the event server in a Docker container. (success)
  2. Configured event data, metadata, and everything else to be stored in MySQL; see the pio-env.sh excerpt after this list. (success)
  3. Set up the train and deploy server in another Docker container. (success)
  4. Installed a Spark standalone cluster in yet another container. (success)
  5. Created a new app. (success)
  6. Imported enough event data. (success)
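
For reference, the storage setup in step 2 corresponds to a conf/pio-env.sh roughly like the one below. I reconstructed it from the --env list in the logs further down; the *** values are redacted credentials, not literal settings.

PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL
PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://***:3306/predictionio
PIO_STORAGE_SOURCES_MYSQL_USERNAME=***
PIO_STORAGE_SOURCES_MYSQL_PASSWORD=***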

When I train and deploy as follows, exactly as the documentation describes, everything works:

pio train
pio deploy

But when I use the Spark standalone cluster and train and deploy as follows, training is fine (the new model is stored in MySQL), but deployment fails.

pio train -v engine.json -- --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1
pio deploy -v engine.json --feedback --event-server-ip predictionevent --event-server-port 7070 --accesskey Th7k5gE5yEu9ZdTdM6KdAj0InDrLNJQ1U3qEBy7dbMnYgTxWx5ALNAa2hKjqaHSK -- --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1

Full logs:

[INFO] [Runner$] Submission command: /spark/bin/spark-submit --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1 --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/PredictionIO/lib/mysql-connector-java-5.1.46.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation_2.11-0.1-SNAPSHOT.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/PredictionIO/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-jdbc-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hdfs-assembly-0.12.1.jar --files file:/PredictionIO/conf/log4j.properties --driver-class-path /PredictionIO/conf:/PredictionIO/lib/mysql-connector-java-5.1.46.jar --driver-java-options -Dpio.log.dir=/root file:/PredictionIO/lib/pio-assembly-0.12.1.jar --engine-id org.example.recommendation.RecommendationEngine --engine-version 0387c097c02018fa29109a8990b03d163249be00 --engine-variant file:/ebsa/app/cf/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=***,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://***:3306/predictionio,PIO_HOME=/PredictionIO,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=***,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,PIO_CONF_DIR=/PredictionIO/conf
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(cf,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6069ms
[INFO] [Server] jetty-9.3.z-SNAPSHOT
[INFO] [Server] Started @6184ms
[WARN] [Utils] Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[INFO] [AbstractConnector] Started ServerConnector@2b53840a{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@422ad5e2{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1b3ab4f9{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1c8f6c66{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@151732fb{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40ed1802{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@feb098f{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@31e739bf{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f42e06e{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2efd2f21{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@316cda31{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@17d2b075{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@310b2b6f{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6b5ab2f2{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6b2dd3df{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@73c48264{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5bcec67e{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a2fce12{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4bb1b96b{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1f66d8e1{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3421debd{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@68b7d0ef{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@319642db{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@35bfa1bb{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2eda4eeb{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@309dcdf3{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6a2d867d{/metrics/json,null,AVAILABLE,@Spark}
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.example.recommendation.DataSource@5db3d57c
[INFO] [Engine$] Preparator: org.example.recommendation.Preparator@395f52ed
[INFO] [Engine$] AlgorithmList: List(org.example.recommendation.ALSAlgorithm@26e0d39c)
[INFO] [Engine$] Data sanity check is on.
[INFO] [Engine$] org.example.recommendation.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.example.recommendation.PreparedData does not support data sanity check. Skipping check.
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
[INFO] [Engine$] org.apache.spark.mllib.recommendation.ALSModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=0ac606dc-9959-40f8-9f40-d32354ebf221
[WARN] [TaskSetManager] Stage 1403 contains a task of very large size (1217 KB). The maximum recommended task size is 100 KB.
[WARN] [TaskSetManager] Stage 1404 contains a task of very large size (1767 KB). The maximum recommended task size is 100 KB.
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
[INFO] [AbstractConnector] Stopped Spark@2b53840a{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
[INFO] [Runner$] Submission command: /spark/bin/spark-submit --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1 --class org.apache.predictionio.workflow.CreateServer --jars file:/PredictionIO/lib/mysql-connector-java-5.1.46.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation_2.11-0.1-SNAPSHOT.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/PredictionIO/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-jdbc-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hdfs-assembly-0.12.1.jar --files file:/PredictionIO/conf/log4j.properties --driver-class-path /PredictionIO/conf:/PredictionIO/lib/mysql-connector-java-5.1.46.jar --driver-java-options -Dpio.log.dir=/root file:/PredictionIO/lib/pio-assembly-0.12.1.jar --engineInstanceId 0ac606dc-9959-40f8-9f40-d32354ebf221 --engine-variant file:/ebsa/app/cf/engine.json --ip 0.0.0.0 --port 8000 --event-server-ip predictionevent --event-server-port 7070 --accesskey Th7k5gE5yEu9ZdTdM6KdAj0InDrLNJQ1U3qEBy7dbMnYgTxWx5ALNAa2hKjqaHSK --feedback --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=***,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://***:3306/predictionio,PIO_HOME=/PredictionIO,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=***,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,PIO_CONF_DIR=/PredictionIO/conf
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.recommendation.Preparator, but its constructor does not accept any arguments. Stubbing with empty parameters.
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.recommendation.Serving, but its constructor does not accept any arguments. Stubbing with empty parameters.
[INFO] [log] Logging initialized @6953ms
[INFO] [Server] jetty-9.3.z-SNAPSHOT
[INFO] [Server] Started @7086ms
[WARN] [Utils] Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[INFO] [AbstractConnector] Started ServerConnector@d8ed4d9{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@307b5956{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@62d50094{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@8a644df{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5a9054e7{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@15402e55{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@295b1de5{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@e7ac843{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@da15f73{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@506a8fc2{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@fc4cf4d{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@255cef05{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e8f6bce{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e6427d4{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5fca5109{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2acbd47f{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@39004878{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@785b7109{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@f0fce80{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@19ab67fc{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@644558a3{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40fa6a20{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@238b2adb{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4bbba0ce{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3d1e4c06{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@70f8bf47{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@558311ee{/metrics/json,null,AVAILABLE,@Spark}
[INFO] [Engine] Using persisted model
[INFO] [Engine] Custom-persisted model detected for algorithm org.example.recommendation.ALSAlgorithm
[ERROR] [OneForOneStrategy] empty collection

I don't know why this happens.

Update: instead of using the standalone cluster installed in the other Docker container, I started a local Spark cluster in the same container as the train & deploy server (as @user2906838 mentioned), and it worked. I don't understand why it only works with the co-located Spark and not with the remote standalone cluster; it is strange. A sketch of what I mean by "local cluster" follows.
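
This is roughly what "a local cluster in the same container" looks like, assuming a Spark 2.x layout under $SPARK_HOME (a minimal sketch, not my exact commands):

$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://localhost:7077
pio train -- --master spark://localhost:7077 --executor-memory 2G --driver-memory 2G
pio deploy -- --master spark://localhost:7077 --executor-memory 2G --driver-memory 2G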


Update:

The /tmp folder differs between the two cases; the file sizes are not the same.

(Screenshots: /tmp contents in the successful case vs. the failed case.)

Update:

Interesting: I found the model data inside the spark-worker container.

(Screenshot: the model files found inside the spark-worker container.)

So the deploy step depends on data that the train step wrote to file://tmp or hdfs://tmp. When training runs on the remote cluster, those files end up on the worker nodes rather than on the deploy server, which would explain the "empty collection" error; see the sketch below.
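
If I am reading the stock Recommendation template correctly, ALSModel implements PersistentModel and saves/loads itself under a bare /tmp/<id> path, which Spark resolves on whichever nodes the executors run. A condensed sketch of the relevant template code (trimmed; the string/int ID maps are omitted, and ALSAlgorithmParams is stubbed):

import org.apache.predictionio.controller.{Params, PersistentModel, PersistentModelLoader}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// Stub of the template's algorithm params, just enough to compile this sketch.
case class ALSAlgorithmParams(
  rank: Int, numIterations: Int, lambda: Double, seed: Option[Long]) extends Params

class ALSModel(
    override val rank: Int,
    override val userFeatures: RDD[(Int, Array[Double])],
    override val productFeatures: RDD[(Int, Array[Double])])
  extends MatrixFactorizationModel(rank, userFeatures, productFeatures)
  with PersistentModel[ALSAlgorithmParams] {

  def save(id: String, params: ALSAlgorithmParams, sc: SparkContext): Boolean = {
    // saveAsObjectFile with a bare path writes partition files to
    // file:/tmp/<id>/... on the nodes where the executors run -- i.e.
    // inside the spark-worker container, as the screenshot shows.
    sc.parallelize(Seq(rank)).saveAsObjectFile(s"/tmp/${id}/rank")
    userFeatures.saveAsObjectFile(s"/tmp/${id}/userFeatures")
    productFeatures.saveAsObjectFile(s"/tmp/${id}/productFeatures")
    true
  }
}

object ALSModel extends PersistentModelLoader[ALSAlgorithmParams, ALSModel] {
  def apply(id: String, params: ALSAlgorithmParams, sc: Option[SparkContext]) = {
    // At deploy time, objectFile reads the same bare path. If the partition
    // files are not visible from here, the RDD is empty and .first throws
    // java.lang.UnsupportedOperationException: empty collection --
    // which matches the [ERROR] [OneForOneStrategy] empty collection above.
    new ALSModel(
      rank = sc.get.objectFile[Int](s"/tmp/${id}/rank").first,
      userFeatures = sc.get.objectFile(s"/tmp/${id}/userFeatures"),
      productFeatures = sc.get.objectFile(s"/tmp/${id}/productFeatures"))
  }
}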