Prediction io pio train says appId does not exist

I am trying out the latest version of prediction.io (version 0.9.1). I installed PredictionIO and its dependencies by following the tutorial on this page: http://docs.prediction.io/install/install-linux/

I added the path to the PredictionIO bin directory to my .bashrc file so that I can use the command-line tools from my terminal:

export PATH=$PATH:/home/wern/PredictionIO-0.9.1/bin
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"

Running pio-start-all gives me the following:

Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/wern/hbase-0.98.11-hadoop2/bin/../logs/hbase-me-master-mycomputer.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...

Running java -version returns the following:

java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

Running pio status returns the following:

PredictionIO
  Installed at: /home/me/PredictionIO-0.9.1
  Version: 0.9.1

Apache Spark
  Installed at: /home/wern/spark-1.2.1-bin-hadoop2.4
  Version: 1.2.1 (meets minimum requirement of 1.2.0)

Storage Backend Connections
  Verifying Meta Data Backend
  Verifying Model Data Backend
  Verifying Event Data Backend
[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Test write Event Store (App Id 0)
[INFO] [HBLEvents] The table predictionio_eventdata:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table predictionio_eventdata:events_0...

(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.

Next I fetched a generic template. I ran this command from my home directory, so I ended up with a RecommendationApp directory once it finished:

pio template get PredictionIO/template-scala-parallel-recommendation RecommendationApp

Next I created a new PredictionIO app:

pio app new MyGenericRecommendationApp

This returns the following:

[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] [HBLEvents] The table predictionio_eventdata:events_3 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for this app ID: 3.
[INFO] [App$] Created new app:
[INFO] [App$]       Name: MyGenericRecommendationApp
[INFO] [App$]         ID: 3
[INFO] [App$] Access Key: C7vfcipXd0baQcZYzqr73EwSPT2Bd0YW1OTLgEdlUA9FOeBja6dyBVIKaYnQbsUO

Next I navigated to the RecommendationApp engine directory and downloaded the sample data:

curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt

Then I imported it using Python:

python data/import_eventserver.py --access_key C7vfcipXd0baQcZYzqr73EwSPT2Bd0YW1OTLgEdlUA9FOeBja6dyBVIKaYnQbsUO

The data was imported successfully.

Next I updated the engine.json file to match the ID of the app I created earlier.

  "datasource": {
    "params" : {
      "appId": 3
    }
  },
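
One way to double-check this step is to read the value back out of engine.json and compare it with the ID printed by pio app new. The following is only a minimal sketch, assuming the file uses the datasource.params.appId layout shown above; the helper name read_app_id and the sample string are made up for illustration.

```python
import json


def read_app_id(engine_json_text):
    """Return datasource.params.appId from the text of an engine.json file."""
    config = json.loads(engine_json_text)
    return config["datasource"]["params"]["appId"]


# Hypothetical stand-in for the relevant part of engine.json.
sample = '{"datasource": {"params": {"appId": 3}}}'
print(read_app_id(sample))
```

To check the real file, pass in the contents of engine.json from the engine directory instead of the sample string.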

Then I ran pio build. It took a while, but eventually returned the following:

[INFO] [Console$] Your engine is ready for training.

Finally, here is my problem. Running pio train produces the following:

[INFO] [Console$] Using existing engine manifest JSON at /home/wern/RecommendationApp/manifest.json
[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] [RunWorkflow$] Submission command: /home/wern/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class io.prediction.workflow.CreateWorkflow --name PredictionIO Training: RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg 92c46ac3197f8bf4696281a1f76eaaa943495d3f () --jars file:/home/wern/.pio_store/engines/RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg/92c46ac3197f8bf4696281a1f76eaaa943495d3f/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/home/wern/.pio_store/engines/RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg/92c46ac3197f8bf4696281a1f76eaaa943495d3f/template-scala-parallel-recommendation_2.10-0.1-SNAPSHOT.jar --files /home/wern/PredictionIO-0.9.1/conf/log4j.properties,/home/wern/PredictionIO-0.9.1/conf/hbase-site.xml --driver-class-path /home/wern/PredictionIO-0.9.1/conf:/home/wern/PredictionIO-0.9.1/conf /home/wern/PredictionIO-0.9.1/lib/pio-assembly-0.9.1.jar --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=0,PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata,PIO_FS_BASEDIR=/home/wern/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/home/wern/hbase-0.98.11-hadoop2,PIO_HOME=/home/wern/PredictionIO-0.9.1,PIO_FS_ENGINESDIR=/home/wern/.pio_store/engines,PIO_STORAGE_SOURCES_HBASE_PORTS=0,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/home/wern/elasticsearch-1.4.4,PIO_FS_TMPDIR=/home/wern/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_,PIO_STORAGE_SOURCES_LOCALFS_HOSTS=/home/wern/.pio_store/models,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/home/wern/PredictionIO-0.9.1/conf,PIO_STORAGE_SOURCES_LOCALFS_PORTS=0,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs --engine-id RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg 
--engine-version 92c46ac3197f8bf4696281a1f76eaaa943495d3f --engine-variant /home/wern/RecommendationApp/engine.json --verbosity 0
Spark assembly has been built with Hive, including Datanucleus jars on classpath
[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(3))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[WARN] [Utils] Your hostname, fraukojiro resolves to a loopback address: 127.0.1.1; using 192.168.254.105 instead (on interface wlan0)
[WARN] [Utils] Set SPARK_LOCAL_IP if you need to bind to another address
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.254.105:37397]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.wern.DataSource@653fb8d1
[INFO] [Engine$] Preparator: com.wern.Preparator@93501be
[INFO] [Engine$] AlgorithmList: List(com.wern.ALSAlgorithm@3c25cfe1)
[INFO] [Engine$] Data santiy check is on.
[ERROR] [HBPEvents] The appId 3 does not exist. Please use valid appId.
Exception in thread "main" java.lang.Exception: HBase table not found for appId 3.
    at io.prediction.data.storage.hbase.HBPEvents.checkTableExists(HBPEvents.scala:54)
    at io.prediction.data.storage.hbase.HBPEvents.find(HBPEvents.scala:70)
    at com.wern.DataSource.readTraining(DataSource.scala:32)
    at com.wern.DataSource.readTraining(DataSource.scala:18)
    at io.prediction.controller.PDataSource.readTrainingBase(DataSource.scala:41)
    at io.prediction.controller.Engine$.train(Engine.scala:518)
    at io.prediction.controller.Engine.train(Engine.scala:147)
    at io.prediction.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:61)
    at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:258)
    at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
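
Since the error comes from the HBase event store, it may help to look at which storage settings the training job was actually given. The --env argument in the submission command above is a comma-separated list of KEY=VALUE pairs. Below is a rough sketch for pulling it apart; parse_env is a hypothetical helper, the sample string is shortened from the log, and the split assumes no value contains a comma.

```python
def parse_env(env_arg):
    """Split a comma-separated KEY=VALUE string into a dict."""
    settings = {}
    for pair in env_arg.split(","):
        key, _, value = pair.partition("=")
        settings[key] = value
    return settings


# A few of the pairs copied from the --env argument in the log above.
sample = (
    "PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,"
    "PIO_STORAGE_SOURCES_HBASE_HOSTS=0,"
    "PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata,"
    "PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE"
)
env = parse_env(sample)

# Show only the settings related to the event data backend.
for key in sorted(env):
    if "HBASE" in key or "EVENTDATA" in key:
        print(key, "=", env[key])
```

This only makes the settings easier to read; whether any of them is the cause of the error is a separate question.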

Basically it doesn't recognize the appId I provided. Yet running pio app list shows that the ID is indeed 3.

[INFO] [App$]                 Name |   ID |                                                       Access Key | Allowed Event(s)
[INFO] [App$]  TestRecommendation |    2 | GJBuFYODWTwFBVQ2D2nbBFW5C0iKClNLEMbYGGhDGoZGEtLre62BLwLJlioTEeJP | (all)
[INFO] [App$] MyGenericRecommendationApp |    3 | C7vfcipXd0baQcZYzqr73EwSPT2Bd0YW1OTLgEdlUA9FOeBja6dyBVIKaYnQbsUO | (all)
[INFO] [App$] Finished listing 2 app(s).
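
The same comparison can be scripted. The sketch below parses a listing like the one above and builds the table name I would expect for an app. Both helper names are hypothetical, the access keys are replaced by a placeholder, and the events_<id> naming pattern is only inferred from the log lines in this post, not taken from the PredictionIO documentation.

```python
def parse_app_list(output):
    """Extract {app name: app ID} from `pio app list` output lines."""
    apps = {}
    for line in output.splitlines():
        # Drop the "[INFO] [App$]" prefix, then split the table columns.
        _, _, row = line.partition("[App$]")
        cells = [cell.strip() for cell in row.split("|")]
        # Data rows have a numeric ID in the second column; the header does not.
        if len(cells) >= 2 and cells[1].isdigit():
            apps[cells[0]] = int(cells[1])
    return apps


def expected_table(namespace, app_id):
    """Build the table name using the pattern seen in the logs (an assumption)."""
    return "{}:events_{}".format(namespace, app_id)


# Listing copied from above, with access keys replaced by a placeholder.
sample = """\
[INFO] [App$]                 Name |   ID | Access Key | Allowed Event(s)
[INFO] [App$]  TestRecommendation |    2 | <key> | (all)
[INFO] [App$] MyGenericRecommendationApp |    3 | <key> | (all)
[INFO] [App$] Finished listing 2 app(s)."""

apps = parse_app_list(sample)
print(apps)
print(expected_table("predictionio_eventdata", apps["MyGenericRecommendationApp"]))
```

The resulting name could then be compared against the tables that actually exist in HBase.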

Any ideas?

It looks like your question has already been answered here: https://groups.google.com/forum/#!topic/predictionio-user/W1P4T2tTreQ