PredictionIO UniversalRecommender Elasticsearch error
I'm using the Universal Recommender with PredictionIO, and when I run the ./examples/integration-test script I get the following error.
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@6ec63f8{/jobs,null,UNAVAILABLE,@Spark}
Exception in thread "main" java.lang.IllegalStateException: No Elasticsearch client configuration detected, check your pio-env.sh forproper configuration settings
at com.actionml.EsClient$$anonfun$client.apply(EsClient.scala:86)
at com.actionml.EsClient$$anonfun$client.apply(EsClient.scala:86)
at scala.Option.getOrElse(Option.scala:121)
at com.actionml.EsClient$.client$lzycompute(EsClient.scala:85)
at com.actionml.EsClient$.client(EsClient.scala:85)
at com.actionml.EsClient$.createIndex(EsClient.scala:174)
at com.actionml.EsClient$.hotSwap(EsClient.scala:271)
at com.actionml.URModel.save(URModel.scala:82)
at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:367)
at com.actionml.URAlgorithm.train(URAlgorithm.scala:295)
at com.actionml.URAlgorithm.train(URAlgorithm.scala:180)
at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
at org.apache.predictionio.controller.Engine$$anonfun.apply(Engine.scala:690)
at org.apache.predictionio.controller.Engine$$anonfun.apply(Engine.scala:690)
at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.predictionio.controller.Engine$.train(Engine.scala:690)
at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
My configuration (PredictionIO/conf/pio-env.sh) looks like:
#!/usr/bin/env bash
#
# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.
# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.6
POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.0.0.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
# ES_CONF_DIR: You must configure this if you have advanced configuration for
# your Elasticsearch setup.
# ES_CONF_DIR=/opt/elasticsearch
# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
# with Hadoop 2.
# HADOOP_CONF_DIR=/opt/hadoop
# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
# with HBase on a remote cluster.
# HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.apache.org/system/anotherdatastore/
# Storage Repositories
# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL
# Storage Data Sources
# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio
# Elasticsearch Example
# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
# PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.5.2
# Optional basic HTTP auth
# PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
# Elasticsearch 1.x Example
# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=<elasticsearch_cluster_name>
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6
# Local File System Example
# PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
# PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
# HBase Example
# PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
# PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.0.0
# AWS S3 Example
# PIO_STORAGE_SOURCES_S3_TYPE=s3
# PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio_bucket
# PIO_STORAGE_SOURCES_S3_BASE_PATH=pio_model
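I'm trying to use PostgreSQL for all three storage types (metadata, event data, and model data), so I'm not sure why I'm getting an Elasticsearch error.
Do I need an Elasticsearch instance running somewhere?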
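Feedback from the actionml-user group forum: https://groups.google.com/forum/#!topic/actionml-user/9gPlf5iWDWQ
In summary: although PredictionIO offers many data source options for its three "repositories", the Universal Recommender (UR) engine requires Elasticsearch as the metadata store. The event data repository is best set to HBASE (although I think I saw a post where someone got it working with Postgres). The UR doesn't really use the model repository, so it can also be configured to use LOCALFS, which is the configuration I used successfully.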
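For illustration, the storage section of pio-env.sh for that setup might look roughly like the sketch below. The variable names come from the commented examples in the template above. The hosts, ports, and vendor paths are that template's defaults and are assumptions, not values confirmed by the forum thread. Whether the 5.x-style settings (port 9200 with SCHEMES) or the 1.x-style settings (port 9300 with CLUSTERNAME) apply depends on the PredictionIO and UR versions installed.

```shell
# Sketch of the storage section of conf/pio-env.sh for the Universal Recommender.
# Hosts, ports, and vendor paths are the template defaults (assumptions);
# adjust them to match your own installation.

PIO_FS_BASEDIR=$HOME/.pio_store

# Storage repositories: Elasticsearch for metadata, HBase for events,
# local file system for models (the UR does not really use this one).
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# Elasticsearch source (5.x-style example from the template)
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.5.2

# HBase source
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.0.0

# Local file system source
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
```

Elasticsearch (and HBase, if used) must be running before training. Running `pio status` after editing the file should report whether the configured storage backends are reachable.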