columnSimilarities() of RowMatrix returns ERROR Schema: Failed initialising database
With Spark 2.2.0, I get an error when calling columnSimilarities().
Here is code that reproduces it:
from pyspark.mllib.linalg.distributed import RowMatrix
rdd = sc.parallelize([[1.0,2.0,1.0],[1.0,5.0,1.0],[1.0,2.0,1.0],[4.0,2.0,4.0]])
mat = RowMatrix(rdd)
sim = mat.columnSimilarities(0.1)
sim.entries.collect()
This is the error (truncated because it is too long; the full log is here).
17/08/13 10:15:19 ERROR Schema: Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon@3234df5e, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
This code, however, works fine:
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix
rdd = sc.parallelize([IndexedRow(0, [1.0, 2.0, 1.0]),
                      IndexedRow(1, [1.0, 5.0, 1.0]),
                      IndexedRow(2, [1.0, 2.0, 1.0]),
                      IndexedRow(3, [4.0, 2.0, 4.0])])
mat = IndexedRowMatrix(rdd).toRowMatrix()
sim = mat.columnSimilarities(0.1)
sim.entries.collect()
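For reference when reading the output of either snippet: columnSimilarities() computes cosine similarities between pairs of columns and returns them as an upper-triangular CoordinateMatrix (entries with i < j). With a positive threshold such as 0.1, Spark uses DIMSUM sampling, so its values are estimates rather than exact results. The plain-Python sketch below computes the exact values for the question's 4 x 3 matrix so the MLlib entries can be compared against them; the helper name `column_cosine_similarities` is hypothetical and not part of any Spark API.

```python
import math

# The sample matrix from the question: 4 rows x 3 columns.
rows = [[1.0, 2.0, 1.0],
        [1.0, 5.0, 1.0],
        [1.0, 2.0, 1.0],
        [4.0, 2.0, 4.0]]


def column_cosine_similarities(rows):
    """Hypothetical helper: exact cosine similarity for every column pair (i, j) with i < j."""
    n_cols = len(rows[0])
    # Transpose the rows into a list of column vectors.
    cols = [[row[c] for row in rows] for c in range(n_cols)]
    # L2 norm of each column.
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    sims = {}
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


sims = column_cosine_similarities(rows)
for (i, j), value in sorted(sims.items()):
    print(i, j, round(value, 4))
```

Columns 0 and 2 are identical, so their similarity is 1.0, while the (0, 1) and (1, 2) pairs both come out at 17 / sqrt(19 * 37), roughly 0.641.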
Is this a Spark bug?
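This is a problem with the JDBC connection, not with columnSimilarities or with MLlib in general.
You may need to do some work to get the Derby connection running. Here is a starting point: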