线程异常 "main" java.lang.NullPointerException com.databricks.dbutils_v1.DBUtilsHolder$$anon$1.invoke
Exception in thread "main" java.lang.NullPointerException com.databricks.dbutils_v1.DBUtilsHolder$$anon$1.invoke
我想读取 Azure Blob 中的镶木地板文件,所以我使用 dbultils.fs.mount
将数据从 Azure Blob 装载到本地
但是我得到了错误 Exception in thread "main" java.lang.NullPointerException
以下是我的日志:
hello big data
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/10 23:20:10 INFO SparkContext: Running Spark version 2.1.0
20/06/10 23:20:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/10 23:20:11 INFO SecurityManager: Changing view acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing view acls groups to:
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls groups to:
20/06/10 23:20:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Admin); groups with view permissions: Set(); users with modify permissions: Set(Admin); groups with modify permissions: Set()
20/06/10 23:20:12 INFO Utils: Successfully started service 'sparkDriver' on port 4725.
20/06/10 23:20:12 INFO SparkEnv: Registering MapOutputTracker
20/06/10 23:20:13 INFO SparkEnv: Registering BlockManagerMaster
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/10 23:20:13 INFO DiskBlockManager: Created local directory at C:\Users\Admin\AppData\Local\Temp\blockmgr-c023c3b8-fd70-461a-ac69-24ce9c770efe
20/06/10 23:20:13 INFO MemoryStore: MemoryStore started with capacity 894.3 MB
20/06/10 23:20:13 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/10 23:20:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/10 23:20:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
20/06/10 23:20:13 INFO Executor: Starting executor ID driver on host localhost
20/06/10 23:20:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 4738.
20/06/10 23:20:13 INFO NettyBlockTransferService: Server created on 192.168.0.102:4738
20/06/10 23:20:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/10 23:20:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.102:4738 with 894.3 MB RAM, BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:14 INFO SharedState: Warehouse path is 'file:/E:/sparkdemo/sparkdemo/spark-warehouse/'.
Exception in thread "main" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.databricks.dbutils_v1.DBUtilsHolder$$anon.invoke(DBUtilsHolder.scala:17)
at com.sun.proxy.$Proxy7.fs(Unknown Source)
at Transform$.main(Transform.scala:19)
at Transform.main(Transform.scala)
20/06/10 23:20:14 INFO SparkContext: Invoking stop() from shutdown hook
20/06/10 23:20:14 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
20/06/10 23:20:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/10 23:20:14 INFO MemoryStore: MemoryStore cleared
20/06/10 23:20:14 INFO BlockManager: BlockManager stopped
20/06/10 23:20:14 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/10 23:20:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/10 23:20:14 INFO SparkContext: Successfully stopped SparkContext
20/06/10 23:20:14 INFO ShutdownHookManager: Shutdown hook called
20/06/10 23:20:14 INFO ShutdownHookManager: Deleting directory C:\Users\Admin\AppData\Local\Temp\spark-cbdbcfe7-bc70-4d34-ad8e-5baed8308ae2
我的代码:
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils
import org.apache.spark.sql.SparkSession
object Demo {
def main(args:Array[String]): Unit = {
println("hello big data")
val containerName = "container1"
val storageAccountName = "storageaccount1"
val sas = "saskey"
val url = "wasbs://" + containerName + "@" + storageAccountName + ".blob.core.windows.net/"
var config = "fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net"
//Spark session
val spark : SparkSession = SparkSession.builder
.appName("SpartDemo")
.master("local[1]")
.getOrCreate()
//Mount data
dbutils.fs.mount(
source = url,
mountPoint = "/mnt/container1",
extraConfigs = Map(config -> sas))
val parquetFileDF = spark.read.parquet("/mnt/container1/test1.parquet")
parquetFileDF.show()
}
}
我的 sbt 文件:
name := "sparkdemo1"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"com.databricks" % "dbutils-api_2.11" % "0.0.3",
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-sql_2.11" % "2.1.0"
)
您是否运行将其整合到 Databricks 实例中?
如果不是,那就是问题所在:dbutils 由 Databricks 执行上下文提供。
在那种情况下,据我所知,你有三个选择:
- 将您的应用程序打包到一个 jar 文件中,然后 运行 使用 Databricks 作业
- 使用databricks-connect
尝试在 Databricks 之外模拟模拟的 dbutils 实例,如图所示 here:
com.databricks.dbutils_v1.DBUtilsHolder.dbutils0.set(
new com.databricks.dbutils_v1.DBUtilsV1{
...
}
)
无论如何,我会说选项 1 和 2 比第三个好。此外,通过选择其中之一,您不需要包含 "dbutils-api_2.11" 依赖项,因为它由 Databricks 集群提供。
我想读取 Azure Blob 中的镶木地板文件,所以我使用 dbultils.fs.mount
将数据从 Azure Blob 装载到本地
但是我得到了错误 Exception in thread "main" java.lang.NullPointerException
以下是我的日志:
hello big data
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/10 23:20:10 INFO SparkContext: Running Spark version 2.1.0
20/06/10 23:20:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/10 23:20:11 INFO SecurityManager: Changing view acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls to: Admin
20/06/10 23:20:11 INFO SecurityManager: Changing view acls groups to:
20/06/10 23:20:11 INFO SecurityManager: Changing modify acls groups to:
20/06/10 23:20:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Admin); groups with view permissions: Set(); users with modify permissions: Set(Admin); groups with modify permissions: Set()
20/06/10 23:20:12 INFO Utils: Successfully started service 'sparkDriver' on port 4725.
20/06/10 23:20:12 INFO SparkEnv: Registering MapOutputTracker
20/06/10 23:20:13 INFO SparkEnv: Registering BlockManagerMaster
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/10 23:20:13 INFO DiskBlockManager: Created local directory at C:\Users\Admin\AppData\Local\Temp\blockmgr-c023c3b8-fd70-461a-ac69-24ce9c770efe
20/06/10 23:20:13 INFO MemoryStore: MemoryStore started with capacity 894.3 MB
20/06/10 23:20:13 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/10 23:20:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/10 23:20:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
20/06/10 23:20:13 INFO Executor: Starting executor ID driver on host localhost
20/06/10 23:20:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 4738.
20/06/10 23:20:13 INFO NettyBlockTransferService: Server created on 192.168.0.102:4738
20/06/10 23:20:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/10 23:20:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.102:4738 with 894.3 MB RAM, BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.102, 4738, None)
20/06/10 23:20:14 INFO SharedState: Warehouse path is 'file:/E:/sparkdemo/sparkdemo/spark-warehouse/'.
Exception in thread "main" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.databricks.dbutils_v1.DBUtilsHolder$$anon.invoke(DBUtilsHolder.scala:17)
at com.sun.proxy.$Proxy7.fs(Unknown Source)
at Transform$.main(Transform.scala:19)
at Transform.main(Transform.scala)
20/06/10 23:20:14 INFO SparkContext: Invoking stop() from shutdown hook
20/06/10 23:20:14 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
20/06/10 23:20:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/10 23:20:14 INFO MemoryStore: MemoryStore cleared
20/06/10 23:20:14 INFO BlockManager: BlockManager stopped
20/06/10 23:20:14 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/10 23:20:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/10 23:20:14 INFO SparkContext: Successfully stopped SparkContext
20/06/10 23:20:14 INFO ShutdownHookManager: Shutdown hook called
20/06/10 23:20:14 INFO ShutdownHookManager: Deleting directory C:\Users\Admin\AppData\Local\Temp\spark-cbdbcfe7-bc70-4d34-ad8e-5baed8308ae2
我的代码:
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils
import org.apache.spark.sql.SparkSession
object Demo {
def main(args:Array[String]): Unit = {
println("hello big data")
val containerName = "container1"
val storageAccountName = "storageaccount1"
val sas = "saskey"
val url = "wasbs://" + containerName + "@" + storageAccountName + ".blob.core.windows.net/"
var config = "fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net"
//Spark session
val spark : SparkSession = SparkSession.builder
.appName("SpartDemo")
.master("local[1]")
.getOrCreate()
//Mount data
dbutils.fs.mount(
source = url,
mountPoint = "/mnt/container1",
extraConfigs = Map(config -> sas))
val parquetFileDF = spark.read.parquet("/mnt/container1/test1.parquet")
parquetFileDF.show()
}
}
我的 sbt 文件:
name := "sparkdemo1"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"com.databricks" % "dbutils-api_2.11" % "0.0.3",
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-sql_2.11" % "2.1.0"
)
您是否运行将其整合到 Databricks 实例中? 如果不是,那就是问题所在:dbutils 由 Databricks 执行上下文提供。 在那种情况下,据我所知,你有三个选择:
- 将您的应用程序打包到一个 jar 文件中,然后 运行 使用 Databricks 作业
- 使用databricks-connect
尝试在 Databricks 之外模拟模拟的 dbutils 实例,如图所示 here:
com.databricks.dbutils_v1.DBUtilsHolder.dbutils0.set( new com.databricks.dbutils_v1.DBUtilsV1{ ... } )
无论如何,我会说选项 1 和 2 比第三个好。此外,通过选择其中之一,您不需要包含 "dbutils-api_2.11" 依赖项,因为它由 Databricks 集群提供。