Error compiling pipeline when using Scio typed BigQuery API with apache-beam

Question

I am trying to use the typed BigQuery API as shown on the Scio website:

@BigQueryType.fromTable("sandbox-data:Users.uid")
class UIDTable

Running sbt pack -Dbigquery.project=sandbox-data on the command line gives the following error:

exception during macro expansion:
[error] java.lang.RuntimeException: Property bigquery.project not set. Use -Dbigquery.project=<BILLING_PROJECT>
[error]         at com.spotify.scio.bigquery.client.BigQuery$$anonfun$instance$lzycompute$$anonfun$apply.apply(BigQuery.scala:156)
[error]         at com.spotify.scio.bigquery.client.BigQuery$$anonfun$instance$lzycompute$$anonfun$apply.apply(BigQuery.scala:154)
[error]         at scala.Option.getOrElse(Option.scala:138)
[error]         at com.spotify.scio.bigquery.client.BigQuery$$anonfun$instance$lzycompute.apply(BigQuery.scala:154)
[error]         at com.spotify.scio.bigquery.client.BigQuery$$anonfun$instance$lzycompute.apply(BigQuery.scala:154)
[error]         at scala.Option.getOrElse(Option.scala:138)
[error]         at com.spotify.scio.bigquery.client.BigQuery$.instance$lzycompute(BigQuery.scala:154)
[error]         at com.spotify.scio.bigquery.client.BigQuery$.instance(BigQuery.scala:150)
[error]         at com.spotify.scio.bigquery.client.BigQuery$.defaultInstance(BigQuery.scala:176)
[error]         at com.spotify.scio.bigquery.types.TypeProvider$.bigquery$lzycompute(TypeProvider.scala:42)
[error]         at com.spotify.scio.bigquery.types.TypeProvider$.bigquery(TypeProvider.scala:42)
[error]         at com.spotify.scio.bigquery.types.TypeProvider$.tableImpl(TypeProvider.scala:53)
[error]   @BigQueryType.fromTable("sandbox-data:Users.uid")
[error]    ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed

My build.sbt file is:

import sbt.Keys._
import sbt.{util, _}

val scioVersion = "0.7.4"
val beamVersion = "2.11.0"
val scalaMacrosVersion = "2.1.1"

//logLevel := util.Level.Debug

lazy val commonSettings = Defaults.coreDefaultSettings ++ Seq(
  organization := "haaretz",
  // Semantic versioning http://semver.org/
  version := "0.1.0-SNAPSHOT",
  scalaVersion := "2.12.8",
  scalacOptions ++= Seq("-target:jvm-1.8",
    "-deprecation",
    "-feature",
    "-unchecked"),
  javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
)

lazy val paradiseDependency =
  "org.scalamacros" % "paradise" % scalaMacrosVersion cross CrossVersion.full
lazy val macroSettings = Seq(
  libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value,
  addCompilerPlugin(paradiseDependency)
)

lazy val root: Project = project
  .in(file("."))
  .settings(commonSettings)
  .settings(macroSettings)
  .settings(
    name := "htz-dataflow",
    description := "DataFlow pipelines for htz projects",
    publish / skip := true,
    libraryDependencies ++= Seq(
      "com.spotify" %% "scio-core" % scioVersion,
      "com.spotify" %% "scio-bigquery" % scioVersion,
      "com.spotify" %% "scio-test" % scioVersion % Test,
      "org.apache.beam" % "beam-runners-direct-java" % beamVersion,
      "org.apache.beam" % "beam-runners-google-cloud-dataflow-java" % beamVersion,
      "org.slf4j" % "slf4j-simple" % "1.7.25",
      "org.jsoup" % "jsoup" % "1.11.3",
      "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.8",
      "com.typesafe.slick" %% "slick" % "3.3.1",
      "org.slf4j" % "slf4j-nop" % "1.7.26",
    )
  )
  .enablePlugins(PackPlugin)

lazy val repl: Project = project
  .in(file(".repl"))
  .settings(commonSettings)
  .settings(macroSettings)
  .settings(
    name := "repl",
    description := "Scio REPL for POC",
    libraryDependencies ++= Seq(
      "com.spotify" %% "scio-repl" % scioVersion
    ),
    Compile / mainClass := Some("com.spotify.scio.repl.ScioShell"),
    publish / skip := true
  )
  .dependsOn(root)

If you need any other information, please leave a comment and I will provide it.

Answer 1

The command should be sbt -Dbigquery.project=... {compile,pack}, i.e. the JVM flag must come before the sbt task. Alternatively, you can set a default project with gcloud config set project [PROJECT].
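As a rough sketch of the two options described in this answer, the commands might look like the following. The project ID sandbox-data is taken from the question, the PROJECT variable is only an illustrative name, and the second option assumes the gcloud CLI is installed and authenticated.

```shell
# Billing project from the question; substitute your own project ID.
PROJECT="sandbox-data"

# Option 1: pass the JVM flag BEFORE the sbt task name, so the property is
# set on the JVM in which the @BigQueryType macro is expanded.
sbt -Dbigquery.project="$PROJECT" pack

# Option 2: set a default project with gcloud, then run sbt without the flag.
gcloud config set project "$PROJECT"
sbt pack
```

Either option on its own should be enough; the second is handy when the same project is used for every build on the machine.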

scala

google-cloud-dataflow

apache-beam

spotify-scio