SCIO用scio parquet读取parquet文件,生成的case class not found
SCIO read parquet file with scio parquet, the generated case class not found
我有问题。
我通过 sbt 原型创建了一个 SCIO (Apache Beam) 项目:sbt new spotify/scio.g8
此作业的目标是从 GS
读取镶木地板文件
当我直接在 SCIO 中使用 Apache Beam 提供的 ParquetIO 时,这项工作 (GenericRecord) :
val SCHEMA: Schema =
new Schema.Parser()
.parse(
"{\n"
+ " \"namespace\": \"ioitavro\",\n"
+ " \"type\": \"record\",\n"
+ " \"name\": \"TestObject\",\n"
+ " \"fields\": [\n"
+ " {\"name\": \"field1\", \"type\": \"int\"}\n,"
+ " {\"name\": \"field2\", \"type\": \"int\"}\n"
+ " ]\n"
+ "}")
val inputIO = ParquetIO
.read(SCHEMA)
.from(options.getInputParquet)
sc.customInput("input", inputIO)
.map(file => file.toString)
.saveAsTextFile(args(outputTopic))
但我想使用 scio-parquet:https://spotify.github.io/scio/io/Parquet.html#read-avro-parquet-files
我在 build.sbt 文件中有以下依赖项:
val scioVersion = "0.8.0-beta2"
val beamVersion = "2.16.0"
val scalaMacrosVersion = "2.1.1"
"com.spotify" %% "scio-core" % scioVersion,
"com.spotify" %% "scio-extra" % scioVersion,
"com.spotify" %% "scio-parquet" % scioVersion,
"com.spotify" %% "scio-avro" % scioVersion,
我使用模式
object TestModel {
@AvroType.fromSchema(
"""{
| "type":"record",
| "name":"TestObject",
| "namespace":"com.spotify.scio.avro",
| "doc":"Record for an account",
| "fields":[
| {"name":"field1","type":"int"},
| {"name":"field2","type":"int"}
| ]}
""".stripMargin)
class TestSchema
}
然后在我使用的作业中:
import com.spotify.scio.avro.TestObject
val projection = Projection[TestObject](_.getIntField, _.getIntField)
sc.parquetAvroFile[TestObject](args(inputParquet), projection)
.map(file => file.toString)
.saveAsTextFile(args(outputTopic))
但是我有以下错误:
object TestObject is not a member of package com.spotify.scio.avro
[error] import com.spotify.scio.avro.TestObject
[error] ^
[error] /mypath/my-job/src/main/scala/com/test/MyJob.scala:41:33: not found: type TestObject
[error] val projection = Projection[TestObject](_.getIntField, _.getIntField)
[error] ^
[error] /my-path/my-job/src/main/scala/test/MyJob.scala:59:26: not found: type TestObject
[error] sc.parquetAvroFile[TestObject](args(inputParquet), projection)
[error] ^
我不明白为什么,但也许 class TestObject 未由架构 @AvroType.fromSchema.
正确生成
或者我没有正确使用 api,但我遵循了 link:https://spotify.github.io/scio/io/Parquet.html#read-avro-parquet-files
感谢您的帮助。
使用 sbt-avro 插件,这项工作。
我有问题。
我通过 sbt 原型创建了一个 SCIO (Apache Beam) 项目:sbt new spotify/scio.g8
此作业的目标是从 GS
读取镶木地板文件当我直接在 SCIO 中使用 Apache Beam 提供的 ParquetIO 时,这项工作 (GenericRecord) :
val SCHEMA: Schema =
new Schema.Parser()
.parse(
"{\n"
+ " \"namespace\": \"ioitavro\",\n"
+ " \"type\": \"record\",\n"
+ " \"name\": \"TestObject\",\n"
+ " \"fields\": [\n"
+ " {\"name\": \"field1\", \"type\": \"int\"}\n,"
+ " {\"name\": \"field2\", \"type\": \"int\"}\n"
+ " ]\n"
+ "}")
val inputIO = ParquetIO
.read(SCHEMA)
.from(options.getInputParquet)
sc.customInput("input", inputIO)
.map(file => file.toString)
.saveAsTextFile(args(outputTopic))
但我想使用 scio-parquet:https://spotify.github.io/scio/io/Parquet.html#read-avro-parquet-files
我在 build.sbt 文件中有以下依赖项:
val scioVersion = "0.8.0-beta2"
val beamVersion = "2.16.0"
val scalaMacrosVersion = "2.1.1"
"com.spotify" %% "scio-core" % scioVersion,
"com.spotify" %% "scio-extra" % scioVersion,
"com.spotify" %% "scio-parquet" % scioVersion,
"com.spotify" %% "scio-avro" % scioVersion,
我使用模式
object TestModel {
@AvroType.fromSchema(
"""{
| "type":"record",
| "name":"TestObject",
| "namespace":"com.spotify.scio.avro",
| "doc":"Record for an account",
| "fields":[
| {"name":"field1","type":"int"},
| {"name":"field2","type":"int"}
| ]}
""".stripMargin)
class TestSchema
}
然后在我使用的作业中:
import com.spotify.scio.avro.TestObject
val projection = Projection[TestObject](_.getIntField, _.getIntField)
sc.parquetAvroFile[TestObject](args(inputParquet), projection)
.map(file => file.toString)
.saveAsTextFile(args(outputTopic))
但是我有以下错误:
object TestObject is not a member of package com.spotify.scio.avro
[error] import com.spotify.scio.avro.TestObject
[error] ^
[error] /mypath/my-job/src/main/scala/com/test/MyJob.scala:41:33: not found: type TestObject
[error] val projection = Projection[TestObject](_.getIntField, _.getIntField)
[error] ^
[error] /my-path/my-job/src/main/scala/test/MyJob.scala:59:26: not found: type TestObject
[error] sc.parquetAvroFile[TestObject](args(inputParquet), projection)
[error] ^
我不明白为什么,但也许 class TestObject 未由架构 @AvroType.fromSchema.
正确生成或者我没有正确使用 api,但我遵循了 link:https://spotify.github.io/scio/io/Parquet.html#read-avro-parquet-files
感谢您的帮助。
使用 sbt-avro 插件,这项工作。