How to convert a Row RDD to a typed RDD
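Is it possible to convert a Row RDD to a typed RDD? In the code below, I start from a JavaRDD<Counter>, but after the Parquet round trip I only get a Row JavaRDD, which I would like to convert back to a JavaRDD<Counter>.
Code: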
JavaRDD<Counter> rdd = sc.parallelize(counters);
Dataset<Counter> ds = sqlContext.createDataset(rdd.rdd(), encoder);
DataFrame df = ds.toDF();
df.show();
df.write().parquet(path);
DataFrame newDataDF = sqlContext.read().parquet(path);
newDataDF.toJavaRDD(); // This gives a row type rdd
In Scala:
case class A(countId: Long, bytes: Array[Byte], blist: List[B])
case class B(id: String, count: Long)
val b1 = B("a", 1L)
val b2 = B("b", 2L)
val a1 = A(1L, Array(1.toByte, 2.toByte), List(b1, b2))
val rdd = sc.parallelize(List(a1))
val dataSet: Dataset[A] = sqlContext.createDataset(rdd)
val df = dataSet.toDF()
df.show
// df.show prints the row below; the last column is the List[B], in which the string id is stored as null:
// |1|[01 02]| [[null,3984726108...|]
df.write.parquet(path)
val roundTripRDD = sqlContext.read.parquet(path).as[A].rdd
// the show call on the round-tripped data (roundTripRDD.toDF.show below) throws this error:
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java',
Line 300, Column 68:
No applicable constructor/method found for actual parameters
"long, byte[], scala.collection.Seq"; candidates are:
"test.data.A(long, byte[], scala.collection.immutable.List)"
roundTripRDD.toDF.show
assertEquals(roundTripRDD, rdd)
Do I need to provide some kind of constructor for the case class?
Tried:
sqlContext.read().parquet(path).as(encoder).rdd().toJavaRDD();
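One possible reading of the error message, offered as a minimal sketch rather than a confirmed fix: the generated code passes a scala.collection.Seq to the constructor of A, while the case class declares blist as an immutable List, so no constructor matches. The plain-Scala snippet below (no Spark involved) only illustrates that type mismatch. The object name SeqVsListSketch and the use of a Vector as a stand-in for whatever Seq the decoder produces are assumptions for illustration, and whether declaring the field as Seq[B] resolves the Parquet round trip on this Spark version is untested here.

```scala
// Same case classes as in the question, except that blist is declared
// as Seq[B] instead of List[B] (this change is the assumption under test).
case class B(id: String, count: Long)
case class A(countId: Long, bytes: Array[Byte], blist: Seq[B])

object SeqVsListSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the decoded column: a general Seq, not an immutable List.
    val decoded: Seq[B] = Vector(B("a", 1L), B("b", 2L))

    // Type-checks because the field is Seq[B]. With blist: List[B] this
    // call would be rejected, which mirrors the reported
    // "No applicable constructor/method found" message.
    val a = A(1L, Array(1.toByte, 2.toByte), decoded)

    println(a.blist.map(_.id).mkString(","))
  }
}
```

If the rest of the code really needs a List, calling a.blist.toList after reading is one way to get it back without changing what the constructor accepts.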