光束 SQL 未发射
Beam SQL Not Firing
我正在构建一个简单的原型,其中我从 Pubsub 读取数据并使用 BeamSQL,代码片段如下
val eventStream: SCollection[String] = sc.pubsubSubscription[String]("projects/jayadeep-etl-platform/subscriptions/orders-dataflow")
.withFixedWindows(Duration.standardSeconds(10))
val events: SCollection[DemoEvents] = eventStream.applyTransform(ParDo.of(new DoFnExample()))
events.map(row=>println("Input Stream:" + row))
val pickup_events = SideOutput[DemoEvents]()
val delivery_events = SideOutput[DemoEvents]()
val (mainOutput: SCollection[DemoEvents], sideOutputs: SideOutputCollections)= events
.withSideOutputs(pickup_events, delivery_events)
.flatMap {
case (evts, ctx) =>
evts.eventType match {
// Send to side outputs via `SideOutputContext`
case "pickup" => ctx.output(pickup_events,evts)
case "delivery" => ctx.output(delivery_events,evts)
}
Some(evts)
}
val pickup: SCollection[DemoEvents] = sideOutputs(pickup_events)
val dropoff = sideOutputs(delivery_events)
pickup.map(row=>println("Pickup:" + row))
dropoff.map(row=>println("Delivery:" + row))
val consolidated_view = tsql"select $pickup.order_id as orderId, $pickup.area as pickup_location, $dropoff.area as dropoff_location , $pickup.restaurant_id as resturantId from $pickup as pickup left outer join $dropoff as dropoff ON $pickup.order_id = $dropoff.order_id ".as[Output]
consolidated_view.map(row => println("Output:" + row))
sc.run().waitUntilFinish()
()
我正在使用 Directrunner 在本地对其进行测试,并且在执行光束 sql 之前我能够立即看到结果。 beam sql 的输出未打印。
Input Stream:DemoEvents(false,pickup,Bangalore,Indiranagar,1566382242,49457442008,1566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
Pickup:DemoEvents(false,pickup,Bangalore,Indiranagar,1566382242,49457442008,1566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
Input Stream:DemoEvents(false,delivery,Bangalore,Indiranagar,1566382242,49457442008,2566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
Delivery:DemoEvents(false,delivery,Bangalore,Indiranagar,1566382242,49457442008,2566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
这个问题与 DirectRunner 中的一个错误有关,当我将运行器更改为 DataflowRunner 时,代码 运行 如预期的那样。
我正在构建一个简单的原型,其中我从 Pubsub 读取数据并使用 BeamSQL,代码片段如下
val eventStream: SCollection[String] = sc.pubsubSubscription[String]("projects/jayadeep-etl-platform/subscriptions/orders-dataflow")
.withFixedWindows(Duration.standardSeconds(10))
val events: SCollection[DemoEvents] = eventStream.applyTransform(ParDo.of(new DoFnExample()))
events.map(row=>println("Input Stream:" + row))
val pickup_events = SideOutput[DemoEvents]()
val delivery_events = SideOutput[DemoEvents]()
val (mainOutput: SCollection[DemoEvents], sideOutputs: SideOutputCollections)= events
.withSideOutputs(pickup_events, delivery_events)
.flatMap {
case (evts, ctx) =>
evts.eventType match {
// Send to side outputs via `SideOutputContext`
case "pickup" => ctx.output(pickup_events,evts)
case "delivery" => ctx.output(delivery_events,evts)
}
Some(evts)
}
val pickup: SCollection[DemoEvents] = sideOutputs(pickup_events)
val dropoff = sideOutputs(delivery_events)
pickup.map(row=>println("Pickup:" + row))
dropoff.map(row=>println("Delivery:" + row))
val consolidated_view = tsql"select $pickup.order_id as orderId, $pickup.area as pickup_location, $dropoff.area as dropoff_location , $pickup.restaurant_id as resturantId from $pickup as pickup left outer join $dropoff as dropoff ON $pickup.order_id = $dropoff.order_id ".as[Output]
consolidated_view.map(row => println("Output:" + row))
sc.run().waitUntilFinish()
()
我正在使用 Directrunner 在本地对其进行测试,并且在执行光束 sql 之前我能够立即看到结果。 beam sql 的输出未打印。
Input Stream:DemoEvents(false,pickup,Bangalore,Indiranagar,1566382242,49457442008,1566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
Pickup:DemoEvents(false,pickup,Bangalore,Indiranagar,1566382242,49457442008,1566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
Input Stream:DemoEvents(false,delivery,Bangalore,Indiranagar,1566382242,49457442008,2566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
Delivery:DemoEvents(false,delivery,Bangalore,Indiranagar,1566382242,49457442008,2566382242489,7106576,1566382242000,178258,7406545542,,false,null,htr23e22-329a-4b05-99c1-606a3ccf6a48,972)
这个问题与 DirectRunner 中的一个错误有关,当我将运行器更改为 DataflowRunner 时,代码 运行 如预期的那样。