如何使用 Apache Beam DSL API?

How to use Apache Beam DSL APIs?

我正在尝试实现 Apache Beam documentation 中的 DSL API 示例。我正在使用最新版本的 apache beam 库 (2.4.0)

我运行ning的代码和文档中的一样:

@Rule
public final transient TestPipeline p = TestPipeline.create();

@Test
public void dslTest() {
    RowType appType = RowSqlType
            .builder()
            .withIntegerField("appId")
            .withVarcharField("description")
            .withTimestampField("rowtime")
            .build();

    // Create a concrete row with that type.
    Row row = Row.withRowType(appType)
            .addValues(1, "Some cool app", new Date())
            .build();

    // Create a source PCollection containing only that row
    PCollection<Row> testApps = PBegin
            .in(p)
            .apply(Create
                    .of(row)
                    .withCoder(appType.getRowCoder()));

    PCollection<Row> filteredNames = testApps.apply(
            BeamSql.query(
                    "SELECT appId, description, rowtime "
                            + "FROM PCOLLECTION "
                            + "WHERE appId=1"));
}

这总是失败并出现以下错误:

java.lang.AssertionError
at org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:546)
at org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:365)
at org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:336)
at org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner.convertToBeamRel(BeamQueryPlanner.java:165)
at org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner.validateAndConvert(BeamQueryPlanner.java:156)
at org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner.convertToBeamRel(BeamQueryPlanner.java:144)
at org.apache.beam.sdk.extensions.sql.QueryTransform.expand(QueryTransform.java:73)
at org.apache.beam.sdk.extensions.sql.QueryTransform.expand(QueryTransform.java:47)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
at PipelineTest.dslTest(PipelineTest.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.apache.beam.sdk.testing.TestPipeline.evaluate(TestPipeline.java:324)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access[=11=]0(ParentRunner.java:58)
at org.junit.runners.ParentRunner.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

运行此测试的正确方法是什么,或者这是一个错误并且 dsl api 无法正常工作?

这看起来像是一个已知问题,Beam 无法满足规划器中的断言,请参阅此 jira。目前尚未准备好对其进行修复。

当前的解决方法是禁用断言,具体取决于您的构建系统:

  • 如果您使用的是 gradle,那么 in build.gradle 您将得到如下内容:

    test {
        jvmArgs "-da"
    }
    
  • 如果你使用的是 maven,那么 in pom.xml 你会得到这样的东西:

    
     <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <argLine>-da</argLine>