Getting many warnings when using a List of a custom POJO class in Apache Beam (Java)

I am new to Apache Beam. I am using Apache Beam with Dataflow on GCP as the runner, and I get the following error while executing the pipeline.

coder of type class org.apache.beam.sdk.coders.ListCoder has a #structuralValue method which does not return true when the encoding of the elements is equal. Element [Person [businessDay=01042020, departmentId=101, endTime=2020-04-01T09:06:02.000Z, companyId=242, startTime=2020-04-01T09:00:33.000Z], Person [businessDay=01042020, departmentId=101, endTime=2020-04-01T09:07:47.000Z, companyId=242, startTime=2020-04-01T09:06:03.000Z], Person [businessDay=01042020, departmentId=101, endTime=2020-04-01T09:48:25.000Z, companyId=242, startTime=2020-04-01T09:07:48.000Z]]

The PCollections look like PCollection<KV<String, List<Person>>> and PCollection<KV<String, Iterable<List<Person>>>>.

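I have implemented Person as a Serializable POJO class and have also overridden the equals and hashCode methods. But I think I also need to write a custom ListCoder for Person and register it in the pipeline. I am not sure how to solve this issue; please help.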

Here is a working example. If you clone the repo, you can verify the behavior by running ./gradlew run from the playground root directory. To run it on Dataflow, use ./gradlew run --args='--runner=DataflowRunner --project=$YOUR_PROJECT_ID --tempLocation=gs://xxx/staging --stagingLocation=gs://xxx/staging'.

If you build the Person class from scratch, it should look like this:

import java.io.Serializable;
import java.util.Objects;

class Person implements Serializable {
  public Person(
      String businessDay,
      String departmentId,
      String companyId
  ) {
    this.businessDay = businessDay;
    this.departmentId = departmentId;
    this.companyId = companyId;
  }

  public String companyId() {
    return companyId;
  }

  public String businessDay() {
    return businessDay;
  }

  public String departmentId() {
    return departmentId;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (other == null) {
      return false;
    }
    if (getClass() != other.getClass()) {
      return false;
    }
    Person otherPerson = (Person) other;
    return this.businessDay.equals(otherPerson.businessDay)
        && this.departmentId.equals(otherPerson.departmentId)
        && this.companyId.equals(otherPerson.companyId);
  }

  @Override
  public int hashCode(){
    return Objects.hash(this.businessDay, this.departmentId, this.companyId);
  }

  private final String businessDay;
  private final String departmentId;
  private final String companyId;
}
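One way to sanity-check a class like this before running the pipeline is to push a value through standard Java serialization and confirm that the copy is still equal to the original and has the same hash code. The sketch below is only an illustration: `SerializationRoundTrip`, `roundTrip`, and `survivesRoundTrip` are hypothetical helper names, not part of Beam, and strings stand in for `Person` so the snippet compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper, not part of Beam.
final class SerializationRoundTrip {
  // Serializes the value with standard Java serialization and reads it back.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Round trip failed", e);
    }
  }

  // True when the deserialized copy equals the original and hashes the same.
  static <T extends Serializable> boolean survivesRoundTrip(T value) {
    T copy = roundTrip(value);
    return value.equals(copy) && value.hashCode() == copy.hashCode();
  }

  public static void main(String[] args) {
    // Strings stand in for Person here; with the class above you would
    // pass an ArrayList<Person> instead.
    ArrayList<String> sample = new ArrayList<>(Arrays.asList("a", "b"));
    System.out.println(survivesRoundTrip(sample));
  }
}
```

If the check fails for a list of Person objects, the equals or hashCode implementation is the first place to look.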

I recommend:

  • Using AutoValue instead of creating POJOs from scratch. Here are some examples. You can view the whole project here. The advantage is that you do not have to implement equals and hashCode from scratch every time you create a new object type.

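  • In a KV, if the key is an iterable object such as a List, wrapping it in an object and serializing it explicitly and deterministically (example), because default Java serialization is not guaranteed to be deterministic.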
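The second recommendation can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the code from the linked example: `PersonListKey` is a hypothetical wrapper name, and its nested `Person` is a cut-down stand-in with the same three fields as the class above. In a Beam pipeline, logic like this would typically sit in the encode and decode methods of a custom Coder; that wiring is omitted here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper around a list of people, intended for use as a KV key.
final class PersonListKey {
  // Minimal stand-in for the Person class above (same three fields).
  static final class Person {
    final String businessDay;
    final String departmentId;
    final String companyId;

    Person(String businessDay, String departmentId, String companyId) {
      this.businessDay = businessDay;
      this.departmentId = departmentId;
      this.companyId = companyId;
    }
  }

  final List<Person> people;

  PersonListKey(List<Person> people) {
    this.people = Collections.unmodifiableList(new ArrayList<>(people));
  }

  // Writes the element count, then every field in a fixed order, so that
  // lists with equal contents always produce identical bytes.
  static byte[] encode(PersonListKey key) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(key.people.size());
      for (Person p : key.people) {
        out.writeUTF(p.businessDay);
        out.writeUTF(p.departmentId);
        out.writeUTF(p.companyId);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the fields back in the same order they were written.
  static PersonListKey decode(byte[] encoded) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
      int size = in.readInt();
      List<Person> people = new ArrayList<>(size);
      for (int i = 0; i < size; i++) {
        people.add(new Person(in.readUTF(), in.readUTF(), in.readUTF()));
      }
      return new PersonListKey(people);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the byte layout is spelled out field by field, two keys holding equal lists encode to the same bytes, which is the property a key encoding needs for grouping.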