Generate a Dataflow template from a multi-module project in Java

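I don't have much experience with Java, especially with multi-module projects, so I haven't been able to create a Dataflow template from a multi-module project.

To generate a Dataflow template, you use something like this: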

mvn compile exec:java \
     -Dexec.mainClass=com.example.myclass \
     -Dexec.args="--runner=DataflowRunner \
                  --project=YOUR_PROJECT_ID \
                  --stagingLocation=gs://YOUR_BUCKET_NAME/staging \
                  --templateLocation=gs://YOUR_BUCKET_NAME/templates/YOUR_TEMPLATE_NAME"

This works fine for me in a simple Java project, but now I need to do the same in a project with the following (simplified) structure:

C:.
|   pom.xml
|
+---configuration
|   |   dependency-reduced-pom.xml
|   |   pom.xml
|   |
|   +---src
|   |   \---main
|   |       \---java
|   |           \---com
|   |               \---xxx
|   |                   \---gcp
|   |                       \---dataflow
|   |                           \---yyy
|   |                               +---package
|   |                               |   |   java files
|   |
+---pipeline
|   |   dependency-reduced-pom.xml
|   |   pom.xml
|   |
|   +---src
|   |   \---main
|   |       \---java
|   |           \---com
|   |               \---xxx
|   |                   \---gcp
|   |                       \---dataflow
|   |                           \---yyy
|   |                               \---package
|   |                                       MAINJAVACLASS.java
|   |
\---transform
|   |   dependency-reduced-pom.xml
|   |   pom.xml
|   |
|   +---src
|   |   \---main
|   |       +---java
|   |       |   +---com
|   |       |   |   \---xxx
|   |       |   |       \---gcp
|   |       |   |           \---dataflow
|   |       |   |               \---yyy
|   |       |   |                   +---package
|   |       |   |                   |       java files

I have run mvn package without any errors, with the following output:

[INFO] Reactor Build Order:
[INFO]
[INFO] pipeline-framework                                                 [pom]
[INFO] configuration                                                      [jar]
[INFO] transform                                                          [jar]
[INFO] pipeline                                                           [jar]

<...>

[INFO] Reactor Summary for pipeline-framework 0.1:
[INFO]
[INFO] pipeline-framework ................................. SUCCESS [ 19.076 s]
[INFO] configuration ...................................... SUCCESS [ 25.070 s]
[INFO] transform .......................................... SUCCESS [ 21.625 s]
[INFO] pipeline ........................................... SUCCESS [ 19.365 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS

But when I try to run:

mvn compile exec:java -Dexec.mainClass=com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS -Dexec.args=...

I get the following errors.

If I run it from the root directory:

[INFO] Reactor Summary for pipeline-framework 0.1:
[INFO]
[INFO] pipeline-framework ................................. FAILURE [  5.287 s]
[INFO] configuration ...................................... SKIPPED
[INFO] transform .......................................... SKIPPED
[INFO] pipeline ........................................... SKIPPED

<...>

Caused by: java.lang.ClassNotFoundException: com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS

I also tried:

mvn compile exec:java -pl pipeline <...>

If I run it in the pipeline directory:

Could not resolve dependencies for project com.xxx.gcp.dataflow:pipeline:jar:0.1: The following artifacts could not be resolved: com.xxx.gcp.dataflow:transform:jar:0.1, com.xxx.gcp.dataflow:configuration:jar:0.1: Failure to find com.xxx.gcp.dataflow:transform:jar:0.1 in https://repo.maven.apache.org/maven2

Which command should I run to build the template?


Root pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.xxx.gcp.dataflow</groupId>
  <artifactId>pipeline-framework</artifactId>
  <version>0.1</version>
  <packaging>pom</packaging>

  <modules>
    <module>configuration</module>
    <module>transform</module>
    <module>pipeline</module>
  </modules>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>

    <beam.version>2.16.0</beam.version>

    <maven-compiler-plugin.version>3.7.0</maven-compiler-plugin.version>
    <maven-exec-plugin.version>1.6.0</maven-exec-plugin.version>
    <maven-jar-plugin.version>3.1.2</maven-jar-plugin.version>
    <slf4j.version>1.7.25</slf4j.version>

    <autovalue.annotations.version>1.6</autovalue.annotations.version>
    <autovalue.version>1.6.2</autovalue.version>
  </properties>

  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Development Snapshot Repository</name>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>${project.groupId}</groupId>
        <artifactId>configuration</artifactId>
        <version>${project.version}</version>
      </dependency>
      <dependency>
        <groupId>${project.groupId}</groupId>
        <artifactId>transform</artifactId>
        <version>${project.version}</version>
      </dependency>
      <dependency>
        <groupId>${project.groupId}</groupId>
        <artifactId>pipeline</artifactId>
        <version>${project.version}</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>${maven-compiler-plugin.version}</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.22.1</version>
        <configuration>
          <useSystemClassLoader>false</useSystemClassLoader>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>${maven-jar-plugin.version}</version>
        <configuration>
          <archive>
            <manifest>
              <mainClass>com.xxx.gcp.dataflow.yyy.pipeline.TerraformPipeline</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.0.0</version>
        <executions>
          <execution>
            <id>bundle-and-repackage</id>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>

              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>

              <transformers>
                <transformer
                        implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
              </transformers>

            </configuration>
          </execution>
        </executions>
      </plugin>

    </plugins>

    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>exec-maven-plugin</artifactId>
          <version>${maven-exec-plugin.version}</version>
          <configuration>
            <cleanupDaemonThreads>false</cleanupDaemonThreads>
          </configuration>
        </plugin>


      </plugins>


    </pluginManagement>
  </build>

  <dependencies>
    <...>
  </dependencies>
</project>

I think this is a common problem with multi-module Maven projects rather than something specific to Dataflow. Maybe this other thread can help: Maven exec:java goal on a multi-module project

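That thread covers the problem you are seeing where MAINJAVACLASS cannot be found: when you run exec:java from the root, Maven executes it on the parent project (pipeline-framework, packaging pom) first, and that project contains no classes, so the build fails there. I'm less sure about the other half, but I think the jars are missing because the package lifecycle phase has not run on the modules whose .jar you need. As far as I know, the exec plugin is not bound to any particular phase of the build lifecycle, so based on your information I'd guess it simply runs after the compile phase, which does not produce any jars (that happens in package). Note also that mvn package only writes each jar to its module's target directory; it does not install it into the local repository, which is where a build started from the pipeline directory looks for configuration and transform.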

Information about the build lifecycle: http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html
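The missing-jar half of the explanation can be checked directly. Maven resolves the pipeline module's dependencies from the local repository, which by default lives under ~/.m2/repository and stores each artifact at groupId-as-path/artifactId/version/artifactId-version.jar. The helper below is a hypothetical sketch (check_sibling_jars is not part of Maven); the coordinates com.xxx.gcp.dataflow and 0.1 are taken from the error message in the question.

```shell
#!/usr/bin/env bash
# Sketch: report whether the sibling-module jars that "pipeline" depends on
# are present in a local Maven repository. check_sibling_jars is a
# hypothetical helper; coordinates come from the question's error message.
set -euo pipefail

check_sibling_jars() {
  # Local repository root; Maven's default is ~/.m2/repository.
  local repo="${1:-$HOME/.m2/repository}"
  local group_path="com/xxx/gcp/dataflow"   # groupId with dots -> slashes
  local version="0.1"
  local missing=0
  local artifact jar
  for artifact in configuration transform; do
    # Layout: <groupPath>/<artifactId>/<version>/<artifactId>-<version>.jar
    jar="${repo}/${group_path}/${artifact}/${version}/${artifact}-${version}.jar"
    if [[ -f "$jar" ]]; then
      echo "found:   $jar"
    else
      echo "missing: $jar"
      missing=1
    fi
  done
  # Non-zero exit status if any jar is missing.
  return "$missing"
}
```

Run check_sibling_jars with no argument to inspect ~/.m2/repository. If it reports missing jars, running mvn install from the root (or mvn install -pl pipeline -am, where -am also builds the modules pipeline depends on) should put them there.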

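In the end, to solve the problem I had to run mvn clean install before compiling. That installs all the modules into my local Maven repository, so the dependencies can be resolved, and then I used a command like: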

mvn compile exec:java \
     -Dexec.mainClass=com.example.myclass \
     -Dexec.args="--runner=DataflowRunner \
                  --project=YOUR_PROJECT_ID \
                  --stagingLocation=gs://YOUR_BUCKET_NAME/staging \
                  --templateLocation=gs://YOUR_BUCKET_NAME/templates/YOUR_TEMPLATE_NAME"

The template was created and uploaded to GCS.
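For reference, the two steps can be combined into one script. This is a sketch only: build_template is a hypothetical helper, the main class is the anonymized MAINJAVACLASS from the question, and the project, bucket and template names are placeholders passed as arguments. Adding -pl pipeline to the second command is an assumption on top of the answer; it restricts exec:java to the pipeline module so Maven does not try to load the class in the parent pom project (running the same command from inside the pipeline directory should be equivalent once everything is installed).

```shell
#!/usr/bin/env bash
# Sketch: install all modules, then stage the Dataflow template from the
# pipeline module only. build_template is a hypothetical helper.
set -euo pipefail

build_template() {
  local project_id="$1" bucket="$2" template="$3"
  # Anonymized placeholder from the question; replace with the real class.
  local main_class="com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS"

  # Pipeline options, mirroring the placeholders used in the answer.
  local args="--runner=DataflowRunner"
  args+=" --project=${project_id}"
  args+=" --stagingLocation=gs://${bucket}/staging"
  args+=" --templateLocation=gs://${bucket}/templates/${template}"

  # 1. Build every module and install the jars into the local repository,
  #    so pipeline's dependencies on configuration/transform resolve.
  mvn clean install

  # 2. Run exec:java in the pipeline module only (-pl pipeline).
  mvn compile exec:java -pl pipeline \
    -Dexec.mainClass="${main_class}" \
    -Dexec.args="${args}"
}
```

Usage: source the script from the root directory, then call build_template YOUR_PROJECT_ID YOUR_BUCKET_NAME YOUR_TEMPLATE_NAME.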

If you want to create the template with Cloud Build, you can follow these steps.