Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/RecordReader

I am trying to convert my JSON file to Parquet format.

Below is my pom file.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mypackage</groupId>
    <artifactId>JSONToParquet</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <repositories>
        <repository>
            <id>wso2</id>
            <url>http://dist.wso2.org/maven2/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-data-core</artifactId>
            <version>1.1.0</version>
        </dependency>

        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-morphlines-all</artifactId>
            <version>1.0.0</version> <!-- or whatever the latest version is -->
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/ua_parser/ua-parser -->
        <dependency>
            <groupId>ua_parser</groupId>
            <artifactId>ua-parser</artifactId>
            <version>1.3.0</version>
            <type>pom</type>
        </dependency>

    </dependencies>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>


</project>

The conversion code is as follows:

Schema jsonSchema = JsonUtil.inferSchema(inputstream, "Movie", 10);
try (JSONFileReader<Movie> reader = new JSONFileReader<>(
        inputstream, jsonSchema, Movie.class)) {

    reader.initialize();

    ParquetWriter parquetWriter
            = new AvroParquetWriter(outputPath, jsonSchema, compressionCodecName, ParquetWriter.DEFAULT_BLOCK_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE);

    for (Movie record : reader) {
        parquetWriter.write(record);
    }

    // Close the writer so the Parquet footer is flushed to the output file.
    parquetWriter.close();
}

In the code above, Movie is my POJO class.

When I run the program, I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/RecordReader
    at com.mypackage.jsontoparquet.JsonToParquet.main(JsonToParquet.java:34)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.RecordReader
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

I am using JDK 8.

I have no Hadoop background, so I cannot work out the root cause.

What is the problem?

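According to the Kite SDK documentation, JSONFileReader, ParquetWriter, and AvroParquetWriter rely on Hadoop libraries, so you need to add Hadoop dependencies to your pom. The NoClassDefFoundError means that org.apache.hadoop.mapreduce.RecordReader was not on the classpath at runtime. At a minimum you need the following dependencies; add them to your pom.xml: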

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
</dependency>
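
After adding the dependencies and rebuilding, one way to confirm that the Hadoop classes are actually visible at runtime is a small classpath probe. This is only a sketch, not part of Kite or Hadoop: the class name ClasspathCheck and the helper isOnClasspath are hypothetical, and the second entry in the list (org.apache.hadoop.conf.Configuration, which ships in hadoop-common) is an assumption about what else the writer is likely to need.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by this class's class loader, without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is missing.
            // LinkageError (e.g. NoClassDefFoundError): something it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // The first name comes from the stack trace in the question;
        // the second is an assumed extra check for hadoop-common.
        String[] required = {
            "org.apache.hadoop.mapreduce.RecordReader",
            "org.apache.hadoop.conf.Configuration"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as the converter. Any line reported as MISSING points to a dependency that still has to be added to the pom.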

Your Kite setup is missing the Hadoop dependencies.

There are some cases where you may have to provide the relevant Hadoop component dependencies yourself, and Kite has grouping dependencies for this purpose.

For Hadoop 2 (the default), add this to your pom:

<dependency>
    <groupId>org.kitesdk</groupId>
    <artifactId>kite-hadoop2-dependencies</artifactId>
    <version>1.0.0</version>
    <type>pom</type>
    <scope>compile</scope>
</dependency>