How to sort a custom writable type in Hadoop

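I have a custom type that contains fields of native Hadoop types (for example Text and IntWritable), and I need to use it as a key, including in the shuffle/sort phase. There are similar questions (this one and this one), but they deal with native types. How can I achieve the same with a custom type, and what requirements must it meet?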


  1. First, the custom type must implement WritableComparable instead of just Writable and, of course, define the compareTo() method.
  2. A very important note from Hadoop: The Definitive Guide:

    All Writable implementations must have a default constructor so that the MapReduce framework can instantiate them, then populate their fields by calling readFields().


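  3. This point applies only if the default sort order does not suit you and you need a custom comparator. In that case, create a new class that extends WritableComparator and overrides its compare() method. There are then two ways to use this comparator instead of the default one: either set it on the job with Job's setSortComparatorClass method, for example job.setSortComparatorClass(YourComparator.class), or register it in a static block of your custom type: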



    static {
        // Register YourComparator as the default comparator for CustomType
        WritableComparator.define(CustomType.class, new YourComparator());
    }

    The static block registers the raw comparator so that whenever MapReduce sees the class, it knows to use the raw comparator as its default comparator.
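
Points 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: NameCountKey and its fields are hypothetical names, and to keep the sketch compilable without the Hadoop jars it implements only java.lang.Comparable and uses a String and an int in place of Text and IntWritable. In a real job the class would declare that it implements WritableComparable from org.apache.hadoop.io; the write(), readFields() and compareTo() methods would keep the same shape.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Objects;

// Hypothetical key type. In a real job it would declare
// "implements WritableComparable<NameCountKey>" (org.apache.hadoop.io);
// here it implements only Comparable so the sketch compiles without Hadoop.
public class NameCountKey implements Comparable<NameCountKey> {
    private String name;  // stand-in for a Text field
    private int count;    // stand-in for an IntWritable field

    // No-argument constructor: the framework creates an empty instance
    // and then populates it by calling readFields().
    public NameCountKey() {
        this("", 0);
    }

    public NameCountKey(String name, int count) {
        this.name = name;
        this.count = count;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(count);
    }

    // Deserialize the fields in exactly the order used by write().
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        count = in.readInt();
    }

    // Sort order assumed for this sketch: by name, then by count.
    @Override
    public int compareTo(NameCountKey other) {
        int byName = name.compareTo(other.name);
        return byName != 0 ? byName : Integer.compare(count, other.count);
    }

    // Keep equals() and hashCode() consistent with compareTo().
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof NameCountKey)) {
            return false;
        }
        NameCountKey other = (NameCountKey) o;
        return count == other.count && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, count);
    }

    public static void main(String[] args) throws IOException {
        // Round trip: write the key to bytes, then read it into an empty instance.
        NameCountKey original = new NameCountKey("apple", 3);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        NameCountKey copy = new NameCountKey();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(original.compareTo(copy));                              // prints 0
        System.out.println(original.compareTo(new NameCountKey("apple", 5)) < 0);  // prints true
    }
}
```

The round trip in main() mirrors what happens to a key between map and reduce: it is written to bytes, an empty instance is created through the no-argument constructor, and readFields() restores its state before compareTo() is used for sorting.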

Here is an example of a class with a static nested comparator.