Hadoop Text is mutable

Question

I can't understand the logic of this code:

    Text text = new Text("hadoop");
    System.out.println(text.getLength());
    System.out.println(text.getBytes().length);
    text.set(new Text("pig"));
    System.out.println(text.getLength());
    System.out.println(text.getBytes().length);

Why does the last print statement give 6 instead of 3? Please explain; I'm completely confused.

Answer 1

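Text is backed by a byte array that holds its actual value. When you update the value of a Text, the contents of that byte array are overwritten rather than the whole array object being replaced, provided the new value fits into the current array (otherwise a new byte array is created). In your example, you initialize the Text with the value "hadoop", which requires a byte array of length 6. When you then set the new value "pig", "pig" is copied into the existing byte array of length 6; that is, Text does not create a new byte array of length 3 for it. So getLength() reports 3, the number of valid bytes, while getBytes().length still reports the size of the backing array, 6. I assume the byte array is reused to reduce the number of object instantiations and to relieve pressure on the garbage collector.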

The Javadoc for Text.getBytes() says:

Returns the raw bytes; however, only data up to getLength() is valid. Please use copyBytes() if you need the returned array to be precisely the length of the data.
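To make the reuse behaviour concrete, here is a minimal stand-alone sketch using only the JDK. It is not Hadoop's source code: `ReusableText` is a hypothetical class written for illustration, and it only mimics what is described above (a backing array that is overwritten when the new value fits, plus a separate valid-length counter). The real Text class may differ in how and when it grows its buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified model of a buffer-reusing text holder.
// This is NOT Hadoop's Text class; it only mimics the behaviour described above.
class ReusableText {
    private byte[] bytes = new byte[0]; // backing array, may be longer than the data
    private int length = 0;             // number of valid bytes in the backing array

    ReusableText(String value) {
        set(value);
    }

    void set(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > bytes.length) {
            // The new value does not fit: allocate a larger backing array.
            bytes = new byte[utf8.length];
        }
        // Overwrite the start of the backing array; stale bytes may remain after it.
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    // Number of valid bytes, not the size of the backing array.
    int getLength() {
        return length;
    }

    // Raw backing array: only the first getLength() bytes are valid.
    byte[] getBytes() {
        return bytes;
    }

    // Fresh array whose length is exactly the length of the data.
    byte[] copyBytes() {
        return Arrays.copyOf(bytes, length);
    }

    @Override
    public String toString() {
        // Decode only the valid prefix of the backing array.
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableText text = new ReusableText("hadoop");
        System.out.println(text.getLength());        // 6
        System.out.println(text.getBytes().length);  // 6
        text.set("pig");
        System.out.println(text.getLength());        // 3
        System.out.println(text.getBytes().length);  // 6 (backing array reused)
        System.out.println(text.copyBytes().length); // 3
        System.out.println(text);                    // pig
    }
}
```

In this model the backing array holds "pigoop" after the second set call, which is why any code reading the raw bytes has to stop at getLength(), or work on the copy returned by copyBytes().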


java

hadoop

mapreduce

bigdata