我将如何解析 Java class 文件常量池?

How would I go about parsing the Java class file constant pool?

根据 https://en.wikipedia.org/wiki/Java_class_file#General_layout - class 文件的 Java 常量池从文件的 10 个字节开始。

到目前为止,我已经能够解析之前的所有内容(检查它是否是 class 文件、major/minor 版本、恒定池大小的魔法)但我仍然不明白究竟如何解析常量池。比如,是否有用于指定方法引用和其他内容的操作码?

在以十六进制表示文本之前,有什么方法可以引用每个十六进制值,以找出以下值是什么?

我是否应该通过 NOP (0x00) 拆分每组条目,然后解析不是文本值的每个字节?

例如,我怎样才能准确计算出每个值代表什么?

您需要的 class 个文件的唯一相关文档是 The Java® Virtual Machine Specification, especially Chapter 4. The class File Format and, if you are going to parse more than the constant pool, Chapter 6. The Java Virtual Machine Instruction Set.

常量池由可变长度的项目组成,其第一个字节决定了它的类型,而类型又决定了大小。大多数项目由一个或两个指向其他项目的索引组成。不需要任何第三方库的简单解析代码可能如下所示:

public static final int HEAD=0xcafebabe;
// Constant pool types
public static final byte CONSTANT_Utf8               = 1;
public static final byte CONSTANT_Integer            = 3;
public static final byte CONSTANT_Float              = 4;
public static final byte CONSTANT_Long               = 5;
public static final byte CONSTANT_Double             = 6;
public static final byte CONSTANT_Class              = 7;
public static final byte CONSTANT_String             = 8;
public static final byte CONSTANT_FieldRef           = 9;
public static final byte CONSTANT_MethodRef          =10;
public static final byte CONSTANT_InterfaceMethodRef =11;
public static final byte CONSTANT_NameAndType        =12;
public static final byte CONSTANT_MethodHandle       =15;
public static final byte CONSTANT_MethodType         =16;
public static final byte CONSTANT_InvokeDynamic      =18;
public static final byte CONSTANT_Module             =19;
public static final byte CONSTANT_Package            =20;

static void parseRtClass(Class<?> clazz) throws IOException, URISyntaxException {
    URL url = clazz.getResource(clazz.getSimpleName()+".class");
    if(url==null) throw new IOException("can't access bytecode of "+clazz);
    Path p = Paths.get(url.toURI());
    if(!Files.exists(p))
        p = p.resolve("/modules").resolve(p.getRoot().relativize(p));
    parse(ByteBuffer.wrap(Files.readAllBytes(p)));
}
static void parseClassFile(Path path) throws IOException {
    ByteBuffer bb;
    try(FileChannel ch=FileChannel.open(path, StandardOpenOption.READ)) {
        bb=ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    }
    parse(bb);
}
static void parse(ByteBuffer buf) {
    if(buf.order(ByteOrder.BIG_ENDIAN).getInt()!=HEAD) {
        System.out.println("not a valid class file");
        return;
    }
    int minor=buf.getChar(), ver=buf.getChar();
    System.out.println("version "+ver+'.'+minor);
    for(int ix=1, num=buf.getChar(); ix<num; ix++) {
        String s; int index1=-1, index2=-1;
        byte tag = buf.get();
        switch(tag) {
            default:
                System.out.println("unknown pool item type "+buf.get(buf.position()-1));
                return;
            case CONSTANT_Utf8: decodeString(ix, buf); continue;
            case CONSTANT_Class: case CONSTANT_String: case CONSTANT_MethodType:
            case CONSTANT_Module: case CONSTANT_Package:
                s="%d:\t%s ref=%d%n"; index1=buf.getChar();
                break;
            case CONSTANT_FieldRef: case CONSTANT_MethodRef:
            case CONSTANT_InterfaceMethodRef: case CONSTANT_NameAndType:
                s="%d:\t%s ref1=%d, ref2=%d%n";
                index1=buf.getChar(); index2=buf.getChar();
                break;
            case CONSTANT_Integer: s="%d:\t%s value="+buf.getInt()+"%n"; break;
            case CONSTANT_Float: s="%d:\t%s value="+buf.getFloat()+"%n"; break;
            case CONSTANT_Double: s="%d:\t%s value="+buf.getDouble()+"%n"; ix++; break;
            case CONSTANT_Long: s="%d:\t%s value="+buf.getLong()+"%n"; ix++; break;
            case CONSTANT_MethodHandle:
                s="%d:\t%s kind=%d, ref=%d%n"; index1=buf.get(); index2=buf.getChar();
                break;
             case CONSTANT_InvokeDynamic:
                s="%d:\t%s bootstrap_method_attr_index=%d, ref=%d%n";
                index1=buf.getChar(); index2=buf.getChar();
                break;
        }
        System.out.printf(s, ix, FMT[tag], index1, index2);
    }
}
private static String[] FMT= {
    null, "Utf8", null, "Integer", "Float", "Long", "Double", "Class",
    "String", "Field", "Method", "Interface Method", "Name and Type",
    null, null, "MethodHandle", "MethodType", null, "InvokeDynamic",
    "Module", "Package"
};

private static void decodeString(int poolIndex, ByteBuffer buf) {
    int size=buf.getChar(), oldLimit=buf.limit();
    buf.limit(buf.position()+size);
    StringBuilder sb=new StringBuilder(size+(size>>1)+16)
        .append(poolIndex).append(":\tUtf8 ");
    while(buf.hasRemaining()) {
        byte b=buf.get();
        if(b>0) sb.append((char)b);
        else
        {
            int b2 = buf.get();
            if((b&0xf0)!=0xe0)
                sb.append((char)((b&0x1F)<<6 | b2&0x3F));
            else
            {
                int b3 = buf.get();
                sb.append((char)((b&0x0F)<<12 | (b2&0x3F)<<6 | b3&0x3F));
            }
        }
    }
    buf.limit(oldLimit);
    System.out.println(sb);
}

不要对 getChar() 调用感到困惑,我将它们用作获取无符号短整数的便捷方式,而不是 getShort()&0xffff

上面的代码简单地打印了对其他池项的引用索引。为了解码项目,您可以首先将所有项目的数据存储到随机访问数据结构中,即数组或 List 因为项目可能引用具有更高索引号的项目。请注意从索引 1 开始...