How to extract metadata from Java methods using AST?

Is there an AST tool that makes it easy to extract metadata from a Java method?

For example, given the code snippet below

/*
 Checks if a target integer is present in the list of integers.
*/
public Boolean contains(Integer target, List<Integer> numbers) {
    for(Integer number: numbers){
        if(number.equals(target)){
            return true;
        }
    }
    return false;
}

the metadata would be:

metadata = {
    "comment": "Checks if a target integer is present in the list of integers.",
    "identifier": "contains",
    "parameters": "Integer target, List<Integer> numbers",
    "return_statement": "Boolean false"

}

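I wrote this class a long time ago. It was actually about four different classes, spread across a package named JavaParserBridge. It greatly simplifies what you are trying to do. I stripped out everything unnecessary and boiled it down to about 100 lines. That took roughly an hour...

I hope this all makes sense. I usually add a lot of comments to my code, but sometimes, when I am working with other libraries and posting on Stack Overflow, I do not. Since this is really just one big constructor, I will leave you with the documentation page for Java Parser instead.

To use this class, pass the source-code file of a Java class as a single java.lang.String to the method named getMethods(String), which returns a java.util.Vector<Method>. Each element of the returned Vector is an instance of Method, which should contain all of the meta-information you asked for in your question.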
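If the source lives in a file on disk, it first has to be read into a single String. Here is a minimal sketch of that step using only the standard library. The class name SourceLoader and the UTF-8 encoding are my assumptions; they are not part of the answer's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Method class from the answer.
final class SourceLoader
{
    // Reads a whole source file into one String (UTF-8 is an assumption).
    static String readSource(Path file) throws IOException
    {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] argv) throws IOException
    {
        if (argv.length == 0)
        {
            System.out.println("usage: java SourceLoader <path-to-java-file>");
            return;
        }

        String source = readSource(Paths.get(argv[0]));
        System.out.println("Read " + source.length() + " characters");

        // The String can then be handed to the class from the answer:
        // Vector<Method> methods = Method.getMethods(source);
    }
}
```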

IMPORTANT: You can get the JavaParser JAR file from the project's GitHub page. The JAR you need is named: javaparser-core-3.16.2.jar

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.body.Parameter;
import com.github.javaparser.ast.type.ReferenceType;
import com.github.javaparser.ast.NodeList;
import com.github.javaparser.ast.Modifier; // Modifiers are the key-words such as "public, private, static, etc..."
import com.github.javaparser.printer.lexicalpreservation.LexicalPreservingPrinter;
import com.github.javaparser.printer.lexicalpreservation.PhantomNodeLogic;

import java.io.IOException;
import java.util.Vector;


public class Method
{
    public final String name, signature, jdComment, body, returnType;
    public final String[] modifiers, parameterNames, parameterTypes, exceptions;

    private Method (MethodDeclaration md)
    {

        NodeList<Parameter>     paramList       = md.getParameters();
        NodeList<ReferenceType> exceptionList   = md.getThrownExceptions();
        NodeList<Modifier>      modifiersList   = md.getModifiers();

        this.name           = md.getNameAsString();
        this.signature      = md.getDeclarationAsString();
        this.jdComment      = (md.hasJavaDocComment() ? md.getJavadocComment().get().toString() : null);
        this.returnType     = md.getType().toString();
        this.modifiers      = new String[modifiersList.size()];
        this.parameterNames = new String[paramList.size()];
        this.parameterTypes = new String[paramList.size()];
        this.exceptions     = new String[exceptionList.size()];
        this.body           = (md.getBody().isPresent()
                                ?   LexicalPreservingPrinter.print
                                        (LexicalPreservingPrinter.setup(md.getBody().get()))
                                :   null);

        int i=0;
        for (Modifier modifier : modifiersList) modifiers[i++] = modifier.toString();

        i=0;
        for (Parameter p : paramList)
        {
            parameterNames[i]           = p.getName().toString();
            parameterTypes[i]           = p.getType().toString();
            i++;
        }

        i=0;
        for (ReferenceType r : exceptionList) this.exceptions[i++] = r.toString();
    }

    public static Vector<Method> getMethods(String sourceFileAsString) throws IOException
    {
        // This is the "Return Value" for this method (a Vector)
        final Vector<Method> methods = new Vector<>();

        // This asks Java Parser to parse the source code file.
        // The String-parameter 'sourceFileAsString' should hold the
        // entire contents of that file.

        CompilationUnit cu = StaticJavaParser.parse(sourceFileAsString);

        // This will "walk" all of the methods that were parsed by
        // StaticJavaParser, and retrieve the method information.
        // The method information is stored in a class simply called "Method"

        cu.walk(MethodDeclaration.class, (MethodDeclaration md) -> methods.add(new Method(md)));

        // There is one important thing to do: clear the cache
        // Memory leaks shall occur if you do not.

        PhantomNodeLogic.cleanUpCache(); 

        // return the Vector<Method>
        return methods;
    }
}
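To get from the fields of Method to the dictionary shown in the question, something like the following sketch might do. It is standalone, so it takes plain Strings and arrays that mirror the fields name, jdComment, returnType, parameterTypes and parameterNames. The class MetadataSketch, its helper names and the key "return_type" are my assumptions, since Method records the declared return type rather than a return statement. Note also that jdComment is only filled in for /** ... */ Javadoc comments, so the /* ... */ comment in the question would most likely come back as null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Method class from the answer.
final class MetadataSketch
{
    // Joins the parallel type/name arrays into "Type name, Type name".
    static String joinParameters(String[] types, String[] names)
    {
        StringJoiner sj = new StringJoiner(", ");
        for (int i = 0; i < names.length; i++) sj.add(types[i] + " " + names[i]);
        return sj.toString();
    }

    // The arguments mirror the public fields of Method.
    static Map<String, String> toMetadata
        (String name, String comment, String returnType, String[] types, String[] names)
    {
        // LinkedHashMap keeps the keys in insertion order when printed
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("comment", comment);
        metadata.put("identifier", name);
        metadata.put("parameters", joinParameters(types, names));
        metadata.put("return_type", returnType);
        return metadata;
    }

    public static void main(String[] argv)
    {
        Map<String, String> metadata = toMetadata(
            "contains",
            "Checks if a target integer is present in the list of integers.",
            "Boolean",
            new String[] { "Integer", "List<Integer>" },
            new String[] { "target", "numbers" }
        );

        metadata.forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```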

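You will need to add this method to the Method class above... I rarely (if ever) add more than one answer to a single Stack Overflow question. However, since this turned into a lot of code and I did not want to over-complicate the first answer, I am posting this main method as a separate answer to your question.

You need to include this method in the Method class above. It correctly processes the file functions.json that I downloaded from your website. That file contains a list of methods along with their database IDs.

Also: make sure to add the line import java.util.regex.*; because this method uses the Java classes Pattern and Matcher. It also needs import java.io.*; for BufferedReader, FileReader and File.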


    public static void main(String[] argv) throws IOException
    {
        // "321": "\tpublic int getPushesLowerbound() {\n\t\treturn pushesLowerbound;\n\t}\n",
        // If you have not used "Regular Expressions" before, you are just
        // going to have to read about them.  This "Regular Expression" parses your
        // JSON "functions.json" file.  It is a little complicated, but not too bad.

        Pattern         P1          = Pattern.compile("^\\s+\"(\\d+)\":\\s+\"(.*?)\\\\n\",$");
        BufferedReader  br          = new BufferedReader(new FileReader(new File("functions.json")));
        String          s           = br.readLine(); // skips the opening "{" line

        // Any time you have a "Constructor" instead of a method, you should
        // use some other method in `StaticJavaParser` to deal with it.
        // for now, I am just going to keep a "Fail List" instead..

        int             failCount   = 0;
        Vector<String>  failIDs     = new Vector<>();
 
        while (((s = br.readLine()) != null) && (! s.equals("}")))
        {
            // Parse the JSON using a Regular Expression.  It is just easier to do it this way
            // You have a VERY BASIC json file.

            Matcher m = P1.matcher(s);
            
            // I do not think any of the String's will fail the regular expression matcher.
            // Just in case, continue if the Regular Expression Match Failed.
            if (! m.find()) { System.out.print("."); continue; }
            
            // The ID is the first JSON element matched by the regular expression
            String id = m.group(1);
            
            // The source code is the second JSON element matched by the regular-expression
            // NOTE: Your source-code is not perfect... It has "escape sequences", so these sequences
            //       have to be "unescaped"
            // ALSO: this is not the most efficient way to "un-escape" an escape-sequence, but I would
            //       have to include an external library to do it the right way, so I'm going to leave
            //       this version here for you to think about.
            //       (char) 55555 is a temporary placeholder for an escaped backslash.
            String src = m.group(2)
                .replace("\\\\", "" + ((char) 55555))
                .replace("\\n", "\n")
                .replace("\\t", "\t")
                .replace("\\\"", "\"")
                .replace("" + ((char) 55555), "\\");

            // Java Parser has a method EXPLICITLY FOR parsing method Declarations.
            // Your "functions.json" file has a list of method-declarations.
            MethodDeclaration   md          = null;

            // I found one that failed - it was a constructor..
            try
                { md = StaticJavaParser.parseMethodDeclaration(src); }
            catch (Exception e)
                { System.out.println(src); e.printStackTrace(); failCount++; failIDs.add(id); continue; }

            Method method = new Method(md);

            System.out.print(
                "ID:           " + id + '\n' +
                "Name:         " + method.name + '\n' +
                "Return Type:  " + method.returnType + '\n' +
                "Parameters:   "
            );

            for (int i=0; i < method.parameterNames.length; i++)
                System.out.print(method.parameterNames[i] + '(' + method.parameterTypes[i] + ")  ");

            System.out.println("\n");

            PhantomNodeLogic.cleanUpCache();
        }
        
        System.out.print(
            "Fail Count: " + failCount + "\n" +
            "Failed ID's: "
        );
        for (String failID : failIDs) System.out.print(failID + " ");
        System.out.println();
    }
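The line-matching and un-escaping steps of main can also be tried on their own, without JavaParser on the class-path. The sketch below uses the same regular expression, but swaps the chained replace calls for a single-pass loop, which avoids the placeholder character. The class name JsonLineSketch and the method unescape are my assumptions. It only knows the four escapes handled in main and copies anything else through unchanged, so it is not a general JSON un-escaper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Method class from the answer.
final class JsonLineSketch
{
    // Same shape as P1 in main: leading white-space, "id": "source" and a
    // trailing escaped new-line followed by a comma.
    static final Pattern LINE = Pattern.compile("^\\s+\"(\\d+)\":\\s+\"(.*?)\\\\n\",$");

    // Single-pass un-escape of the escaped back-slash, new-line, tab and
    // double-quote.  Any other escape is copied through unchanged.
    static String unescape(String s)
    {
        StringBuilder sb = new StringBuilder(s.length());

        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);

            // Ordinary character, or a lone back-slash at the very end
            if ((c != '\\') || (i + 1 == s.length())) { sb.append(c); continue; }

            char next = s.charAt(++i);

            switch (next)
            {
                case 'n':  sb.append('\n'); break;
                case 't':  sb.append('\t'); break;
                case '"':  sb.append('"');  break;
                case '\\': sb.append('\\'); break;
                default:   sb.append('\\').append(next);
            }
        }

        return sb.toString();
    }

    public static void main(String[] argv)
    {
        // Sample line taken from the comment at the top of main
        String line =
            "    \"321\": \"\\tpublic int getPushesLowerbound() {\\n" +
            "\\t\\treturn pushesLowerbound;\\n\\t}\\n\",";

        Matcher m = LINE.matcher(line);

        if (m.find())
        {
            System.out.println("ID: " + m.group(1));
            System.out.println(unescape(m.group(2)));
        }
    }
}
```

Because every back-slash is consumed together with the character after it, an escaped back-slash followed by the letter n is never mistaken for a new-line, which is the case the placeholder in main guards against.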

The main method above produces output of this kind. Since you have, literally, a million methods, it will run for a while.

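NOTE: Not every entry in that list is a valid method. If an entry is a constructor instead of a method, it needs to be parsed as a constructor. There is a "fail list" for the entries that JavaParser could not parse. I will leave it as an exercise for you to work out how to handle constructors, which are not parsed by the StaticJavaParser method named parseMethodDeclaration.

NOTE: The main method will run for a long time. I have posted only a (very) small subset of the output of this main(String[] argv) method...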
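One way to start on that exercise is to spot likely constructors before calling the parser, so they can be routed elsewhere instead of landing in the fail list. The sketch below is only a rough heuristic of my own, not something JavaParser provides: it looks at the text before the first "(" and guesses "constructor" when, after dropping modifiers and annotations, a single token (the name) is left with no return type in front of it. It will misjudge generic constructors, annotations that take arguments, and snippets that start with a comment. Which JavaParser call to use for the flagged snippets is left open here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-check, not part of JavaParser or of the Method class.
final class ConstructorHeuristic
{
    private static final Set<String> MODIFIERS = new HashSet<>(Arrays.asList(
        "public", "protected", "private", "static", "final", "abstract",
        "synchronized", "native", "strictfp", "default"
    ));

    // Rough guess only: a header (the text before the first '(') that holds
    // exactly one token besides modifiers and annotations has a name but
    // no return type, which is what a constructor looks like.
    static boolean looksLikeConstructor(String src)
    {
        int paren = src.indexOf('(');
        if (paren < 0) return false;

        int remaining = 0;

        for (String token : src.substring(0, paren).trim().split("\\s+"))
            if ((! token.isEmpty()) && (! token.startsWith("@")) && (! MODIFIERS.contains(token)))
                remaining++;

        return remaining == 1;
    }

    public static void main(String[] argv)
    {
        String constructor  = "\tpublic PeriodicData (String secProp ) {";
        String method       = "\tpublic int getPushesLowerbound() {";

        System.out.println(looksLikeConstructor(constructor));
        System.out.println(looksLikeConstructor(method));
    }
}
```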


ID:           32808641
Name:         addUnboundTypePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808649
Name:         addNamePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808650
Name:         addInputParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808651
Name:         addQualifiedNamePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808652
Name:         addOutputParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808656
Name:         addReturnParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808658
Name:         addSignatureParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808659
Name:         getLabelProvider
Return Type:  IItemLabelProvider
Parameters:   namedElement(NamedElement)

ID:           32808661
Name:         getLabel
Return Type:  String
Parameters:   namedElement(NamedElement)

ID:           32808677
Name:         addBodyPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808678
Name:         addLanguagePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808696
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808707
Name:         addStaticPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808708
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808709
Name:         addSemanticsPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808711
Name:         addConstrainedElementPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808713
Name:         addDefinedFeaturePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808727
Name:         addNestingNamespacePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808741
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808749
Name:         addSuperTypePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32814359
Name:         getResource
Return Type:  ResourceBundle
Parameters:   name(String)  language(String)

ID:           32814360
Name:         store
Return Type:  void
Parameters:   resource(ResourceBundle)  name(String)  language(String)

ID:           32814364
Name:         getString
Return Type:  String
Parameters:   key(String)  resourceName(String)  language(String)

ID:           32814400
Name:         getGlobalCompletionRate
Return Type:  double
Parameters:

ID:           32814409
Name:         setCurrentSubTask
Return Type:  void
Parameters:   subTask(TaskMonitor)  subTaskShare(double)

ID:           32814429
Name:         enforceCompletion
Return Type:  void
Parameters:

ID:           32814431
Name:         getCurrentActiveSubTask
Return Type:  TaskMonitor
Parameters:

ID:           32814469
Name:         checkTaskState
Return Type:  void
Parameters:

ID:           32814619
Name:         getReportAsText
Return Type:  String
Parameters:   report(ProcessReport)

ID:           32815305
Name:         showRecoveryResultWindow
Return Type:  void
Parameters:   context(ProcessContext)

ID:           32815353
Name:         validateStructure
Return Type:  void
Parameters:

ID:           32815413
Name:         buildArchive
Return Type:  void
Parameters:   context(ProcessContext)

ID:           32815445
Name:         checkArchiveCompatibility
Return Type:  boolean
Parameters:   archive(File)

ID:           32815446
Name:         checkStupidConfigurations
Return Type:  boolean
Parameters:

ID:           32815472
Name:         getDescription
Return Type:  String
Parameters:

ID:           32815501
Name:         getDataDirectory
Return Type:  File
Parameters:   archive(File)

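IMPORTANT (again): any time one of your database functions is a constructor rather than a method, the JavaParser method that I am calling in class StaticJavaParser will throw an exception.

Look here, this is a constructor: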


ID:           32812832
Name:         run
Return Type:  void
Parameters:

        public PeriodicData (String secProp ) {
                this.interval = 300;
                try {
                        this.interval = Integer.parseInt( secProp );
                } catch (Exception e ) {} // use default 5m

        }

The code I posted prints this message when it encounters one:


com.github.javaparser.ParseProblemException: Encountered unexpected token: "(" "("
    at line 1, column 22.

Was expecting one of:

    "enum"
    "exports"
    "module"
    "open"
    "opens"
    "provides"
    "requires"
    "strictfp"
    "to"
    "transitive"
    "uses"
    "with"
    "yield"
    <IDENTIFIER>

Problem stacktrace :
  com.github.javaparser.GeneratedJavaParser.generateParseException(GeneratedJavaParser.java:10906)
  com.github.javaparser.GeneratedJavaParser.jj_consume_token(GeneratedJavaParser.java:10752)
  com.github.javaparser.GeneratedJavaParser.Identifier(GeneratedJavaParser.java:2193)
  com.github.javaparser.GeneratedJavaParser.SimpleName(GeneratedJavaParser.java:2127)
  com.github.javaparser.GeneratedJavaParser.MethodDeclaration(GeneratedJavaParser.java:1224)
  com.github.javaparser.GeneratedJavaParser.MethodDeclarationParseStart(GeneratedJavaParser.java:6020)
  com.github.javaparser.JavaParser.parse(JavaParser.java:123)
  com.github.javaparser.JavaParser.parseMethodDeclaration(JavaParser.java:540)
  com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
  Method.main(Method.java:110)

        at com.github.javaparser.StaticJavaParser.handleResult(StaticJavaParser.java:260)
        at com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
        at Method.main(Method.java:110)
        public PeriodicData (int seconds ) {
                this.interval = seconds;
        }
com.github.javaparser.ParseProblemException: Encountered unexpected token: "(" "("
    at line 1, column 22.

Was expecting one of:

    "enum"
    "exports"
    "module"
    "open"
    "opens"
    "provides"
    "requires"
    "strictfp"
    "to"
    "transitive"
    "uses"
    "with"
    "yield"
    <IDENTIFIER>

Problem stacktrace :
  com.github.javaparser.GeneratedJavaParser.generateParseException(GeneratedJavaParser.java:10906)
  com.github.javaparser.GeneratedJavaParser.jj_consume_token(GeneratedJavaParser.java:10752)
  com.github.javaparser.GeneratedJavaParser.Identifier(GeneratedJavaParser.java:2193)
  com.github.javaparser.GeneratedJavaParser.SimpleName(GeneratedJavaParser.java:2127)
  com.github.javaparser.GeneratedJavaParser.MethodDeclaration(GeneratedJavaParser.java:1224)
  com.github.javaparser.GeneratedJavaParser.MethodDeclarationParseStart(GeneratedJavaParser.java:6020)
  com.github.javaparser.JavaParser.parse(JavaParser.java:123)
  com.github.javaparser.JavaParser.parseMethodDeclaration(JavaParser.java:540)
  com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
  Method.main(Method.java:110)

        at com.github.javaparser.StaticJavaParser.handleResult(StaticJavaParser.java:260)
        at com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
        at Method.main(Method.java:110)