How to remove AST nodes with pycparser?

Let's start by considering this snippet:

import sys

from pycparser import c_parser, c_ast, c_generator


text = r"""
void main() {
    foo(1,3);

    foo1(4);

    x = 1;

     foo2(4,

        10,


        3);
    foo3(
        "xxx"

    );
}
"""


class FuncCallVisitor(c_ast.NodeVisitor):

    def visit_FuncCall(self, node):
        print('%s called at %s' % (node.name.name, node.name.coord))

        if node.args:
            self.visit(node.args)


class RemoveFuncCalls(c_generator.CGenerator):

    def visit_FuncCall(self, n):
        # fref = self._parenthesize_unless_simple(n.name)
        # return fref + '(' + self.visit(n.args) + ')'
        return ""


if __name__ == '__main__':
    parser = c_parser.CParser()
    ast = parser.parse(text)
    v = FuncCallVisitor()
    v.visit(ast)
    print('-' * 80)

    ast.show(showcoord=True)
    generator = RemoveFuncCalls()

    print('-' * 80)
    print(generator.visit(ast))

The generated code printed at the end of the above will be:

void main()
{
  ;
  ;
  x = 1;
  ;
  ;
}

But I'd like it to become this instead:

void main()
{
  x = 1;
}

So my question is: what is the canonical/idiomatic way to delete nodes/subtrees from the AST with pycparser?

It looks like c_generator.CGenerator calls the _generate_stmt method for scope-like structures, and that method appends ';\n' (plus indentation) to the result of visit for a statement, even when that result is an empty string.

To remove function calls, we can override it like this:

class RemoveFuncCalls(c_generator.CGenerator):
    def _generate_stmt(self, n, add_indent=False):
        if isinstance(n, c_ast.FuncCall):
            return ''
        else:
            return super()._generate_stmt(n, add_indent)

With that, we get

void main()
{
  x = 1;
}

which looks like what you want.

Now consider a case like

if (bar(42, "something"))
    return;

If we need it to become

if ()
    return;

then we also need to add

    def visit_FuncCall(self, n):
        return ''

as in the OP, because the visit_If method that RemoveFuncCalls inherits does not call _generate_stmt when serializing the cond field.
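Putting the two overrides together, a minimal sketch of the whole generator might look like the following. The input string and the names `source` and `output` are made up for illustration, and the sketch assumes a pycparser version in which `CGenerator._generate_stmt` has the signature used above.

```python
from pycparser import c_ast, c_generator, c_parser

# Hypothetical input: one statement-level call and one call inside a condition.
source = r"""
void main() {
    foo(1, 3);
    if (bar(42, "something"))
        return;
}
"""


class RemoveFuncCalls(c_generator.CGenerator):
    def _generate_stmt(self, n, add_indent=False):
        # A call used as a whole statement: emit nothing, not even ';'.
        if isinstance(n, c_ast.FuncCall):
            return ''
        return super()._generate_stmt(n, add_indent)

    def visit_FuncCall(self, n):
        # A call nested inside another construct (e.g. an `if` condition).
        return ''


ast = c_parser.CParser().parse(source)
output = RemoveFuncCalls().visit(ast)
print(output)
```

Keep in mind that an empty condition such as `if ()` is not valid C, so this only makes sense when the generated text is not going to be compiled as is.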

Going further

I don't know what the "canonical/idiomatic way to delete nodes/subtrees from the AST with pycparser" is, but I do know one from the stdlib's ast module: the ast.NodeTransformer class (which is absent from pycparser for some reason).
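For readers who have not used it, here is a minimal sketch of the stdlib pattern, applied to Python source, not C. The `source` string and the `CallStmtRemover` name are made up for illustration. The key point is that returning `None` from a visitor method removes the node from its parent's statement list.

```python
import ast

# Hypothetical Python input mirroring the C example from the question.
source = """
def main():
    foo(1, 3)
    x = 1
    foo2(4, 10, 3)
    return x
"""


class CallStmtRemover(ast.NodeTransformer):
    def visit_Expr(self, node):
        # An expression statement that is just a call: drop it.
        if isinstance(node.value, ast.Call):
            return None
        return node


tree = CallStmtRemover().visit(ast.parse(source))

# Statement types left in the body of main()
remaining = [type(stmt).__name__ for stmt in tree.body[0].body]
print(remaining)
```

The sketch filters at the statement level (`visit_Expr`), not at `visit_Call`, because the stdlib transformer deletes a single-node field whose visitor returns `None`, which would leave an `Expr` node without its `value`.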

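Porting that class to pycparser lets us modify the AST itself instead of messing with how the AST is serialized to str by overriding private-ish methods: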

from pycparser import c_ast

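class NodeTransformer(c_ast.NodeVisitor):
    # Port of the stdlib ast.NodeTransformer: visit_* methods return a
    # replacement node, or None to remove the visited node.
    def generic_visit(self, node):
        for field, old_value in iter_fields(node):
            if isinstance(old_value, list):
                new_values = []
                for value in old_value:
                    if isinstance(value, c_ast.Node):
                        value = self.visit(value)
                        if value is None:
                            # the visitor asked for this list item to be dropped
                            continue
                        elif not isinstance(value, c_ast.Node):
                            # the visitor returned several nodes: splice them in
                            new_values.extend(value)
                            continue
                    new_values.append(value)
                # update the list in place so the parent keeps the same object
                old_value[:] = new_values
            elif isinstance(old_value, c_ast.Node):
                # single-node field: it becomes None if the visitor removed it
                new_node = self.visit(old_value)
                setattr(node, field, new_node)
        return node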


def iter_fields(node):
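    # this doesn't look pretty because pycparser nodes are structured
    # differently from stdlib ones: children() returns flat (name, child)
    # pairs with list items named like 'block_items[0]', so we regroup them
    # into (field name, node or list) pairs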
    index = 0
    children = node.children()
    while index < len(children):
        name, child = children[index]
        try:
            bracket_index = name.index('[')
        except ValueError:
            yield name, child
            index += 1
        else:
            name = name[:bracket_index]
            child = getattr(node, name)
            index += len(child)
            yield name, child

For our case it can simply be subclassed

class FuncCallsRemover(NodeTransformer):
    def visit_FuncCall(self, node):
        return None

and used like
...
ast = parser.parse(text)
v = FuncCallsRemover()
ast = v.visit(ast)  # note that `NodeTransformer` returns modified AST instead of `None`

After that we can use an unmodified c_generator.CGenerator instance and get the same result.
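If only statement-level calls need to go, a narrower sketch is to filter each Compound node's block_items in place with a plain c_ast.NodeVisitor. This is an alternative, not the transformer shown above: `StatementCallRemover` and the input string are made up for illustration, and calls nested inside expressions or conditions are left untouched.

```python
from pycparser import c_ast, c_generator, c_parser

# Hypothetical input, a shortened version of the snippet from the question.
source = r"""
void main() {
    foo(1, 3);
    x = 1;
    foo3("xxx");
}
"""


class StatementCallRemover(c_ast.NodeVisitor):
    def visit_Compound(self, node):
        # block_items is None for an empty `{}` block.
        if node.block_items:
            node.block_items = [
                stmt for stmt in node.block_items
                if not isinstance(stmt, c_ast.FuncCall)
            ]
        # Keep descending so that nested blocks are filtered too.
        self.generic_visit(node)


ast = c_parser.CParser().parse(source)
StatementCallRemover().visit(ast)
output = c_generator.CGenerator().visit(ast)
print(output)
```

Because the AST itself is modified, the stock generator never sees the removed statements and emits no stray `;` lines for them.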