Python in Emacs: Jump to the definition of a global constant

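After creating a TAGS file for my project (find . -name "*.py" | xargs etags), I can use M-. to jump to the definition of a function. That's great. But if I want the definition of a global constant, say x = 3, Emacs doesn't know where to find it.

Is there any way to tell Emacs where constants are defined, not just functions? I don't need this for anything defined inside a function (or a for loop, or the like), only for globals.

More details

An earlier version of this question said "top-level" instead of "global", but with @Thomas's help I realized that was imprecise. By a global definition I mean anything the module defines. So in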

import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)

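what the module defines is f and y, even though those definitions are indented to the right. x is a local variable, and the definition of z is deleted before the end of the module.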
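One way to check this claim is to execute the sample and list the names left in its namespace. The sketch below assumes a stand-in for the module m (a stub object with a foo attribute, which is not part of the question) and a made-up helper name, module_level_names. Note that the imported name m itself also ends up in the namespace.

```python
import sys
import types

SAMPLE = '''
import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)
'''

def module_level_names(source, foo):
    # Hypothetical stub standing in for the real module `m`.
    stub = types.ModuleType("m")
    stub.foo = foo
    saved = sys.modules.get("m")
    sys.modules["m"] = stub
    try:
        namespace = {}
        exec(source, namespace)
    finally:
        # Restore whatever was registered under "m" before.
        if saved is None:
            sys.modules.pop("m", None)
        else:
            sys.modules["m"] = saved
    # Drop __builtins__ and similar entries added by exec itself.
    return sorted(name for name in namespace if not name.startswith("__"))

print(module_level_names(SAMPLE, True))   # ['f', 'm', 'y']
print(module_level_names(SAMPLE, False))  # ['f', 'm', 'y']
```

Neither x nor z survives, whichever branch is taken.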

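I believe a sufficient rule for capturing all global assignments is to simply ignore those inside def expressions (note that the def keyword itself may be indented at any level) and otherwise parse every symbol to the left of an = (note that there may be more than one, since Python supports tuple assignment).

Etags does not seem able to generate this kind of information for Python files, which you can easily verify by running it on a simple test file: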

x = 3

def fun():
    pass

Running etags test.py produces a TAGS file with the following contents:

/tmp/test.py,13
def fun(3,7

As you can see, x is entirely absent from this file, so Emacs has no chance of finding it.

etags' man page tells us that there is an option --globals:

   --globals
          Create tag entries for global variables in  Perl  and  Makefile.
          This is the default in C and derived languages.

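However, this appears to be one of those sad cases where documentation and implementation are out of sync, because this option does not seem to exist. (etags -h doesn't list it either, only --no-globals, presumably because --globals is the default, as the man page says.)

Even if --globals is the default, though, the documentation snippet says that it applies only to Perl, Makefiles, C and derived languages. We can check whether that is the case by creating another simple test file, this time in C: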

int x = 3;

void fun() {
}

Indeed, running etags test.c produces the following TAGS file:

/tmp/test.c,26
int x 1,0
void fun(3,12

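You can see that x is correctly recognized for C. So etags simply doesn't seem to support global variables for Python.

However, because Python relies on whitespace, identifying global variable definitions in a source file isn't hard: you can basically grep for all lines that do not start with whitespace but contain an = sign (with exceptions, of course).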
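To make the heuristic concrete, here is a minimal Python sketch of the same idea. The regular expression and the function name unindented_assignments are my own illustration, not what the script below uses, and it only handles a single name on the left-hand side (no tuple assignment).

```python
import re

# An identifier at column 0, optional whitespace, then a single '='
# (the lookahead rejects '==' comparisons).
ASSIGNMENT = re.compile(r'^([A-Za-z_]\w*)\s*=(?!=)')

def unindented_assignments(source):
    """Return (line number, name) for each unindented 'name = ...' line."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if match:
            found.append((lineno, match.group(1)))
    return found

sample = "x = 3\n\ndef fun(a=1):\n    y = 2\n    pass\n"
print(unindented_assignments(sample))  # [(1, 'x')]
```

The indented y = 2 and the default argument a=1 are both skipped, which is exactly the behaviour the whitespace rule is meant to give.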

So I wrote the following script to do just that. You can use it as a drop-in replacement for etags, since it calls etags internally:

#!/bin/bash

# make sure that some input files are provided, or else there's
# nothing to parse
if [ $# -eq 0 ]; then
    # the following message is just a copy of etags' error message
    echo "$(basename "$0"): no input files specified."
    echo "  Try '$(basename "$0") --help' for a complete list of options."
    exit 1
fi

# extract all non-flag parameters as the actual filenames to consider
TAGS2="TAGS2"
argflags=($(etags -h | grep '^-' | sed 's/,.*$//' | grep ' ' | awk '{print $1}'))
files=()
skip=0 
for arg in "${@}"; do
    # the variable 'skip' signals arguments that should not be
    # considered as filenames, even though they don't start with a
    # hyphen
    if [ ${skip} -eq 0 ]; then
        # arguments that start with a hyphen are considered flags and
        # thus not added to the 'files' array
        if [ "${arg:0:1}" = '-' ]; then
            if [ "${arg:0:9}" = "--output=" ]; then
                TAGS2="${arg:9}2"
            else
                # however, since some flags take a parameter, we also
                # check whether we should skip the next command line
                # argument: the arguments for which this is the case are
                # contained in 'argflags'
                for argflag in ${argflags[@]}; do
                    if [ "${argflag}" = "${arg}" ]; then
                        # we need to skip the next 'arg', but in case the
                        # current flag is '-o' we should still look at the
                        # next 'arg' so as to update the path to the
                        # output file of our own parsing below
                        if [ "${arg}" = "-o" ]; then
                            # the next 'arg' will be etags' output file
                            skip=2                  
                        else
                            skip=1
                        fi
                        break
                    fi
                done
            fi
        else
            files+=("${arg}")
        fi
    else
        # the current 'arg' is not an input file, but it may be the
        # path to the etags output file
        if [ "${skip}" = 2 ]; then
            TAGS2="${arg}2"
        fi
        skip=0
    fi
done

# create a separate TAGS file specifically for global variables
for file in "${files[@]}"; do
    # find all lines that are not indented, are not comments or
    # decorators, and contain a '=' character, then turn them into
    # TAGS format, except that the filename is prepended
    grep -P -Hbn '^[^[# \t].*=' "${file}" | sed -E 's/([0-9]+):([0-9]+):([^= \t]+)\s*=.*$/\3\x7f\1,\2/'
done |\

# count the bytes of each entry - this is needed for the TAGS
# specification
while read line; do
    echo "$(echo $line | sed 's/^.*://' | wc -c):$line"
done |\

# turn the information above into the correct TAGS file format
awk -F: '
    BEGIN { filename=""; numlines=0 }
    { 
        if (filename != $2) {
            if (numlines > 0) {
                print "\x0c\n" filename "," bytes+1

                for (i in lines) {
                    print lines[i]
                    delete lines[i]
                }
            }

            filename=$2
            numlines=0
            bytes=0
        }

        lines[numlines++] = $3;
        bytes += $1;
    }
    END {
        if (numlines > 0) {
            print "\x0c\n" filename "," bytes+1

            for (i in lines)
                print lines[i]
        }
    }' > "${TAGS2}"

# now run the actual etags, instructing it to include the global
# variables information
if ! etags -i "${TAGS2}" "${@}"; then
    # if etags failed to create the TAGS file, also delete the TAGS2
    # file
    /bin/rm -f "${TAGS2}"
fi

Store this script somewhere on your $PATH under a convenient name (I suggest something like etags+), then invoke it like this:

find . -name "*.py" | xargs etags+

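Besides creating a TAGS file, the script also creates a TAGS2 file for all global variable definitions, and adds a line to the original TAGS file that references the latter.

From Emacs's point of view, there is no difference in usage.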
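For reference, the layout a tags file section has to follow can be sketched in a few lines of Python. This is only a reading of the format as it appears in the examples above: a form feed, a "filename,size" line where size is the byte count of the entries that follow, and then one entry per line consisting of the source text, a DEL character (0x7f, invisible in the listings above) and "line,byte offset". The function name tags_section is made up for illustration.

```python
def tags_section(filename, entries):
    """Build one TAGS section from (text, lineno, byte_offset) tuples."""
    body = "".join(
        f"{text}\x7f{lineno},{offset}\n" for text, lineno, offset in entries
    )
    # The header carries the size of the entries in bytes, not characters.
    size = len(body.encode("utf-8"))
    return f"\x0c\n{filename},{size}\n{body}"

# Same header size (13) as the section etags wrote for test.py above.
print(repr(tags_section("/tmp/test.py", [("def fun(", 3, 7)])))

# An entry of the kind needed for the global x.
print(repr(tags_section("/tmp/test.py", [("x", 1, 0)])))
```

The same arithmetic gives 26 for the two entries of the C example, which matches the header etags produced there.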

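The other answer only considers unindented lines to contain global variable declarations. While that effectively excludes the bodies of function and class definitions, it misses global variables defined inside if statements. Such definitions are not uncommon, e.g. for constants that differ depending on the OS in use.

As argued in the comments under the question, any static analysis is bound to be imperfect, because Python's dynamic nature makes it impossible to decide with full accuracy which variables are defined globally unless the program is actually executed.

So the following is also only an approximation. It does, however, take into account global variable definitions inside if statements, as described above. Since this is best done by actually analyzing the source file's parse tree, a bash script is no longer the right choice. Conveniently, though, Python itself gives easy access to the parse tree through the ast package, which is used here.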

from argparse import ArgumentParser, SUPPRESS
import ast
from collections import Counter
from re import match as re_startswith
import os
import subprocess
import sys

# extract variable information from assign statements
def process_assign(target, results):
    if isinstance(target, ast.Name):
        results.append((target.lineno, target.col_offset, target.id))
    elif isinstance(target, ast.Tuple):
        for child in ast.iter_child_nodes(target):
            process_assign(child, results)

# extract variable information from delete statements
def process_delete(target, results):
    if isinstance(target, ast.Name):
        results[:] = filter(lambda t: t[2] != target.id, results)
    elif isinstance(target, ast.Tuple):
        for child in ast.iter_child_nodes(target):
            process_delete(child, results)

# recursively walk the parse tree of the source file
def process_node(node, results):
    if isinstance(node, ast.Assign):
        for target in node.targets:
            process_assign(target, results)
    elif isinstance(node, ast.Delete):
        for target in node.targets:
            process_delete(target, results)
    elif type(node) not in [ast.FunctionDef, ast.ClassDef]:
        for child in ast.iter_child_nodes(node):
            process_node(child, results)

def get_arg_parser():
    # create the parser to configure
    parser = ArgumentParser(usage=SUPPRESS, add_help=False)

    # run etags to find out about the supported command line parameters
    dashlines = list(filter(lambda line: re_startswith(r'\s*-', line),
                            subprocess.check_output(['etags', '-h'],
                                                    encoding='utf-8').split('\n')))

    # ignore lines that start with a dash but don't have the right
    # indentation
    most_common_indent = max([(v,k) for k,v in
                              Counter([line.index('-') for line in dashlines]).items()])[1]
    arglines = filter(lambda line: line.index('-') == most_common_indent, dashlines)

    for argline in arglines:
        # the various 'argline' entries contain the command line
        # arguments for etags, sometimes more than one separated by
        # commas.
        for arg in argline.split(','):
            if 'or' in arg:
                arg = arg[:arg.index('or')]
            if ' ' in arg or '=' in arg:
                arg = arg[:min(arg.index(' ') if ' ' in arg else len(arg),
                               arg.index('=') if '=' in arg else len(arg))]
                action='store'
            else:
                action='store_true'
            arg = arg.strip()
            if arg and not (arg == '-h' or arg == '--help'):
                parser.add_argument(arg, action=action)

    # we know we need files to run on
    parser.add_argument('files', nargs='*', metavar='file')

    # the parser is configured now to accept all of etags' arguments
    return parser


if __name__ == '__main__':
    # construct a parser for the command line arguments, unless
    # -h/-help/--help is given in which case we just print the help
    # screen
    etags_args = sys.argv[1:]
    if '-h' in etags_args or '-help' in etags_args or '--help' in etags_args:
        unknown_args = True
    else:
        argparser = get_arg_parser()
        known_ns, unknown_args = argparser.parse_known_args()

    # if something's wrong with the command line arguments, print
    # etags' help screen and exit
    if unknown_args:
        subprocess.run(['etags', '-h'], encoding='utf-8')
        sys.exit(1)

    # we base the output filename on the TAGS file name.  Other than
    # that, we only care about the actual filenames to parse, and all
    # other command line arguments are simply passed to etags later on
    output = getattr(known_ns, 'o', None)
    tags_file = 'TAGS2' if output is None else output + '2'
    filenames = known_ns.files

    if filenames:
        # TAGS file sections, one per source file
        sections = []

        # process all files to populate the 'sections' list
        for filename in filenames:
            # read source file
            offsets, lines = [0], []
            with open(filename, 'r') as f:
                for line in f.readlines():
                    offsets.append(offsets[-1] + len(bytes(line, 'utf-8')))
                    lines.append(line)

            offsets = offsets[:-1]

            # parse source file
            source = ''.join(lines)
            root_node = ast.parse(source, filename)

            # extract global variable definitions
            vardefs = []
            process_node(root_node, vardefs)

            # create TAGS file section
            sections.append("")
            for lineno, column, varname in vardefs:
                line = lines[lineno-1]
                offset = offsets[lineno-1]
                end = line.index('=') if '=' in line else -1
                sections[-1] += f"{line[:end]}\x7f{varname}\x01{lineno},{offset + column - 1}\n"

        # write TAGS file
        with open(tags_file, 'w') as f:
            for filename, section in zip(filenames, sections):
                if section:
                    f.write("\x0c\n")
                    f.write(filename)
                    f.write(",")
                    f.write(str(len(bytes(section, 'utf-8'))))
                    f.write("\n")
                    f.write(section)
                    f.write("\n")

        # make sure etags includes the newly created file
        etags_args += ['-i', tags_file]

    # now run the actual etags to take care of all other definitions
    try:
        cp = subprocess.run(['etags'] + etags_args, encoding='utf-8')
        status = cp.returncode
    except Exception:
        status = 1

    # if etags did not finish successfully, remove the tags_file
    if status != 0:
        try:
            os.remove(tags_file)
        except FileNotFoundError:
            # nothing to be removed
            pass

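Like the other answer, this script is intended as a drop-in replacement for the standard etags, since it calls the latter internally. It therefore also accepts all of etags' command line arguments (but currently does not honor -a).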
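To see what the tree walk finds, here is a condensed, self-contained restatement of the same idea applied to the sample from the question. It is a sketch rather than the script itself: the helper names are made up, and it returns bare names instead of line and column information.

```python
import ast

SAMPLE = '''
import m

if m.foo:
  def f():
    x = 3
    return x
  y, z = 1, 2
else:
  def f():
    x = 4
    return x
  y, z = 2, 3
del(z)
'''

def names_in(target):
    # Yield plain names from a target, descending into tuples and lists.
    if isinstance(target, ast.Name):
        yield target.id
    elif isinstance(target, (ast.Tuple, ast.List)):
        for element in target.elts:
            yield from names_in(element)

def global_names(source):
    names = []

    def visit(node):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                for name in names_in(target):
                    if name not in names:
                        names.append(name)
        elif isinstance(node, ast.Delete):
            for target in node.targets:
                for name in names_in(target):
                    if name in names:
                        names.remove(name)
        elif not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                   ast.ClassDef)):
            # Descend into if/else, loops, etc., but never into defs.
            for child in ast.iter_child_nodes(node):
                visit(child)

    visit(ast.parse(source))
    return names

print(global_names(SAMPLE))  # ['y']
```

Only y is reported: x is skipped because it lives inside a def, z is removed again by the del, and f is left to etags, which already handles function definitions.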

I recommend defining an alias in your shell's initialization file, e.g. by adding the following line to ~/.bashrc:

alias etags+="python3 -u /path/to/script.py"

where /path/to/script.py is the path to the file in which you saved the code above. With such an alias in place, you can simply call

etags+ /path/to/file

and so on.