Extracting all expansions from Python's glob

Question

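Python's glob module lets you list files by specifying wildcards.

Very handy.

But how can I get/reconstruct the values that the wildcards matched?

For example, say I have these 8 files: fa1 fa2 fa3 fb1 fb3 fc1 fc2 fc3 (note: fb2 is missing).

I can do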

import glob
glob.glob('f[ab][12]') # ['fa2', 'fb1', 'fa1']

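In this case I have 2 wildcards: [ab] and [12]. They match the values a, b and 1, 2 respectively, but only 3 combinations of those values come back, because one file, fb2 (a valid combination of the wildcards), does not exist.

Question: how can I get the list of values that each wildcard actually matched? More precisely: how can I get a list of tuples of (string) values, one tuple per file that actually exists?

In my example, I would like to get the list of tuples: [('a', '2'), ('b', '1'), ('a', '1')].

Notes:

I don't want the full names, only the values matched by the wildcards (in my example, the prefix 'f' is not part of a wildcard, so I don't want it in the list of tuples);
This must work with all supported wildcards, including * and ?.

The only solution I can think of is to use regular expressions, but that basically means reimplementing the whole glob mechanism just to extract the intermediate data.

Edit

Since I got a close vote for being "too broad" (???), I'll rephrase the question: is it possible to get that result using the glob/fnmatch modules, rather than using regular expressions directly?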

Answer 1

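There is no way to get that information through those modules. glob calls fnmatch to do the pattern matching, and fnmatch uses regular expressions to do it. See the Python source for glob and fnmatch.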
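
To see this for yourself, fnmatch.translate returns the regular expression that fnmatch builds from a shell pattern. The following is a minimal Python 3 sketch; the names list is a stand-in for a directory listing (an assumption for illustration), and the exact regex text varies between Python versions, so it is printed rather than quoted here.

```python
import fnmatch
import re

pattern = 'f[ab][12]'
# fnmatch.translate returns the regex source that fnmatch matches against.
regex = fnmatch.translate(pattern)
print(regex)

# Stand-in for the directory contents from the question (fb2 is missing).
names = ['fa1', 'fa2', 'fa3', 'fb1', 'fb3', 'fc1', 'fc2', 'fc3']
matched = [name for name in names if re.match(regex, name)]
print(matched)  # ['fa1', 'fa2', 'fb1']
```

The translated regex tells you whether a name matches, but it is not built to hand back one captured value per wildcard, which is why the answer falls back on a modified translation.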

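Here's some Python 2 demo code that uses a modified version of the translate function from fnmatch. It seems to work from my brief tests, but no guarantees. :) Note that this ignores other things fnmatch does, such as case-insensitive matching.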

#!/usr/bin/env python

import re, fnmatch, glob

def pat_translate(pat):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    Hacked to add capture groups
    """
    i, n = 0, len(pat)
    res = ''
    while i < n:
        c = pat[i]
        i = i+1
        if c == '*':
            res = res + '(.*)'
        elif c == '?':
            res = res + '(.)'
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j = j+1
            if j < n and pat[j] == ']':
                j = j+1
            while j < n and pat[j] != ']':
                j = j+1
            if j >= n:
                res = res + '\['
            else:
                stuff = pat[i:j].replace('\\','\\\\')
                i = j+1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = '%s([%s])' % (res, stuff)
        else:
            res = res + re.escape(c)
    return res + '\Z(?ms)'


def test(shell_pat):
    print 'Shell pattern %r' % shell_pat
    names = glob.glob(shell_pat)
    print 'Found', names
    regex = pat_translate(shell_pat)
    print 'Regex %r' % regex
    pat = re.compile(regex)
    groups = [pat.match(name).groups() for name in names]
    print 'name, groups'
    for name, row in zip(names, groups):
        print name, row
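
For readers on Python 3, the same idea might be sketched roughly as follows. This is an adaptation, not the answer's original code: the names pat_to_capturing_regex and wildcard_values are made up for illustration, a plain list of names stands in for glob.glob, and the (?s) flag is moved to the front of the regex because Python 3.11 and later reject global inline flags placed anywhere else. Like the code above, it ignores the other things glob and fnmatch do (case normalization, hidden files, path separators).

```python
import re


def pat_to_capturing_regex(pat):
    """Translate a shell pattern to a regex with one group per wildcard."""
    i, n = 0, len(pat)
    parts = []
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            parts.append('(.*)')
        elif c == '?':
            parts.append('(.)')
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j += 1
            if j < n and pat[j] == ']':
                j += 1
            while j < n and pat[j] != ']':
                j += 1
            if j >= n:
                # No closing bracket: treat '[' as a literal character.
                parts.append(r'\[')
            else:
                stuff = pat[i:j].replace('\\', '\\\\')
                i = j + 1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                parts.append('([%s])' % stuff)
        else:
            parts.append(re.escape(c))
    return '(?s)' + ''.join(parts) + r'\Z'


def wildcard_values(pat, names):
    """Return the captured wildcard values for every name that matches."""
    regex = re.compile(pat_to_capturing_regex(pat))
    return [m.groups() for m in map(regex.match, names) if m]


names = ['fa1', 'fa2', 'fa3', 'fb1', 'fb3', 'fc1', 'fc2', 'fc3']
print(wildcard_values('f[ab][12]', names))
# [('a', '1'), ('a', '2'), ('b', '1')]
```

In real use you would pass glob.glob(pat) as names, which keeps glob in charge of deciding which files match and leaves the regex only the job of splitting each name into its wildcard parts.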

Answer 2

In your specific case, you may want to use itertools.product:

import itertools
import os


def get_wildcards(*specs):
    for wildcard in itertools.product(*specs):
        if os.path.exists('f{}{}'.format(*wildcard)):
            yield wildcard


for wildcard in get_wildcards('ab', '12'):
    print wildcard

Output:

('a', '1')
('a', '2')
('b', '1')

Here you take the "product" of "ab" and "12", which yields at most 4 tuples, and the os.path.exists test eliminates the tuples that do not name an existing file.
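
If you want to try this without touching your working directory, a Python 3 variant might look like the sketch below. The helper name existing_combinations and its directory and template parameters are assumptions added for illustration; the files are created in a temporary directory that mirrors the question's example. Note that this approach only covers bracket sets whose members you can enumerate, so it does not extend to * or ?.

```python
import itertools
import os
import tempfile


def existing_combinations(directory, template, *specs):
    """Yield the tuples from product(*specs) that name an existing file."""
    for combo in itertools.product(*specs):
        path = os.path.join(directory, template.format(*combo))
        if os.path.exists(path):
            yield combo


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the 8 files from the question (fb2 is deliberately absent).
    for name in ['fa1', 'fa2', 'fa3', 'fb1', 'fb3', 'fc1', 'fc2', 'fc3']:
        open(os.path.join(tmp, name), 'w').close()
    found = list(existing_combinations(tmp, 'f{}{}', 'ab', '12'))

print(found)  # [('a', '1'), ('a', '2'), ('b', '1')]
```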

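Update

The plan is to convert the file-system wildcard into a regular expression (you could avoid regular expressions, but it would be painful). Next, we list all the files in the current directory and match each one against the regular expression. If a match is found, we build a tuple from the matched groups and yield it.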

import re
import os


def regex_from_wildcard(wildcard):
    wildcard = wildcard.replace('.', r'\.')
    wildcard = wildcard.replace('[', '([').replace(']', '])')
    wildcard = wildcard.replace('?', r'(.)').replace('*', r'(.*)')
    wildcard = r'^{}$'.format(wildcard)
    wildcard = re.compile(wildcard)
    return wildcard


def generate_from_wildcards(wildcard):
    pattern = regex_from_wildcard(wildcard)
    for filename in os.listdir('.'):
        match_object = re.match(pattern, filename)
        if match_object:
            yield match_object.groups()


# Test
for tup in generate_from_wildcards('f[bc]?'):
    print tup

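Some notes:

Since it is still not clear to me exactly what you want, the solution may be off in a few places.
If the wildcard contains non-wildcard characters, such as f or a dot, those characters are not included in the tuple.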
