Get a list of struct names that are outside the optional 'package' ... 'endpackage' block

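I am trying to get the names of the structs that are outside the optional package ... endpackage block. If the input contains no package ... endpackage block, the script should return all struct names.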

Here is my script:

import re

a = """
package new;

typedef struct packed
{
    logic a;
    logic b;
} abc_y;

typedef struct packed
{
    logic a;
    logic b;
} abc_t;

endpackage

typedef struct packed
{
    logic a;
    logic b;
} abc_x;

"""

print(re.findall(r'(?!package)*.*?typedef\s+struct\s+packed\s*{.*?}\s*(\w+);.*?(?!endpackage)*', a, re.MULTILINE|re.DOTALL))

This is the output:

['abc_y', 'abc_t', 'abc_x']

Expected output:

['abc_x']

I am missing something in the regex but cannot figure out what. Can someone help me fix this? Thanks in advance.

Use

\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);

regex proof

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  package                  'package'
--------------------------------------------------------------------------------
  .*?                      any character, including \n because
                           re.DOTALL is set (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  endpackage               'endpackage'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  |                        OR
--------------------------------------------------------------------------------
  typedef                  'typedef'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  struct                   'struct'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  packed                   'packed'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  {                        '{'
--------------------------------------------------------------------------------
  [^{}]*                   any character except: '{', '}' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  }                        '}'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  ;                        ';'
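
The breakdown above relies on one idea: the first alternative consumes a whole package ... endpackage block without capturing anything, so only typedefs outside such a block ever reach the capturing group. A minimal sketch that prints the raw re.findall result before filtering may make this easier to see. The names PATTERN and sample and the shortened input are illustrative assumptions, not part of the original script:

```python
import re

# Same pattern as above; only the second alternative has a capturing group.
PATTERN = r'\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);'

# Shortened, hypothetical input: one struct inside a package, one outside.
sample = """
package new;
typedef struct packed { logic a; } abc_y;
endpackage
typedef struct packed { logic a; } abc_x;
"""

# When the package branch matches, group 1 does not participate,
# so findall reports an empty string for that match.
raw = re.findall(PATTERN, sample, re.DOTALL)
print(raw)    # ['', 'abc_x']

# Dropping the empty strings leaves only the names found outside the package.
names = [name for name in raw if name]
print(names)  # ['abc_x']
```

The empty string in the raw result is the reason the final code wraps re.findall in filter(None, ...).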

Python code:

print(list(filter(None,re.findall(r'\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);', a, re.DOTALL))))

Result: ['abc_x']
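
If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name struct_names_outside_packages and the two test inputs are hypothetical, and the braces are escaped purely for readability (Python's re treats an unescaped { as a literal here as well).

```python
import re

# Branch 1 swallows a whole package block; branch 2 captures a struct name.
STRUCT_OUTSIDE_PACKAGE = re.compile(
    r'\bpackage.*?\bendpackage\b'
    r'|typedef\s+struct\s+packed\s*\{[^{}]*\}\s*(\w+);',
    re.DOTALL,
)

def struct_names_outside_packages(source):
    """Return names of packed structs not enclosed in package ... endpackage."""
    return [
        m.group(1)
        for m in STRUCT_OUTSIDE_PACKAGE.finditer(source)
        if m.group(1)  # None when the package branch matched
    ]

# Hypothetical inputs: one with a package block, one without.
with_pkg = (
    "package p;\n"
    "typedef struct packed { logic a; } in_t;\n"
    "endpackage\n"
    "typedef struct packed { logic b; } out_t;\n"
)
no_pkg = (
    "typedef struct packed { logic a; } one_t;\n"
    "typedef struct packed { logic b; } two_t;\n"
)

print(struct_names_outside_packages(with_pkg))  # ['out_t']
print(struct_names_outside_packages(no_pkg))    # ['one_t', 'two_t']
```

The second call covers the requirement from the question: with no package ... endpackage block present, every struct name is returned.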