Regex for finding the gene product in the text

What regular expression should I use to match text like this:

/product="hypothetical protein"

So far I have tried this pattern:

x = re.match(r"^s*\=product(.*)", line)
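To see why that attempt finds nothing, the sketch below runs it against the sample line. It assumes the input is the sample shown above with a leading space, as in the answer's test string. It puts the attempt next to one possible start-anchored fix. That fix is only an illustration, not the asker's code.

```python
import re

# Assumed input: the sample line from the question, with the leading space
# that appears in the answer's test string.
line = ' /product="hypothetical protein"'

# The attempted pattern. "s*" (missing backslash) means zero or more literal
# "s" characters, not whitespace, and "\=product" requires "=" directly before
# "product", but the text has "/product=". re.match anchors at position 0,
# where it finds a space instead of "=", so the match fails.
attempt = re.match(r"^s*\=product(.*)", line)
print(attempt)  # None

# A start-anchored variant: \s* skips leading whitespace, the literal
# "/product=" prefix follows, and the quoted value is captured.
fixed = re.match(r'^\s*/product="([^"]*)"', line)
print(fixed.group(1))  # hypothetical protein
```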

Use

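import re

# Sample line from the question (note the leading space)
test_str = ' /product="hypothetical protein"'
# Find product=" and capture everything up to the next double quote
match = re.search(r'product="([^"]+)"', test_str)
if match:
    print(match.group(1))  # hypothetical protein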
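The same pattern also works on a whole block of text. Assume the input is something like a GenBank feature table (the excerpt below is made up for illustration). In that case `re.findall` returns the captured value of every `/product=` qualifier:

```python
import re

# Made-up excerpt in the style of a GenBank feature table; real files
# will have many more qualifiers per feature.
record = """
     CDS             1..300
                     /product="hypothetical protein"
     CDS             400..900
                     /product="DNA polymerase III subunit beta"
"""

# With one capture group, findall returns a list of that group's text
# for every non-overlapping match.
products = re.findall(r'product="([^"]+)"', record)
print(products)  # ['hypothetical protein', 'DNA polymerase III subunit beta']
```

In GenBank files a long value can wrap onto the next line. `[^"]` also matches newline characters, so a wrapped value is still captured. However, the line break and indentation end up inside the result and may need collapsing, for example with `" ".join(value.split())`.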

See regex proof.

Explanation

--------------------------------------------------------------------------------
  product="                'product="'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  "                        '"'
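The practical effect of `[^"]+` can be checked directly. The snippet below uses an illustrative input line. It compares `[^"]+` with the `(.*)` group from the attempted pattern and shows what the "1 or more times" requirement does to an empty value:

```python
import re

line = ' /product="hypothetical protein"'

# [^"]+ cannot consume a quote, so the greedy run stops just before
# the closing quote and the group holds only the value.
value = re.search(r'product="([^"]+)"', line).group(1)
print(value)  # hypothetical protein

# (.*) is greedy too, but "." also matches quotes, so the group runs to
# the end of the line and keeps both quote characters.
loose = re.search(r'product=(.*)', line).group(1)
print(loose)  # "hypothetical protein"

# "+" needs at least one character, so an empty value does not match;
# [^"]* would be needed if product="" should yield an empty string.
empty = re.search(r'product="([^"]+)"', '/product=""')
print(empty)  # None
```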