What regex returns single-word names when the fields are separated by a special character, | (pipe)?

I have lines like these:

John | Gilbert | alan
Stephen | king | harris
| | Steve
Barack | | Obama
Tom | George | Stevenson 
Donald | | 
 | Alan | 
Sir | Alex | 
Stewart | | 
John | new | man

I want to return only the single-word names, like this:

Steve
Alan
Stewart

I tried:

Name = re.search('\| (.*)',name)

But this returns everything.

You can try re.findall with the pattern (?:(?<=\n)|(?<=^))\|\s*\|\s*(\S+)(?:\n|$), which matches only the single-word names:

import re

inp = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump 
| | Alan
| | Stewart"""

single_names = re.findall(r'(?:(?<=\n)|(?<=^))\|\s*\|\s*(\S+)(?:\n|$)', inp)
print(single_names)

This prints:

['Steve', 'Alan', 'Stewart']

A simple modification of your existing regex pattern is enough:

>>> name = """
|| John Deere
|| Stephen king
|| Steve
|| Barack Hussein Obama
|| Donald Trump 
|| Alan
|| Stewart"""
>>> re.findall(r'\| ([^\s]*)(?:\n|$)', name)
['Steve', 'Alan', 'Stewart']

re.findall returns every match in the input string, not just the first one.
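As a minimal sketch of why re.findall is the better fit here than re.search: re.search stops at the first match and returns a match object, while re.findall returns the captured group of every match. The shortened input below is an assumption made for illustration, not the asker's full data.

```python
import re

# Shortened, hypothetical sample in the "|| name" layout used above.
name = """|| John Deere
|| Stephen king
|| Steve
|| Alan"""

pattern = r'\| ([^\s]*)(?:\n|$)'

# re.search returns only the first match (or None if there is none).
first = re.search(pattern, name)
print(first.group(1))             # Steve

# re.findall returns the captured group from every match.
all_names = re.findall(pattern, name)
print(all_names)                  # ['Steve', 'Alan']
```

If you need positions or the full match rather than just the group, re.finditer yields match objects in the same order.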

Edit: for your edited input, which has | between the names, this works:

>>> name = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump 
| | Alan
| | Stewart"""
>>> re.findall(r'^[|\W]*([^\s]+)(?:\n|$)', name, re.MULTILINE)
['Steve', 'Alan', 'Stewart']

You can use

^[|\s]*\|\s*([^|\s]+)$
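  • ^ Start of the string (or of each line when re.MULTILINE is set)
  • [|\s]* Match 0+ occurrences of | or a whitespace character
  • \| Match |
  • \s* Match 0+ whitespace characters
  • ([^|\s]+) Capture group 1: match 1+ characters other than | or whitespace
  • $ End of the string (or of each line when re.MULTILINE is set)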


For example:

import re

regex = r"^[|\s]*\|\s*([^|\s]+)$"

names = ("| John | Gilbert | alan\n"
            "| Stephen | king | harris\n"
            "| | Steve\n"
            "| Barack | | Obama\n"
            "|| Donald | | Trump \n"
            "| | Alan\n"
            "| | Stewart")

print(re.findall(regex, names, re.MULTILINE))

Output:

['Steve', 'Alan', 'Stewart']
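To see which individual lines the pattern ^[|\s]*\|\s*([^|\s]+)$ accepts and which it rejects, here is a small sketch that tests one line at a time. The helper name single_name is hypothetical and is used only for illustration.

```python
import re

PATTERN = re.compile(r"^[|\s]*\|\s*([^|\s]+)$")

def single_name(line):
    """Return the lone name on a line, or None if the line has several names."""
    match = PATTERN.match(line)
    return match.group(1) if match else None

for line in ["| John | Gilbert | alan",
             "| | Steve",
             "| Barack | | Obama",
             "|| Donald | | Trump ",
             "| | Alan"]:
    print(repr(line), "->", single_name(line))
```

Lines with more than one name fail because, after the first name is captured, the next character is a space or a | rather than the end of the line.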

Use

(?m)^(?:\|[^\S\n]*)*(\S+)[^\S\n]*$


Explanation

--------------------------------------------------------------------------------
  (?m)                     multiline mode (= re.M / re.MULTILINE)
--------------------------------------------------------------------------------
  ^                        the beginning of a line
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \|                       '|'
--------------------------------------------------------------------------------
    [^\S\n]*                 any character except: non-whitespace
                             (all but \n, \r, \t, \f, and " "), '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  [^\S\n]*                 any character except: non-whitespace (all
                           but \n, \r, \t, \f, and " "), '\n'
                           (newline) (0 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           line

Python code:

import re
string = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump 
| | Alan
| | Stewart"""
pattern = r"^(?:\|[^\S\n]*)*(\S+)[^\S\n]*$"
print(re.findall(pattern, string, re.M))

Result: ['Steve', 'Alan', 'Stewart']
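As a sketch, the same pattern can also be written with re.VERBOSE so that each piece carries its own comment. The layout and comments below are an illustrative rewrite, not part of the original answer, and the behavior should be identical.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part is annotated.
pattern = re.compile(r"""
    ^                   # start of a line (re.MULTILINE)
    (?: \| [^\S\n]* )*  # zero or more pipes, each followed by optional spaces/tabs
    (\S+)               # group 1: one run of non-whitespace characters
    [^\S\n]*            # optional trailing spaces/tabs, but no newline
    $                   # end of a line
""", re.VERBOSE | re.MULTILINE)

string = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump 
| | Alan
| | Stewart"""

result = pattern.findall(string)
print(result)  # ['Steve', 'Alan', 'Stewart']
```

Note that whitespace outside a character class is ignored in verbose mode, which is why the spaces inside the non-capturing group do not change what is matched.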