What is the regex that will return single-word names from lines delimited by a special character, | (pipe)?
I have lines like these:
John | Gilbert | alan
Stephen | king | harris
| | Steve
Barack | | Obama
Tom | George | Stevenson
Donald | |
| Alan |
Sir | Alex |
Stewart | |
John | new | man
I want to return only the single-word names, like this:
Steve
Alan
Stewart
I tried:
Name = re.search(r'\| (.*)', name)
but that returns everything.
You can try re.findall with the pattern (?:(?<=\n)|(?<=^))\|\s*\|\s*(\S+)(?:\n|$), which only finds the single-word names:
import re

inp = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump
| | Alan
| | Stewart"""
single_names = re.findall(r'(?:(?<=\n)|(?<=^))\|\s*\|\s*(\S+)(?:\n|$)', inp)
print(single_names)
This prints:
['Steve', 'Alan', 'Stewart']
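The two lookbehinds in that pattern only assert that the match begins at the start of a line. As a rough sketch of the same idea, assuming the input has the layout shown above, ^ and $ with re.MULTILINE can take their place. The names text and LINE_PATTERN are illustrative and do not come from the answer.

```python
import re

text = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump
| | Alan
| | Stewart"""

# [^\S\n] means "whitespace other than a newline", so a match
# cannot spill over onto the following line.
LINE_PATTERN = re.compile(r'^\|[^\S\n]*\|[^\S\n]*(\S+)$', re.MULTILINE)

single_names = LINE_PATTERN.findall(text)
print(single_names)
```

On this sample it should pick out the same three names as the lookbehind version.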
A simple modification of your existing regex pattern is enough:
>>> import re
>>> name = """
|| John Deere
|| Stephen king
|| Steve
|| Barack Hussein Obama
|| Donald Trump
|| Alan
|| Stewart"""
>>> re.findall(r'\| ([^\s]*)(?:\n|$)', name)
['Steve', 'Alan', 'Stewart']
re.findall returns every match in the input string.
Edit: for your edited input, which contains | between the names, this works:
>>> name = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump
| | Alan
| | Stewart"""
>>> re.findall(r'^[|\W]*([^\s]+)(?:\n|$)', name, re.MULTILINE)
['Steve', 'Alan', 'Stewart']
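In that character class the | is already covered by \W, which matches any non-word character, and (?:\n|$) can be reduced to $ under re.MULTILINE. A slightly shorter variant, sketched here on the assumption that every name starts with a word character and contains no whitespace, might look like this (SHORT_PATTERN is an illustrative name):

```python
import re

name = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump
| | Alan
| | Stewart"""

# \W* skips the leading pipes and spaces; (\S+) must then reach the
# end of the line, which rules out lines with more than one name.
SHORT_PATTERN = r'^\W*(\S+)$'

result = re.findall(SHORT_PATTERN, name, re.MULTILINE)
print(result)
```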
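You could use
^[|\s]*\|\s*([^|\s]+)$
^ asserts the start of a line (with re.MULTILINE)
[|\s]* matches 0+ occurrences of | or a whitespace character
\| matches a literal |
\s* matches 0+ whitespace characters
([^|\s]+) captures group 1: 1+ characters other than | or whitespace
$ asserts the end of a line (with re.MULTILINE)
For example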
import re
regex = r"^[|\s]*\|\s*([^|\s]+)$"
names = ("| John | Gilbert | alan\n"
"| Stephen | king | harris\n"
"| | Steve\n"
"| Barack | | Obama\n"
"|| Donald | | Trump \n"
"| | Alan\n"
"| | Stewart")
print(re.findall(regex, names, re.MULTILINE))
Output
['Steve', 'Alan', 'Stewart']
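The token-by-token breakdown can also be kept next to the pattern itself. The following is a minimal sketch, assuming the same sample input, that writes the pattern with re.VERBOSE so each piece carries its own comment. ANNOTATED is an illustrative name, not part of the answer.

```python
import re

ANNOTATED = re.compile(r"""
    ^            # start of a line (because of re.MULTILINE)
    [|\s]*       # zero or more pipes or whitespace characters
    \|           # a literal pipe
    \s*          # zero or more whitespace characters
    ([^|\s]+)    # group 1: one or more chars that are neither pipe nor whitespace
    $            # end of a line (because of re.MULTILINE)
""", re.VERBOSE | re.MULTILINE)

names = ("| John | Gilbert | alan\n"
         "| Stephen | king | harris\n"
         "| | Steve\n"
         "| Barack | | Obama\n"
         "|| Donald | | Trump \n"
         "| | Alan\n"
         "| | Stewart")

found = ANNOTATED.findall(names)
print(found)
```

In verbose mode, whitespace outside character classes is ignored, so the layout does not change what the pattern matches.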
Use
(?m)^(?:\|[^\S\n]*)*(\S+)[^\S\n]*$
Explanation
--------------------------------------------------------------------------------
  (?m)                     multiline mode (= re.M / re.MULTILINE)
--------------------------------------------------------------------------------
  ^                        the beginning of a line
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \|                       '|'
--------------------------------------------------------------------------------
    [^\S\n]*                 any whitespace character except '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  [^\S\n]*                 any whitespace character except '\n'
                           (newline) (0 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  $                        the end of a line (before a '\n' or at the
                           end of the string)
import re
string = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump
| | Alan
| | Stewart"""
pattern = r"^(?:\|[^\S\n]*)*(\S+)[^\S\n]*$"
print(re.findall(pattern, string, re.M))
Result: ['Steve', 'Alan', 'Stewart']
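One practical difference is that the trailing [^\S\n]* lets this pattern accept spaces or tabs after the name, whereas a pattern that requires the name to sit directly before $ does not. The sketch below illustrates this on a made-up sample; the variable names and the input string are assumptions for demonstration only.

```python
import re

# Pattern from this answer: allows trailing spaces/tabs before the line end.
tolerant = r"^(?:\|[^\S\n]*)*(\S+)[^\S\n]*$"
# A stricter pattern: the name must be immediately followed by the line end.
strict = r"^[|\s]*\|\s*([^|\s]+)$"

# Hypothetical input: two single-name lines with trailing whitespace,
# plus one two-name line that neither pattern should match.
sample = "| | Steve  \n| | Alan\t\n| Barack | | Obama"

tolerant_hits = re.findall(tolerant, sample, re.M)
strict_hits = re.findall(strict, sample, re.M)

print(tolerant_hits)
print(strict_hits)
```

If the input may carry stray whitespace at line ends, the tolerant form (or stripping each line first) is probably the safer choice.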