PyParsing: shell-style space escaping using backslash
I need to parse text made up of space-separated key-value pairs of the form
<key>=<value> <key>=<value> ...
This is pretty straightforward with pyparsing, except that values can contain spaces, for example:
dog=blue cat="orange tangerine" mouse=a\ small\ grey\ mouse
What would the pyparsing grammar for the last pair look like? pyparsing is greedy about spaces.
Text that spans multiple lines makes it more complicated, for example:
dog=blue cat="orange tangerine" mouse=a\ small\ grey\ mouse \
lion=nonexistent
I have looked at several examples at http://pyparsing.wikispaces.com/share/view/7002417 and at "Python/Pyparsing - Multiline quotes", which helps with multi-line text but not with backslash-escaped spaces.
Assuming your input string is in a file named "input.txt", the following works for your examples:
import pyparsing
from pyparsing import ZeroOrMore, Group

OP_EQ = pyparsing.Literal('=').suppress()
DQUOTE = pyparsing.Literal('"').suppress()
# Escaped space: a backslash followed by a space
ESPACE = pyparsing.Literal('\\ ').suppress().leaveWhitespace()
# Lone backslash, as used for line continuation
BSLASH = pyparsing.Literal('\\')
S = pyparsing.Word(" \t\r\n").suppress().leaveWhitespace()
DELIM = ZeroOrMore(S ^ BSLASH).suppress()
KEY = pyparsing.Word(pyparsing.alphanums)("KEY")
VALTOK = pyparsing.Word(pyparsing.printables, excludeChars='="\\')
QVALUE = (DQUOTE +
          Group(VALTOK + ZeroOrMore(S + VALTOK)) +
          DQUOTE)
NQVALUE = Group(VALTOK + ZeroOrMore(ESPACE + VALTOK))
VALUE = (NQVALUE ^ QVALUE)("VALUE")
PAIR = Group(KEY + OP_EQ + VALUE)("PAIR")
PAIRS = (PAIR + ZeroOrMore(DELIM + PAIR))

with open('input.txt') as f:
    lines = f.read()

res = PAIRS.parseString(lines, parseAll=True)
for (k, v) in res:
    print('{} = "{}"'.format(k, ' '.join(v)))
Output:
dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
lion = "nonexistent"
And as XML, for reference:
<PAIRS>
<PAIR>
<KEY>dog</KEY>
<VALUE>
<ITEM>blue</ITEM>
</VALUE>
</PAIR>
<PAIR>
<KEY>cat</KEY>
<VALUE>
<ITEM>orange</ITEM>
<ITEM>tangerine</ITEM>
</VALUE>
</PAIR>
<PAIR>
<KEY>mouse</KEY>
<VALUE>
<ITEM>a</ITEM>
<ITEM>small</ITEM>
<ITEM>grey</ITEM>
<ITEM>mouse</ITEM>
</VALUE>
</PAIR>
<PAIR>
<KEY>dog</KEY>
<VALUE>
<ITEM>blue</ITEM>
</VALUE>
</PAIR>
<PAIR>
<KEY>cat</KEY>
<VALUE>
<ITEM>orange</ITEM>
<ITEM>tangerine</ITEM>
</VALUE>
</PAIR>
<PAIR>
<KEY>mouse</KEY>
<VALUE>
<ITEM>a</ITEM>
<ITEM>small</ITEM>
<ITEM>grey</ITEM>
<ITEM>mouse</ITEM>
</VALUE>
</PAIR>
<PAIR>
<KEY>lion</KEY>
<VALUE>
<ITEM>nonexistent</ITEM>
</VALUE>
</PAIR>
</PAIRS>
Edit: FWIW, you can also do this with regular expressions:
import re

with open('input.txt') as f:
    lines = f.read()

mat = re.sub(r'=([^"]\w*(?:(?:\\ )\w*)*)', r'="\1"', lines)  # Quote unquoted values
mat = mat.replace("\\ ", " ").replace("\\\n", "")  # Replace escaped spaces, join continued lines
mat = re.findall(r'(\w*)="(.*?)"', mat)  # Extract pairs
for (k, v) in mat:  # Print pairs
    print('{} = "{}"'.format(k, v))
Output:
dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
lion = "nonexistent"
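As an aside that is not part of the answer above: the standard library's shlex module already understands shell-style double quotes and backslash-escaped spaces, so a rough sketch without pyparsing or hand-written regexes could look like the following. The function name parse_pairs and the inline sample string are made up for illustration. Backslash-newline continuations are removed by hand before splitting, since shlex would otherwise keep the newline as part of the next token.

```python
import shlex


def parse_pairs(text):
    """Split shell-style key=value text into a list of (key, value) tuples."""
    # Join continued lines first: drop every backslash-newline pair.
    joined = text.replace("\\\n", "")
    pairs = []
    # shlex.split (POSIX mode) strips the double quotes and turns "\ " into " ".
    for token in shlex.split(joined):
        key, _, value = token.partition("=")
        pairs.append((key, value))
    return pairs


# Hypothetical sample mirroring the multi-line example from the question.
sample = ('dog=blue cat="orange tangerine" '
          'mouse=a\\ small\\ grey\\ mouse \\\nlion=nonexistent')

for key, value in parse_pairs(sample):
    print('{} = "{}"'.format(key, value))
```

This trades the explicit grammar for shlex's fixed quoting rules, so it only fits if the input really follows shell conventions; values containing an unescaped single quote, for instance, would be treated as quoted strings.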