Splits based on pyparsing
I want to do the following, but using pyparsing. The input
Package:numpy11 Package:scipy
should be split into
[["Package:", "numpy11"], ["Package:", "scipy"]]
My code so far is:
package_header = Literal("Package:")
single_package = Word(printables + " ") + ~Literal("Package:")
full_parser = OneOrMore( pp.Group( package_header + single_package ) )
The current output is:
([(['Package:', 'numpy11 Package:scipy'], {})], {})
I would like something like this:
([(['Package:', 'numpy11'], {})], [(['Package:', 'scipy'], {})], {})
Basically, the rest of the text matches pp.printables.
I know I could use Words, but I want to express it as
all printables but not the Literal
How can I do this? Thank you.
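You shouldn't need the negative lookahead. Word(printables) already stops at whitespace, so leaving the " " out of the character set is enough, i.e. this: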
from pyparsing import *
package_header = Literal("Package:")
single_package = Word(printables)
full_parser = OneOrMore( Group( package_header + single_package ) )
print(full_parser.parseString("Package:numpy11 Package:scipy"))
prints:
[['Package:', 'numpy11'], ['Package:', 'scipy']]
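If a negative lookahead is still wanted, for example so that a package name can span several words, one way to express "all printables but not the Literal" is to put the lookahead in front of each word instead of after a single greedy Word. The attempt in the question fails because Word(printables + " ") consumes the whole remaining input, including the next Package:, before the ~Literal is ever checked. The following is a minimal sketch and not part of the original answer; name_word and the joining parse action are illustrative choices.

```python
from pyparsing import Group, Literal, OneOrMore, Word, printables

package_header = Literal("Package:")

# A name word is any run of printables that does not begin a new header.
# (name_word is a hypothetical helper name, not from the original post.)
name_word = ~package_header + Word(printables)

# Collect one or more such words and join them back into a single string.
single_package = OneOrMore(name_word).setParseAction(lambda toks: " ".join(toks))

full_parser = OneOrMore(Group(package_header + single_package))

result = full_parser.parseString("Package:numpy11 foo Package:scipy")
print(result.asList())
```

Here the lookahead is tested before every word, so the name stops as soon as the next Package: begins, and the multi-word name comes back as the single string 'numpy11 foo'.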
Update: to parse packages separated by |, you can use the delimitedList() function (now package names can also contain spaces):
from pyparsing import *
package_header = Literal("Package:")
package_name = Regex(r'[^|]+') # | is a printable, so create a regex that excludes it.
package = Group(package_header + package_name)
full_parser = delimitedList(package, delim="|" )
print(full_parser.parseString("Package:numpy11 foo|Package:scipy"))
prints:
[['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
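One thing to watch with the regex [^|]+ is that it also matches spaces, so input written with padding around the delimiter leaves trailing whitespace in the name. A possible refinement, sketched here under the assumption that surrounding whitespace is never significant, is to strip each name in a parse action. The strip step and the variable padded are additions for illustration, not part of the original answer.

```python
from pyparsing import Group, Literal, Regex, delimitedList

package_header = Literal("Package:")

# Same regex as in the answer, plus a parse action (added for illustration)
# that trims whitespace left next to the "|" delimiter.
package_name = Regex(r"[^|]+").setParseAction(lambda toks: toks[0].strip())

package = Group(package_header + package_name)
full_parser = delimitedList(package, delim="|")

padded = full_parser.parseString("Package:numpy11 foo | Package:scipy")
print(padded.asList())
```

Without the parse action, the first name in this input would come back as 'numpy11 foo ' with a trailing space.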