How can I use a parser in Haskell to find the locations of some substrings in a string?
I'm new to Haskell. I would like to be able to find some color expressions in a string. Say I have this list of expressions:
colorWords = ["blue", "green", "blue green"]
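I want to get the positions of all of these anywhere in a string, even if an expression is broken up by a line break or its words are separated by a hyphen. So given a string like: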
First there was blue
and then there was Green,
and then blue
green all of a sudden, and not to mention blue-green
It should give the character offsets for "blue" (line 1), "green" (line 2), "blue green" (lines 3-4), and "blue-green" (line 4), something like:
[("blue", [20]), ("green", [40]), ("blue green", [50, 65])]
I can do this with regexes, but I've been trying to do it with a parser, just as an exercise. I'm guessing it's something like:
import Text.ParserCombinators.Parsec
separator = spaces <|> "-" <|> "\n"
colorExp colorString = if (length (words colorString))>1 then
  multiWordColorExp colorString
  else colorString
multiWordColorExp :: Parser -> String
multiWordColorExp colorString = do
  intercalate separator (words colorString)
But I don't know what I'm doing, and I'm not really getting anywhere with this.
We can find substring locations with a parser by using the sepCap combinator from replace-megaparsec.
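To make the shape of a sepCap result concrete, here is a minimal sketch. The helper name splitOnBlue is hypothetical and exists only for illustration. It runs sepCap with a single case-insensitive pattern and returns the unmatched text as Left values and the matches as Right values.

```haskell
import Data.Void (Void)
import Replace.Megaparsec (sepCap)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (string')

-- Hypothetical helper: split a string around case-insensitive matches of "blue".
-- Left holds text that did not match, Right holds what the pattern parser returned.
splitOnBlue :: String -> Maybe [Either String String]
splitOnBlue = parseMaybe (sepCap blue)
  where
    blue :: Parsec Void String String
    blue = string' "blue"
```

For example, splitOnBlue "a Blue b" should evaluate to Just [Left "a ",Right "Blue",Left " b"]. Keeping only the Right values is what leaves just the matches, which is the role rights plays in the solution.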
Here is a solution to your example problem. It requires the packages megaparsec, replace-megaparsec, and containers.
References: string', choice, getOffset, try from Megaparsec.
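import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Maybe
import Data.Either
import Data.Void
import qualified Data.Map.Strict as Map

-- Match one color expression; return its name and the offset where it starts.
colorWords :: Parsec Void String (String, [Int])
colorWords = do
    i <- getOffset
    c <- choice
        -- Try the two-word form first. anySingle accepts any one character
        -- between the words, so newline, space and hyphen all match.
        [ try $ string' "blue" >>
                anySingle >>
                string' "green" >>
                pure "blue green"
        , try $ string' "blue" >> pure "blue"
        , try $ string' "green" >> pure "green"
        ]
    return (c,[i])

input :: String
input = "First there was blue\nand then there was Green,\nand then blue\ngreen all of a sudden, and not to mention blue-green"

-- Keep only the matches (Right values) and group the offsets by expression.
main :: IO ()
main = print $ Map.toList $ Map.fromListWith mappend $ rights $ fromJust
    $ parseMaybe (sepCap colorWords) input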
[("blue",[16]),("blue green",[103,56]),("green",[40])]
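The solution hard-codes the three expressions. As a sketch of how the parser could instead be built from the list in the question, the code below assumes that a separator is either a run of whitespace (including newlines) or a single hyphen, and that longer expressions should be tried first. The names separator, colorExp, anyColor and colorOffsets are hypothetical, introduced only for this example.

```haskell
import Data.Either (rights)
import Data.List (sortOn)
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)
import Data.Ord (Down (..))
import Data.Void (Void)
import Replace.Megaparsec (sepCap)
import Text.Megaparsec
import Text.Megaparsec.Char (char, space1, string')

type Parser = Parsec Void String

-- Assumed separator: one or more whitespace characters, or a single hyphen.
separator :: Parser ()
separator = space1 <|> (() <$ char '-')

-- Parser for one expression: its words, case-insensitive, joined by separators.
-- Returns the expression as written in the list, not as cased in the input.
colorExp :: String -> Parser String
colorExp name = name <$ go (words name)
  where
    go []       = pure ()
    go [w]      = () <$ string' w
    go (w : ws) = string' w *> separator *> go ws

-- Try longer expressions first so that "blue green" wins over "blue".
anyColor :: [String] -> Parser (String, [Int])
anyColor names = do
    i <- getOffset
    c <- choice [ try (colorExp n) | n <- sortOn (Down . length) names ]
    pure (c, [i])

-- Offsets of every expression found in the text, grouped by expression,
-- with each offset list in ascending order.
colorOffsets :: [String] -> String -> [(String, [Int])]
colorOffsets names txt =
    Map.toList $ Map.fromListWith (flip (++)) $ rights $
        fromMaybe [] (parseMaybe (sepCap (anyColor names)) txt)
```

With the list and input from the question, colorOffsets ["blue", "green", "blue green"] input should give [("blue",[16]),("blue green",[56,103]),("green",[40])]. Unlike anySingle, this separator rejects arbitrary characters between the words, so "bluexgreen" would be reported as a "blue" match rather than a "blue green" match.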