如何使用变量和标记实现马尔可夫算法？

Question

我一直在尝试实现马尔可夫算法，但只取得了部分成功。该算法相当简单，可以找到here。

但是，我的项目有一个额外的困难，我必须使用包含标记和变量的规则。

变量代表字母表中的任何字母，而标记只是用作移动变量的参考的字符（它没有实际值）。

此示例复制字符串中的每个字符：

Alphabet: {a,b,c}

Markers: {M}

Variables: {x}

Rule 1: Mx -> xxM

Rule 2: xM -> x

Rule 3: x -> Mx

input: abc

abc //We apply rule 3

Mabc //We apply rule 1

aaMbc //We apply rule 1

aabbMc //We apply rule 1

aabbccM //We apply rule 2

aabbcc

这是我的递归函数，它实现了仅适用于字符串输入的马尔可夫算法，例如：规则 1："apple" -> "orange"，输入："apple".

public static String markov(String input, LinkedList<Rule> rules) {
    for (Rule rule : rules) {
        if (!input.equals(input.replace(rule.getFrom(), rule.getTo()))) { //If the rule matches a substring
            if (rule.isTerminating()) { //If the rule is terminating
                input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo());
                System.out.println(input); //Replace the first instance
                return input; //return and end the cycle
            } else {
                input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo());
                System.out.println(input);
                return markov(input, rules); //Start looking again for matching rules
            }
        }
    }
    return input;
}

我不知道如何在这个逻辑中实现变量和标记，也许有人可以教我实现这个逻辑的最佳方法？欢迎任何建议。

如果问题不符合 SO 准则，请在评论中告诉我原因，这样我就不会重蹈覆辙。

谢谢！

GitHub

Answer 1

我认为最简单的方法是使用 Java 正则表达式。一旦您了解了这些，那么以下规则应该适用于您的示例：

Rule 1: "M([a-c])" -> "M"
Rule 2: "([a-c])M" -> "" (terminating)
Rule 3: "([a-c])"  -> "M"

请注意，您需要对当前方法进行一些调整才能使此工作...

replace 将文字字符串作为第一个参数，而 replaceFirst 使用正则表达式，因此：

replace: if (!input.equals(input.replace(rule.getFrom(), rule.getTo()))) {
with:    if (!input.equals(input.replaceFirst(rule.getFrom(), rule.getTo()))) {

您正在引用 rule.getFrom() 字符串，它不适用于正则表达式，因此：

replace: input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo());
with:    input = input.replaceFirst(rule.getFrom(), rule.getTo());

那时，您在两次调用 replaceFirst 的代码中有一些重复，因此您可以在第一次将其粘贴到临时变量中并重新使用它：

String next = input.replace(rule.getFrom(), rule.getTo());
if (!input.equals(next)) {
  ...
  input = next;
  ...
}

由于您目前正在引用整个 rule.getFrom() 字符串，我猜您之前在正则表达式特殊字符方面遇到过问题。如果是这样，您需要在创建规则时单独解决它们。我真的不想在这里讨论正则表达式，因为它是一个很大的领域，并且与马尔可夫算法完全分开，所以如果您遇到这些问题，请在线进行一些研究（例如 Regular Expressions and Capturing Groups），或在此处提出一个单独的问题，重点关注正则表达式的特定问题。

请注意，您仍然可以将这些与正常规则结合使用（将标记字符从 M 更改为 # 以允许在字母表中使用 M），这些规则：

"A"             -> "apple"
"B"             -> "bag"
"S"             -> "shop"
"T"             -> "the"
"the shop"      -> "my brother"
"#([a-zA-Z .])" -> "#"
"([a-zA-Z .])#" -> "" (terminating)
"([a-zA-Z .])"  -> "#"

将转换：

from: I bought a B of As from T S.
to:   II  bboouugghhtt  aa  bbaagg  ooff  aapppplleess  ffrroomm  mmyy  bbrrootthheerr..

希望这对您有所帮助。

如何使用变量和标记实现马尔可夫算法？

How can I implement Markov's algorithm with variables and markers?

java

recursion

replace

cycle

markov