学习 Clojure：隐马尔可夫模型的递归

Question

我正在学习 Clojure，并从复制 Python 程序的功能开始，该程序将通过遵循（极其简单的）隐马尔可夫模型创建基因组序列。

一开始我坚持使用我已知的串行编程方式并大量使用 def 关键字，从而解决了带有大量副作用的问题，几乎解决了每个概念Clojure 就在屁股上。（虽然它按预期工作）

然后我尝试将其转换为更实用的方式，使用 loop、recur、atom 等等。现在，当我运行我得到一个 ArityException，但我无法以某种方式读取错误消息，即使是哪个函数抛出它。

(defn create-model [name pA pC pG pT pSwitch]
; converts propabilities to cumulative prop's and returns a char
  (with-meta
    (fn []
      (let [x (rand)]
        (if (<= x pA)
          \A
          (if (<= x (+ pA pC))
            \C
            (if (<= x (+ pA pC pG))
              \G
              \T)))))                   ; the function object
    {:p-switch pSwitch :name name}))    ; the metadata, used to change model


(defn create-genome [n]
; adds random chars from a model to a string and switches models randomly
  (let [models [(create-model "std" 0.25 0.25 0.25 0.25 0.02) ; standard model, equal props
                (create-model "cpg" 0.1 0.4 0.4 0.1 0.04)]    ; cpg model
        islands (atom 0)                 ; island counter
        m (atom 0)]                      ; model index
    (loop [result (str)]
      (let [model (nth models @m)]
        (when (< (rand) (:p-switch (meta model))) ; random says "switch model!"
;          (swap! m #(mod (inc @m) 2))   ; swap model used in next iteration     
          (swap! m #(mod (inc %) 2))     ; EDIT: correction
          (if (= @m 1)                   ; on switch to model 1, increase island counter
;            (swap! islands #(inc @islands)))) ; EDIT: my try, with error
            (swap! islands inc)))) ;             EDIT: correction
        (if (< (count result) n)         ; repeat until result reaches length n
          (recur (format "%s%c" result (model)))
          result)))))

运行它有效，但调用 (create-genome 1000) 导致

ArityException Wrong number of args (1) passed to: user/create-genome/fn--772  clojure.lang.AFn.throwArity (AFn.java:429)

我的问题：

（显然）我做错了什么？
我必须如何准确理解错误消息？

我很乐意收到的信息

如何改进代码（以 clojure 新手可以理解的方式）？还有不同的范例 - 我很感激你的建议。
为什么我需要在用于改变原子状态的形式之前加上井号#？我在一个例子中看到了这个，没有它函数就不会计算，但我不明白:)

Answer 1

好吧，这是一个远景，但它看起来像你的原子更新函数：

#(mod (inc @m) 2)

和

#(inc @islands)

是 0-arity，它们应该至少是 1。

这引出了您最后一个问题的答案：#(body) 形式是 (fn [...] (body)) 的快捷方式。所以它创建了一个匿名函数。现在的诀窍是如果 body 包含 % 或 %x 其中 x 是一个数字，它出现的位置将被替换为对创建函数的参数编号的引用 x（或第一个参数，如果它只是 %）。

在您的情况下 body 不包含对函数参数的引用，因此 #() 创建一个不带参数的匿名函数，这不是 swap! 所期望的。

所以 swap 尝试将参数传递给不期望它的东西，然后砰！，你得到一个 ArityException。

在这些情况下你真正需要的是：

(swap! m #(mod (inc %) 2)) ; will swap value of m to (mod (inc current-value-of-m) 2) internally

和

(swap! islands inc) ; will swap value of islands to (inc current-value-of-islands) internally

分别

Answer 2

您的错误与您询问的有关主题标签宏的问题有关 #。

#(+ %1 %2) 对于 (fn [x y] (+ x y)) 是 shorthand。它也可以没有参数：#(+ 1 1)。这就是你使用它的方式。您收到的错误是因为 swap! needs a function that accepts a parameter. What it does is pass the atom's current value to your function. If you don't care about its state, use reset!：(reset! an-atom (+ 1 1))。这将修复您的错误。

更正：

我刚刚再次查看了您的代码，意识到您实际上是在对原子状态进行处理。所以你想要做的是：

(swap! m #(mod (inc %) 2)) 而不是 (swap! m #(mod (inc @m) 2)).

就风格而言，你做得很好。我在一周中的每一天都以不同的方式编写我的函数，所以也许我不适合就此提出建议。

Answer 3

既然你问了改进的方法，我发现自己经常使用一种方法：我可以将这个 loop 抽象成更高阶的模式吗？

在这种情况下，您的循环随机选择字符 - 这可以建模为调用一个没有参数的 fn returns 一个字符 - 然后将它们累积在一起直到它有足够的字符。这很自然地适合 repeatedly，它采用类似的函数并将其结果的惰性序列设置为您想要的任何长度。

然后，因为您将整个字符序列放在一起，所以您可以将它们连接成一个字符串，这比重复 formats 更有效 - clojure.string/join 应该很适合，或者您可以apply str 在上面。

这是我对这种代码形式的尝试 - 我也试图让它完全由数据驱动，这可能导致它有点神秘，所以请耐心等待：

(defn make-generator 
  "Takes a probability distribution, in the form of a map 
  from values to the desired likelihood of that value appearing in the output.
  Normalizes the probabilities and returns a nullary producer fn with that distribution."
  [p-distribution]  
  (let[sum-probs (reduce + (vals p-distribution))
       normalized (reduce #(update-in %1 [%2] / sum-probs) p-distribution (keys p-distribution) )]
      (fn [] (reduce 
              #(if (< %1 (val %2)) (reduced (key %2)) (- %1 (val %2))) 
              (rand) 
              normalized))))

(defn markov-chain 
  "Takes a series of states, returns a producer fn.
  Each call, the process changes to the next state in the series with probability :p-switch,
  and produces a value from the :producer of the current state."
  [states]
  (let[cur-state (atom (first states))
       next-states (atom (cycle states))]
    (fn [] 
      (when (< (rand) (:p-switch @cur-state))
        (reset! cur-state (first @next-states))
        (swap! next-states rest))
      ((:producer @cur-state)))))


(def my-states [{:p-switch 0.02 :producer (make-generator {\A 1 \C 1 \G 1 \T 1})  :name "std"}
                {:p-switch 0.04 :producer (make-generator {\A 1 \C 4 \G 4 \T 1})  :name "cpg"}])


(defn create-genome [n]
  (->> my-states
       markov-chain
       (repeatedly n)
       clojure.string/join))

希望能稍微解释一下复杂性：

make-generator 中的 let 只是确保概率之和为 1。
make-generator 大量使用了另一种高阶循环模式，即 reduce。这本质上采用 2 个值的函数，并通过它线程化一个集合。 (reduce + [4 5 2 9]) 就像 (((4 + 5) + 2) + 9)。我主要用它来对 create-model 中的嵌套 if 做类似的事情，但没有指出概率分布中有多少个值。
markov-chain 生成两个原子，cur-state 保持当前状态，next-states 保持下一个状态的无限序列（来自 cycle）以进行切换到。这与您的 m 和 models 一样工作，但适用于任意数量的状态。
然后我使用 when 检查随机状态切换是否应该发生，如果它确实执行两个副作用我需要保持状态原子最新。然后我只调用 @cur-state 的 :producer 没有参数和 return that.

很明显，您不必完全按照这种方式执行此操作，但寻找这些模式确实对我有所帮助。
如果您想获得更多功能，您还可以考虑转向一种设计，在该设计中，您的生成器采用一种状态（带有种子随机数生成器）并且 return 一个值加上一个新状态。这种 "state monad" 方法可以完全声明，而这个设计不是。

学习 Clojure：隐马尔可夫模型的递归

Learning Clojure: recursion for Hidden Markov Model

recursion

clojure

hidden-markov-models