Why does inserting 1,000,000 values into a transient map in Clojure yield a map with only 8 items?
If I call assoc! 1,000,000 times on a transient vector, I get a vector with 1,000,000 elements:
(count
  (let [m (transient [])]
    (dotimes [i 1000000]
      (assoc! m i i))
    (persistent! m)))
; => 1000000
On the other hand, if I do the same with a map, it ends up with only 8 items:
(count
  (let [m (transient {})]
    (dotimes [i 1000000]
      (assoc! m i i))
    (persistent! m)))
; => 8
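Is there a reason this happens?
Operations on transient data types are not guaranteed to return the same reference that was passed in. Sometimes the implementation decides to return a new (but still transient) map from assoc! rather than reusing the one you passed in.
The ClojureDocs page on assoc! explains this behavior and has a nice example: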
;; The key concept to understand here is that transients are
;; not meant to be `bashed in place`; always use the value
;; returned by either assoc! or other functions that operate
;; on transients.
(defn merge2
  "An example implementation of `merge` using transients."
  [x y]
  (persistent! (reduce
                (fn [res [k v]] (assoc! res k v))
                (transient x)
                y)))
;; Why always use the return value, and not the original? Because the return
;; value might be a different object than the original. The implementation
;; of Clojure transients in some cases changes the internal representation
;; of a transient collection (e.g. when it reaches a certain size). In such
;; cases, if you continue to try modifying the original object, the results
;; will be incorrect.
;; Think of transients like persistent collections in how you write code to
;; update them, except unlike persistent collections, the original collection
;; you passed in should be treated as having an undefined value. Only the return
;; value is predictable.
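I want to repeat that last part because it is very important: the original collection you passed in should be treated as having an undefined value. Only the return value is predictable.
Here is a modified version of your code that works as expected: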
(count
  (let [m (transient {})]
    (persistent!
      (reduce (fn [acc i] (assoc! acc i i))
              m (range 1000000)))))
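The same rule (always continue with the value returned by assoc!) can also be followed with an explicit loop, or sidestepped by letting into manage the transient for you. The sketch below is only an illustration; the function names fill-with-loop and fill-with-into are made up for this example and do not come from the question.

```clojure
;; Hypothetical helper: threads the transient returned by assoc! through recur,
;; so the original reference is never reused after an update.
(defn fill-with-loop [n]
  (loop [m (transient {})
         i 0]
    (if (< i n)
      (recur (assoc! m i i) (inc i))
      (persistent! m))))

;; Hypothetical helper: builds the same map with into and a map transducer,
;; which avoids handling a transient by hand.
(defn fill-with-into [n]
  (into {} (map (fn [i] [i i])) (range n)))

(count (fill-with-loop 1000000))
(count (fill-with-into 1000000))
```

Both forms should report the full number of entries, because no call ever touches a transient after it has been passed to assoc!.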
As a side note, the reason you always get 8 is that Clojure uses clojure.lang.PersistentArrayMap (a map backed by an array) for maps with 8 or fewer entries. Once you go past 8, it switches to clojure.lang.PersistentHashMap.
user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a})
clojure.lang.PersistentArrayMap
user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a})
clojure.lang.PersistentHashMap
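Once it grows past 8 entries, the transient map switches its backing data structure from an array of pairs (PersistentArrayMap) to a hash table (PersistentHashMap), at which point assoc! returns a new reference instead of just updating the old one.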
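One way to observe the switch described above is to compare each assoc! result with its argument using identical?. This is a minimal sketch, assuming a hypothetical helper named first-replacement-index; where the switch happens is an implementation detail and should not be relied on in real code.

```clojure
;; Hypothetical helper: returns the 0-based index of the first assoc! call whose
;; result is NOT the same object as the transient passed in, or nil if every
;; call within n inserts returned the same object.
(defn first-replacement-index [n]
  (loop [t (transient {})
         i 0]
    (when (< i n)
      (let [t2 (assoc! t i i)]
        (if (identical? t t2)
          (recur t2 (inc i))
          i)))))

(first-replacement-index 100)
```

Given the behavior reported in the question, one would expect the first replacement to show up on the insert that pushes the map past 8 entries. From that point on, the old reference no longer sees any updates, which is why the original code ends up counting 8.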
The simplest explanation comes from the Clojure documentation itself (emphasis mine):
Transients support a parallel set of 'changing' operations, with similar names followed by ! - assoc!, conj! etc. These do the same things as their persistent counterparts except the return values are themselves transient. Note in particular that transients are not designed to be bashed in-place. You must capture and use the return value in the next call.
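The "capture and use the return value in the next call" rule applies to conj! in the same way as to assoc!. A short sketch with a transient vector follows; the function name squares is chosen here purely for illustration.

```clojure
;; Hypothetical example: each conj! result becomes the accumulator for the next
;; step, so the transient is never bashed in place.
(defn squares [n]
  (persistent!
    (reduce (fn [acc i] (conj! acc (* i i)))
            (transient [])
            (range n))))

(squares 5)
;; => [0 1 4 9 16]
```

Writing vector code this way keeps it correct even if an implementation returns a different object from conj!, instead of depending on the fact that the vector version in the question happened to work.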