How to properly split a string into n pieces in Clojure?

I'm trying to turn a string into a collection of N pieces. My approach returns the wrong result.


(defn chop
  [s pieces]
  (let [piece-size (int (/ (count s) pieces))]
    (map #(str/join %) (partition-all piece-size s))))

(def big-str (slurp "/path/to/9065-byte-string.txt"))

(count (chop (str/join (take 100 (repeat "x"))) 100))
100

(count (chop (str/join (take 10005 (repeat "x"))) 100))
101

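So I'm trying to split the test string into 100 pieces, but I sometimes get 101 (when the string length is not an exact multiple of 100). I'm not sure what's going on. Maybe my math for piece-size is wrong.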

If I pad the string it works, but I don't want to do that.

(defn chop
  [s pieces]
  (let [pad-size (- pieces (mod (count s) pieces))
        padded (str s (str/join (take pad-size (repeat " "))))
        piece-size (int (/ (count padded) pieces))
        ]
    (println "pad-size=" pad-size)
    (println "piece-size=" piece-size)
    (map #(str/join %) (partition-all piece-size padded))))

Your math is wrong.

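Suppose you have N items and you want g groups. If N/g is not an integer, some groups must contain a different number of items than others. If you want to spread the difference across the groups, define it like this: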

(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require
    [tupelo.string :as str]))

(defn chop
  [s groups]
  (newline)
  (println :-----------------------------------------------------------------------------)
  (let-spy
    [N      (count s)
     r      (/ (float N) (float groups))  ; or use `quot`
     a      (int (Math/floor r))  ; size of "small" groups
     b      (inc a)               ; size of "big"   groups

     ; Solve 2 eq's in 2 unknowns
     ; xa + yb = N
     ; x  + y  = g
     x      (- (* b groups) N)   ; number of "small" groups
     y      (- groups x)         ; number of "big" groups
     N1     (* x a)  ; chars in all small groups
     N2     (* y b)  ; chars in all big groups
     >>     (assert (= N (+ N1 N2)))   ; verify calculated correctly
     chars  (vec s)
     smalls (vec (partition a (take N1 chars)))  ; or use `split-at`
     bigs   (vec (partition b (drop N1 chars)))

     result (mapv str/join
              (concat smalls bigs))
     ]
    result))
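
For reference, the system mentioned in the code comments can be solved by hand. This is a short worked derivation, assuming (as in the code) that b = a + 1 and writing g for groups:

```latex
\begin{aligned}
x a + y b &= N, \qquad x + y = g \\
x a + (g - x)(a + 1) &= N \\
g(a + 1) - x &= N \\
x &= g b - N, \qquad y = g - x
\end{aligned}
```

For example, with N = 7 and g = 3 we get a = 2 and b = 3, so x = 9 - 7 = 2 small groups and y = 1 big group.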

Unit tests:

(dotest
  (is= (chop "abcd" 2) ["ab" "cd"])
  (is= (chop "abcd" 3) ["a" "b" "cd"])
  (is= (chop "abcde" 3) ["a" "bc" "de"])
  (is= (chop "abcdef" 3) ["ab" "cd" "ef"])
  (is= (chop "abcdefg" 3) ["ab" "cd" "efg"])

  (let [s100 (str/join (take 100 (repeat "x")))
        s105 (str/join (take 105 (repeat "x")))

        r100 (chop s100 10)
        r105 (chop s105 10)
        ]
    (is= 10 (spyx (count r100)))
    (is= 10 (spyx (count r105)))))

The printed results:

:-----------------------------------------------------------------------------
N => 4
r => 2.0
a => 2
b => 3
x => 2
y => 0
N1 => 4
N2 => 0
>> => nil
chars => [\a \b \c \d]
smalls => [(\a \b) (\c \d)]
bigs => []
result => ["ab" "cd"]

:-----------------------------------------------------------------------------
N => 4
r => 1.3333333333333333
a => 1
b => 2
x => 2
y => 1
N1 => 2
N2 => 2
>> => nil
chars => [\a \b \c \d]
smalls => [(\a) (\b)]
bigs => [(\c \d)]
result => ["a" "b" "cd"]

:-----------------------------------------------------------------------------
N => 5
r => 1.6666666666666667
a => 1
b => 2
x => 1
y => 2
N1 => 1
N2 => 4
>> => nil
chars => [\a \b \c \d \e]
smalls => [(\a)]
bigs => [(\b \c) (\d \e)]
result => ["a" "bc" "de"]

:-----------------------------------------------------------------------------
N => 6
r => 2.0
a => 2
b => 3
x => 3
y => 0
N1 => 6
N2 => 0
>> => nil
chars => [\a \b \c \d \e \f]
smalls => [(\a \b) (\c \d) (\e \f)]
bigs => []
result => ["ab" "cd" "ef"]

:-----------------------------------------------------------------------------
N => 7
r => 2.3333333333333335
a => 2
b => 3
x => 2
y => 1
N1 => 4
N2 => 3
>> => nil
chars => [\a \b \c \d \e \f \g]
smalls => [(\a \b) (\c \d)]
bigs => [(\e \f \g)]
result => ["ab" "cd" "efg"]

Once you have finished debugging and understand the steps, change let-spy to let and remove the other print statements.

The above was made using my favorite template project.

Update

If you don't like solving a system of equations, you can use quot, mod, or rem to compute the division:

(defn chop
  [s groups]
  (let [N            (count s)
        nsmall       (quot N groups) ; size of "small" groups
        nbig         (inc nsmall) ; size of "big"   groups
        ngrp-big     (- N (* nsmall groups)) ; number of "big" groups
        ngrp-small   (- groups ngrp-big) ; number of "small" groups
        nsmall-chars (* ngrp-small nsmall) ; chars in all small groups
        [chars-small chars-large] (split-at nsmall-chars s)
        smalls       (partition nsmall chars-small)
        bigs         (partition nbig chars-large)
        result       (mapv str/join
                       (concat smalls bigs))]
    result))
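
As a quick sanity check of the arithmetic, here is a minimal sketch that computes only the group sizes. The helper name group-sizes is hypothetical (it is not part of the code above), and it assumes groups is positive. It relies on the fact that the number of big groups, N minus nsmall times groups, is the same value as (rem N groups).

```clojure
(defn group-sizes
  "Hypothetical helper: sizes of the pieces obtained by splitting n items
  into `groups` pieces, small pieces first. Assumes (pos? groups)."
  [n groups]
  (let [nsmall   (quot n groups)   ; size of a small group
        ngrp-big (rem n groups)]   ; number of groups that get one extra item
    (vec (concat (repeat (- groups ngrp-big) nsmall)
                 (repeat ngrp-big (inc nsmall))))))

(group-sizes 7 3)        ; => [2 2 3]
(group-sizes 10005 100)  ; 95 groups of 100 followed by 5 groups of 101
```

The sizes always add up to n, which is the same invariant the assert in the let-spy version checks.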

There is no need for floating-point arithmetic here:

(let [batches        3
      input-sequence "abcd"
      full-batches   (mod (count input-sequence) batches)
      full-batches   (if (zero? full-batches) batches full-batches)
      batch-size     (/ (+ (count input-sequence) (- full-batches) batches) batches)]
    (let [[a b] (split-at (* full-batches batch-size) input-sequence)]
        (map #(apply str %)
            (concat (partition batch-size a)
                (partition (dec batch-size) b)))))
=> ("ab" "c" "d")

I derived the batch-size formula by solving the equation:
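
The equation itself is not shown in this post. A reconstruction that is consistent with the code above, writing N for the input length, B for batches, f for full-batches, and s for batch-size (these symbols are assumptions chosen for illustration), would be:

```latex
\begin{aligned}
f s + (B - f)(s - 1) &= N \\
B s - (B - f) &= N \\
s &= \frac{N - f + B}{B}
\end{aligned}
```

Because N - f is a multiple of B, the division is exact, so s is an integer.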

Is this what you want to do?

(defn chop [s piece-count]
  (let [step (/ (count s) piece-count)]
    (->> (range (inc piece-count)) ;; <-- Enumerated split positions
         (map #(-> % (* step) double Math/round)) ;; <-- Where to split
         (partition 2 1) ;; <-- Form slice lower/upper bounds
         (map (fn [[l u]] (subs s l u)))))) ;; <-- Slice input string

(chop "Input string." 3)
;; => ("Inpu" "t str" "ing.")

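The main problem with your approach is that you round the piece size to an integer, which forces every piece to have the same length. That cannot work when the string length is not divisible by the number of pieces. The solution is to first compute the fractional slice boundaries and only then round them to the nearest integer positions at which the input string will be split. This algorithm produces slices whose lengths differ by at most 1.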
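
To see both halves of that explanation in action, here is a minimal sketch using only clojure.core. It first reproduces the off-by-one from the question with partition-all, then computes the piece lengths that rounded boundaries produce. The name piece-lengths is made up for this illustration.

```clojure
;; Truncating the piece size to 100 leaves 5 characters over, so
;; partition-all emits 100 full chunks plus 1 short leftover chunk.
(count (partition-all (quot 10005 100) (range 10005))) ; => 101

(defn piece-lengths
  "Hypothetical helper: lengths of the slices obtained by rounding the
  fractional boundaries i * n / piece-count to the nearest integer."
  [n piece-count]
  (let [step (/ n piece-count)]
    (->> (range (inc piece-count))
         (map #(Math/round (double (* % step)))) ; rounded split positions
         (partition 2 1)                         ; adjacent [lower upper] pairs
         (map (fn [[l u]] (- u l))))))           ; length of each slice

(piece-lengths 13 3)                  ; => (4 5 4)
(count (piece-lengths 10005 100))     ; => 100
```

The first and last boundaries are always 0 and n, so the lengths sum to n no matter how the interior boundaries round.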