How to properly split a string into n pieces in Clojure?
I am trying to turn a string into a collection of N pieces. My approach returns the wrong result.
(require '[clojure.string :as str]) ; assumed source of the str alias used below
(defn chop
  [s pieces]
  (let [piece-size (int (/ (count s) pieces))]
    (map #(str/join %) (partition-all piece-size s))))
(def big-str (slurp "/path/to/9065-byte-string.txt"))
(count (chop (str/join (take 100 (repeat "x"))) 100))
100
(count (chop (str/join (take 10005 (repeat "x"))) 100))
101
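So when I try to chop the test string into 100 pieces, I sometimes get 101 (whenever the string length is not an exact multiple of 100). I'm not sure what is going on. Maybe my math for piece-size is wrong.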
It works if I pad the string, but I don't want to do that.
(defn chop
  [s pieces]
  (let [pad-size   (- pieces (mod (count s) pieces)) ; pad up to a multiple of pieces
        padded     (str s (str/join (take pad-size (repeat " "))))
        piece-size (int (/ (count padded) pieces))]
    (println "pad-size=" pad-size)
    (println "piece-size=" piece-size)
    (map #(str/join %) (partition-all piece-size padded))))
Your math is off.
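A quick way to see where the extra piece comes from, as a minimal sketch: truncating the piece size and then calling partition-all leaves the remainder in an extra, shorter piece. The helper name piece-count is hypothetical (not from the question), and the sketch assumes n is at least as large as pieces so the piece size is never zero.

```clojure
;; Hypothetical helper: how many pieces the original approach yields
;; for an input of length n that should be split into `pieces` parts.
;; Assumes (>= n pieces); a piece size of 0 would never terminate.
(defn piece-count
  [n pieces]
  (let [piece-size (quot n pieces)] ; same truncation as (int (/ n pieces))
    (count (partition-all piece-size (range n)))))

(piece-count 100 100)   ;; => 100  (exact multiple, nothing left over)
(piece-count 10005 100) ;; => 101  (the 5 leftover items form a 101st piece)
```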
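Suppose you have N items and you want g groups. If N/g is not an integer, the groups will contain different numbers of items. If you want to spread the difference across the groups, define it like this: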
(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require
    [tupelo.string :as str]))
(defn chop
  [s groups]
  (newline)
  (println :-----------------------------------------------------------------------------)
  (let-spy
    [N      (count s)
     r      (/ (float N) (float groups)) ; or use `quot`
     a      (int (Math/floor r)) ; size of "small" groups
     b      (inc a) ; size of "big" groups
     ; Solve 2 eq's in 2 unknowns
     ;   xa + yb = N
     ;   x + y = g
     x      (- (* b groups) N) ; number of "small" groups
     y      (- groups x) ; number of "big" groups
     N1     (* x a) ; chars in all small groups
     N2     (* y b) ; chars in all big groups
     >>     (assert (= N (+ N1 N2))) ; verify calculated correctly
     chars  (vec s)
     smalls (vec (partition a (take N1 chars))) ; or use `split-at`
     bigs   (vec (partition b (drop N1 chars)))
     result (mapv str/join
              (concat smalls bigs))]
    result))
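The expression for x follows from the two equations in the code comments. A short derivation with the same symbols, where a and b = a + 1 are the group sizes, x and y are the numbers of small and big groups, and g stands for groups:

```latex
\begin{align*}
x a + y b &= N, \qquad x + y = g \\
x a + (g - x)(a + 1) &= N \\
g (a + 1) - x &= N \\
x &= g b - N, \qquad y = g - x
\end{align*}
```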
Unit tests:
(dotest
  (is= (chop "abcd" 2) ["ab" "cd"])
  (is= (chop "abcd" 3) ["a" "b" "cd"])
  (is= (chop "abcde" 3) ["a" "bc" "de"])
  (is= (chop "abcdef" 3) ["ab" "cd" "ef"])
  (is= (chop "abcdefg" 3) ["ab" "cd" "efg"])
  (let [s100 (str/join (take 100 (repeat "x")))
        s105 (str/join (take 105 (repeat "x")))
        r100 (chop s100 10)
        r105 (chop s105 10)]
    (is= 10 (spyx (count r100)))
    (is= 10 (spyx (count r105)))))
with the following printed output:
:-----------------------------------------------------------------------------
N => 4
r => 2.0
a => 2
b => 3
x => 2
y => 0
N1 => 4
N2 => 0
>> => nil
chars => [\a \b \c \d]
smalls => [(\a \b) (\c \d)]
bigs => []
result => ["ab" "cd"]
:-----------------------------------------------------------------------------
N => 4
r => 1.3333333333333333
a => 1
b => 2
x => 2
y => 1
N1 => 2
N2 => 2
>> => nil
chars => [\a \b \c \d]
smalls => [(\a) (\b)]
bigs => [(\c \d)]
result => ["a" "b" "cd"]
:-----------------------------------------------------------------------------
N => 5
r => 1.6666666666666667
a => 1
b => 2
x => 1
y => 2
N1 => 1
N2 => 4
>> => nil
chars => [\a \b \c \d \e]
smalls => [(\a)]
bigs => [(\b \c) (\d \e)]
result => ["a" "bc" "de"]
:-----------------------------------------------------------------------------
N => 6
r => 2.0
a => 2
b => 3
x => 3
y => 0
N1 => 6
N2 => 0
>> => nil
chars => [\a \b \c \d \e \f]
smalls => [(\a \b) (\c \d) (\e \f)]
bigs => []
result => ["ab" "cd" "ef"]
:-----------------------------------------------------------------------------
N => 7
r => 2.3333333333333335
a => 2
b => 3
x => 2
y => 1
N1 => 4
N2 => 3
>> => nil
chars => [\a \b \c \d \e \f \g]
smalls => [(\a \b) (\c \d)]
bigs => [(\e \f \g)]
result => ["ab" "cd" "efg"]
Once you have finished debugging and understand the steps, change let-spy to let and remove the other print statements.
The above was made using my favorite template project.
Update
If you would rather not solve a system of equations, you can use quot together with mod or rem to do the division:
(defn chop
  [s groups]
  (let [N            (count s)
        nsmall       (quot N groups) ; size of "small" groups
        nbig         (inc nsmall) ; size of "big" groups
        ngrp-big     (- N (* nsmall groups)) ; number of "big" groups
        ngrp-small   (- groups ngrp-big) ; number of "small" groups
        nsmall-chars (* ngrp-small nsmall) ; chars in all small groups
        [chars-small chars-large] (split-at nsmall-chars s)
        smalls       (partition nsmall chars-small)
        bigs         (partition nbig chars-large)
        result       (mapv str/join
                       (concat smalls bigs))]
    result))
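As a worked instance of the arithmetic in the function above, here is a small sketch that computes only the group sizes and counts, without touching the string. The helper name group-sizes is hypothetical, and it uses rem where the function above uses a subtraction; for non-negative inputs the two give the same value.

```clojure
;; Hypothetical helper showing the arithmetic only (no string handling).
(defn group-sizes
  [n groups]
  (let [nsmall   (quot n groups)  ; size of the small groups
        ngrp-big (rem n groups)]  ; same value as (- n (* nsmall groups))
    {:small-size   nsmall
     :big-size     (inc nsmall)
     :small-groups (- groups ngrp-big)
     :big-groups   ngrp-big}))

(group-sizes 105 10)
;; => {:small-size 10, :big-size 11, :small-groups 5, :big-groups 5}
```

Five groups of 10 plus five groups of 11 account for all 105 characters.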
There is no need for floating-point arithmetic here:
(let [batches        3
      input-sequence "abcd"
      full-batches   (mod (count input-sequence) batches)
      full-batches   (if (zero? full-batches) batches full-batches)
      batch-size     (/ (+ (count input-sequence) (- full-batches) batches) batches)]
  (let [[a b] (split-at (* full-batches batch-size) input-sequence)]
    (map #(apply str %)
         (concat (partition batch-size a)
                 (partition (dec batch-size) b)))))
=> ("ab" "c" "d")
I got the batch-size formula by solving an equation.
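One derivation consistent with the code above, offered as a reconstruction rather than the author's original working. The symbols are assumptions introduced here: N is the input length, B is batches, f is full-batches, and k is batch-size. There are f batches of size k and B - f batches of size k - 1, so:

```latex
\begin{align*}
f k + (B - f)(k - 1) &= N \\
B k - B + f &= N \\
k &= \frac{N - f + B}{B}
\end{align*}
```

With f taken as N mod B (or B when the remainder is zero), N - f is a multiple of B, so k comes out as an integer.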
Is this what you want to do?
(defn chop [s piece-count]
  (let [step (/ (count s) piece-count)]
    (->> (range (inc piece-count))                ;; <-- Enumerated split positions
         (map #(-> % (* step) double Math/round)) ;; <-- Where to split
         (partition 2 1)                          ;; <-- Form slice lower/upper bounds
         (map (fn [[l u]] (subs s l u))))))       ;; <-- Slice input string
(chop "Input string." 3)
;; => ("Inpu" "t str" "ing.")
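The main problem with your approach is that you round the piece size to an integer, which forces every piece to have the same length. That cannot work when the string length is not divisible by the number of pieces. The solution is to compute the fractional slice boundaries first and only then round them to the nearest integer positions at which the input string is split. This algorithm produces slices whose lengths differ by at most 1.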
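The same boundary idea can also be sketched with integer arithmetic only, flooring each boundary instead of rounding it. This is a variant, not the code from the answer above; the name chop-int is hypothetical, and the pieces it returns can differ from the rounded version while still differing in length by at most 1.

```clojure
;; Sketch: boundary-based splitting using integer arithmetic only.
;; `chop-int` is a hypothetical name, not from the answers above.
(defn chop-int
  [s piece-count]
  (let [n (count s)]
    (->> (range (inc piece-count))
         (map #(quot (* % n) piece-count))     ; i-th boundary = floor(i*n / piece-count)
         (partition 2 1)                       ; consecutive [lower upper] pairs
         (mapv (fn [[l u]] (subs s l u))))))   ; slice the input string

(chop-int "Input string." 3)
;; => ["Inpu" "t st" "ring."]
```

Because the first boundary is always 0 and the last is always the string length, this returns exactly piece-count pieces for any input, including lengths that are not a multiple of piece-count.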