I have a complex spec for my data - how do I generate samples?
My Clojure spec looks like this:
(spec/def ::global-id string?)
(spec/def ::part-of string?)
(spec/def ::type string?)
(spec/def ::value string?)
(spec/def ::name string?)
(spec/def ::text string?)
(spec/def ::date (spec/nilable (spec/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))))
(spec/def ::interaction-name string?)
(spec/def ::center (spec/coll-of string? :kind vector? :count 2))
(spec/def ::context- (spec/keys :req [::global-id ::type]
                                :opt [::part-of ::center]))
(spec/def ::contexts (spec/coll-of ::context-))
(spec/def ::datasource string?)
(spec/def ::datasource- (spec/nilable (spec/keys :req [::global-id ::name])))
(spec/def ::datasources (spec/coll-of ::datasource-))
(spec/def ::location string?)
(spec/def ::location-meaning- (spec/keys :req [::global-id ::location ::contexts ::type]))
(spec/def ::location-meanings (spec/coll-of ::location-meaning-))
(spec/def ::context string?)
(spec/def ::context-association-type string?)
(spec/def ::context-association-name string?)
(spec/def ::priority string?)
(spec/def ::has-context- (spec/keys :req [::context ::context-association-type ::context-association-name ::priority]))
(spec/def ::has-contexts (spec/coll-of ::has-context-))
(spec/def ::fact- (spec/keys :req [::global-id ::type ::name ::value]))
(spec/def ::facts (spec/coll-of ::fact-))
(spec/def ::attribute- (spec/keys :req [::name ::type ::value]))
(spec/def ::attributes (spec/coll-of ::attribute-))
(spec/def ::fulltext (spec/keys :req [::global-id ::text]))
(spec/def ::feature- (spec/keys :req [::global-id ::date ::location-meanings ::has-contexts ::facts ::attributes ::interaction-name]
                                :opt [::fulltext]))
(spec/def ::features (spec/coll-of ::feature-))
(spec/def ::attribute- (spec/keys :req [::name ::type ::value]))
(spec/def ::attributes (spec/coll-of ::attribute-))
(spec/def ::ioi-slice string?)
(spec/def ::ioi- (spec/keys :req [::global-id ::type ::datasource ::features ::attributes ::ioi-slice]))
(spec/def ::iois (spec/coll-of ::ioi-))
(spec/def ::data (spec/keys :req [::contexts ::datasources ::iois]))
(spec/def ::data- ::data)
But it fails to generate samples:
(spec/fdef data->graph
  :args (spec/cat :data ::xml-spec/data-))
(println (stest/check `data->graph))
It fails with this exception:
Couldn't satisfy such-that predicate after 100 tries.
Having stest/check generate data from the spec automatically is very handy, but how can I also supply generators alongside the spec?
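When you see the error Couldn't satisfy such-that predicate after 100 tries. while generating data from a spec, a common cause is an s/and spec, because spec builds the generator for an s/and spec based only on the first inner spec.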
This spec seems the most likely culprit, because the first inner spec/predicate in the s/and is string?, and the predicate that follows is a regex match:
(s/def ::date (s/nilable (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))))
If you sample the string? generator, you'll see that what it produces is unlikely to ever match your regex:
(gen/sample (s/gen string?))
=> ("" "" "X" "" "" "hT9" "7x97" "S" "9" "1Z")
test.check will try (100 times by default) to get a value that satisfies the such-that condition, and throws the exception you're seeing if it can't.
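To see the difference for yourself, here is a small sketch. It assumes s is clojure.spec.alpha, gen is clojure.test.check.generators, and test.check is on the classpath; date-re and date-str? are names made up for the illustration. It contrasts an s/and whose first predicate often satisfies the rest with one whose first predicate almost never does:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test.check.generators :as gen])

;; Hypothetical helper names, for illustration only.
(def date-re #"^\d{4}-\d{2}-\d{2}$")
(defn date-str? [x] (boolean (and (string? x) (re-matches date-re x))))

;; int? generates integers; about half are even, so filtering succeeds.
(def even-ints (doall (gen/sample (s/gen (s/and int? even?)))))

;; string? generates arbitrary strings; essentially none look like dates,
;; so generation is expected to give up and throw.
(def date-gen-error
  (try
    (doall (gen/sample (s/gen (s/and string? date-str?))))
    nil
    (catch Exception e
      (.getMessage e))))
```

If the reasoning above holds, even-ints contains only even integers and date-gen-error holds the such-that message rather than nil.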
Generating dates
You can implement a custom generator for this spec in several ways. Here's a test.check generator that creates ISO local date strings:
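;; Assumes gen is clojure.test.check.generators.
(import '(java.time LocalDate)
        '(java.time.temporal ChronoField))

(def gen-local-date-str
  (let [day-range (.range ChronoField/EPOCH_DAY)
        day-min   (.getMinimum day-range)
        day-max   (.getMaximum day-range)]
    ;; Map each generated epoch day to its ISO-8601 string.
    (gen/fmap #(str (LocalDate/ofEpochDay %))
              (gen/large-integer* {:min day-min :max day-max}))))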
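This approach gets the range of valid epoch days, uses it to bound the large-integer* generator, and then fmaps LocalDate/ofEpochDay over the generated integers. Here is an alternative definition that goes through java.time.Instant instead: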
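;; Assumes gen is clojure.test.check.generators.
(import '(java.time Instant LocalDateTime ZoneOffset))

(def gen-local-date-str
  (gen/fmap #(-> (Instant/ofEpochMilli %)
                 (LocalDateTime/ofInstant ZoneOffset/UTC)
                 (.toLocalDate)
                 (str))
            gen/large-integer))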
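This starts with the default large-integer generator and uses fmap to supply a function that creates a java.time.Instant from the generated integer, converts it to a java.time.LocalDate, and converts that to a string, which happens to match your date-string format. (This is slightly simpler on Java 9 and later with java.time.LocalDate/ofInstant.)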
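Another approach might use test.chuck's regex-based string generator, or different date classes/formatters. Note that both of my examples can generate years before -9999 or after +9999, which won't match your \d{4} year regex, but the generator should produce satisfying values often enough that it probably won't matter for your use case. There are many ways to generate date values!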
(gen/sample gen-local-date-str)
=>
("1969-12-31"
"1970-01-01"
"1970-01-01"
...)
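If the out-of-range years do turn out to matter, one option is to bound the epoch days so the year always has exactly four digits. This is only a sketch of that idea, not part of the generators above: gen-four-digit-year-date-str is a made-up name, and gen is again assumed to be clojure.test.check.generators.

```clojure
(require '[clojure.test.check.generators :as gen])
(import '(java.time LocalDate))

;; Hypothetical name; the bounds cover 0000-01-01 through 9999-12-31,
;; so every generated string has a four-digit, non-negative year.
(def gen-four-digit-year-date-str
  (let [day-min (.toEpochDay (LocalDate/of 0 1 1))
        day-max (.toEpochDay (LocalDate/of 9999 12 31))]
    (gen/fmap #(str (LocalDate/ofEpochDay %))
              (gen/large-integer* {:min day-min :max day-max}))))

(def four-digit-samples
  (doall (gen/sample gen-four-digit-year-date-str 50)))
```

Because every value already satisfies the regex, the validity filter that spec wraps around custom generators should never have to retry.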
Using custom generators with specs
You can then associate this generator with your spec using s/with-gen:
(s/def ::date
  (s/nilable
    (s/with-gen
      (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))
      (constantly gen-local-date-str))))
(gen/sample (s/gen ::date))
=>
("1969-12-31"
nil ;; note that it also makes nils b/c it's wrapped in s/nilable
"1970-01-01"
...)
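If you don't want to tie the custom generator directly to the spec definition, you can pass it as an override when you ask for the generator: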
(gen/sample (s/gen ::data {::date (constantly gen-local-date-str)}))
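With this spec and generator I was able to generate your larger ::data spec, although the output is very large because of some of the collection specs. You can also use the :gen-max option in those specs to control their size during generation.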
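As a rough illustration of both points, the sketch below caps a collection with :gen-max and passes the same kind of override map to stest/check through its :gen option. Everything here is a stand-in: the miniature ::feature and ::features specs, gen-fixed-date, and count-features are invented for the example, and gen is assumed to be clojure.test.check.generators.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest]
         '[clojure.test.check.generators :as gen])

;; Hypothetical miniature specs standing in for the real ones.
(s/def ::date (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %)))
(s/def ::feature (s/keys :req [::date]))
;; :gen-max caps the size of generated collections; validation is unaffected.
(s/def ::features (s/coll-of ::feature :gen-max 3))

;; Stand-in generator; in practice this would be gen-local-date-str.
(def gen-fixed-date (gen/return "2020-01-01"))

;; Override map passed to s/gen, as in the example above.
(def feature-samples
  (doall
   (gen/sample (s/gen ::features {::date (constantly gen-fixed-date)}) 20)))

;; Hypothetical function under test.
(defn count-features [features] (count features))
(s/fdef count-features
  :args (s/cat :features ::features)
  :ret  nat-int?)

;; The same override map goes under :gen when calling stest/check.
(def check-results
  (doall
   (stest/check `count-features
                {:gen {::date (constantly gen-fixed-date)}
                 :clojure.spec.test.check/opts {:num-tests 20}})))
```

Passing the overrides to stest/check this way keeps the spec definitions free of generator code while still letting check build arguments for the function.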