Is Alloy Analyzer "a falsifier"?

In my community, we have recently been actively using the term "falsification" in connection with formal specifications. The term appears, for example, in https://www.cs.huji.ac.il/~ornak/publications/cav05.pdf

I wonder whether the Alloy Analyzer does falsification. It seems to me that it does, but I am not sure. Is that correct? If not, what is the difference?

Based on my understanding of falsification, yes, Alloy does it.

This becomes quite obvious when you look at the motivation behind the creation of Alloy, as stated in the book Software Abstractions:

This book is the result of a 10-year effort to bridge this gap, to develop a language (Alloy) that captures the essence of software abstractions simply and succinctly, with an analysis that is fully automatic, and can expose the subtlest of flaws.

Yes, Alloy is a falsifier. The main novelty of Alloy when it was introduced 20 years ago was the argument that falsification is often more important than verification: since most designs are incorrect, the role of an analyzer should be to find the errors, not to show that they are not present. For a discussion of this issue, see Section 1.4, Verification vs. Refutation, in Software analysis: A roadmap (Jackson and Rinard, 2000), and Section 5.1.1, Instance Finding and Undecidability Compromises, in Software Abstractions (Jackson, 2006).

But in Alloy's case there is another aspect, namely that scope-complete analysis is in fact very effective from a verification standpoint as well. The claim here is what we call the "small scope hypothesis": most bugs can be found in small scopes (that is, in an analysis bounded by a small, fixed number of elements for each basic type).
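
To make the scope-bounded search concrete, here is a minimal hypothetical sketch (the model and all names in it are mine, not taken from the sources cited): a deliberately false assertion, together with a check command that asks the Analyzer to hunt for a counterexample within a small scope.

    sig Node { edges: set Node }

    -- Deliberately false claim: no node ever points to itself.
    assert NoSelfLoop {
      no n: Node | n in n.edges
    }

    -- Exhaustive search over all instances with at most 3 Node atoms;
    -- a node with a self-loop falsifies the assertion.
    check NoSelfLoop for 3

Running the check yields a counterexample (a single self-looping node). If no counterexample had turned up, that would only show the assertion holds within the scope, which is exactly the falsification stance described above.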

Incidentally, Alloy was also one of the first tools to propose using SAT for bounded verification. See, for example, Boolean Compilation of Relational Specifications (Daniel Jackson, 1998), a technical report that was well known to the authors of the first bounded model checking papers, and which discusses Nitpick, a predecessor of Alloy, in the following terms:

The hypothesis underlying Nitpick is a controversial one. It is that, in practice, small scopes suffice. In other words, most errors can be demonstrated by counterexamples within a small scope. This is a purely empirical hypothesis, since the relevant distribution of errors cannot be described mathematically: it is determined by the specifications people write.

Our hope is that successful use of the Nitpick tool will justify the hypothesis. There is some evidence already for its plausibility. In our experience with Nitpick to date, we have not gained further information by increasing the scope beyond 6.

A similar notion of scope is implicit in the context of model checking of hardware. Although the individual state machines are usually finite, the design is frequently parameterized by the number of machines executing in parallel. This metric is analogous to scope; as the number of machines increases, the state space increases exponentially, and it is rarely possible to analyze a system involving more than a handful of machines. Fortunately, however, it seems that only small configurations are required to find errors. The celebrated analysis of the Futurebus+ cache protocol [C+95], which perhaps marked the turning point in model checking’s industrial reputation, was performed for up to 8 processors and 3 buses. The reported flaws, however, could be demonstrated with counterexamples involving at most 3 processors and 2 buses.
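
Alloy's scope syntax maps this kind of configuration question directly onto a check command, since each signature can be given its own bound. A minimal hypothetical sketch (the signatures and the deliberately over-strong assertion are mine, merely echoing the processors-and-buses shape above):

    sig Bus {}
    sig Proc { uses: one Bus }

    -- Deliberately too strong: claims no two processors share a bus.
    assert NoSharedBus {
      no disj p, q: Proc | p.uses = q.uses
    }

    -- Per-signature scopes, analogous to "3 processors and 2 buses";
    -- two processors attached to one bus is already a counterexample,
    -- well within these small bounds.
    check NoSharedBus for 3 Proc, 2 Bus

As with the reported Futurebus+ flaws, the counterexample needs only a very small configuration.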