正则表达式属性(理论)

Regular expression properties (theory)

我正在阅读 Aho 和 Ullman 的著作《解析、翻译和编译理论》。在第 2 章介绍正则表达式的部分中,有一个正则表达式的属性列表。我不明白属性 2 和 8。这是属性列表:

(1) + = +

(2) ∅* =

(3) + ( + ) = ( + ) +

(4) () = ()

(5) ( + ) = +

(6) ( + ) = +

(7) = =

(8) ∅ = ∅ = ∅

(9) * = + *

(10) (* )* = *

(11) + =

(12) + ∅ =

其中∅为表示正则集的正则表达式∅,为任意正则表达式,为空串

性质 (2) 和 (8) 如何证明?

编辑:为了解释+、*等符号,这里给出一些书中给出的定义(引用):

DEFINITION Let be a finite alphabet. We define a regular set over recursively in the following manner:

(1) ∅ (the empty set) is a regular set over .

(2) {} is a regular set over .

(3) {} is a regular set over for all in .

(4) If and are regular sets over , then so are

(a) ∪ .

(b) .

(c) *.

(5) Nothing else is a regular set.

Thus a subset of * is regular if and only if it is ∅, {}, or {}, for some in , or can be obtained from these by a finite number of applications of the operations union, concatenation, and closure.

.

DEFINITION Regular expressions over and the regular expressions they denote are defined recursively, as follows:

(1) ∅ is a regular expression denoting the regular set ∅.

(2) is a regular expression denoting the regular set {}.

(3) in is a regular expression denoting the regular set {}.

(4) If and are regular expressions denoting the regular sets and , respectively, then

(a) (+) is a regular expression denoting ∪ .

(b) () is a regular expression denoting .

(c) ()* is a regular expression denoting *.

(5) Nothing else is a regular expression.

我的猜测是 2 和 8 属性可能只是一个简单的数学运算:

属性 2

∅是空集,则∅* = 成立,∅+ = 也成立,∅{Infinity} = 也成立,因为e是空串。

一个正则表达式是一个字符串,因此一个空的正则表达式重复任意次数或任何操作,仍然是一个空的正则表达式,它又等于一个空的右边的字符串。

参考: Why is the Kleene star of a null set is an empty string?


属性 8

∅ = ∅ = ∅ 是正确的,∅∅∅ = ∅∅∅ = ∅ 也是如此,因为空集与任何东西组合都会产生空集。


参考:

Regular expressions with empty set/empty string

What is the difference between language of empty string and empty set language?

How can concatenating empty sets (languages) result in a set containing empty string?