TXR: Parsing summary reports containing unicode with a more complicated syntax using functions
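I am trying to parse the "summary" region of a batch of computer-generated reports, where the report names and their associated variables vary from file to file. Here is a made-up example in the same format: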
Summary Report
Bath Tub
Temperature: 30 °C
Water ready
volume: 200000 cm³
Bath Room
Floor Area: 40 ft²
Door Height: 9 ± 0.1 ft
Full Report Set
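It is hard to tell from the above what the white space looks like, so here is a screenshot from my text editor with white space made visible.
The region of interest begins with Summary Report and ends with Full Report Set. A property may span two lines. Property names are aligned so that the colon : stays at the same character position within each sub-report.
Judging from the diagnostic output, my attempt to exploit this fact does not seem to have worked.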
txr: (src/generic-micrometrics-report.txr:36) chr mismatch (position 11 vs. k)
txr: (src/generic-micrometrics-report.txr:36) variable k binding mismatch (13 vs. 12)
txr: (src/generic-micrometrics-report.txr:36) chr mismatch (position 12 vs. k)
txr: (src/generic-micrometrics-report.txr:36) string matched, position 13-18 (data/dummy-generic-report.txt:6)
txr: (src/generic-micrometrics-report.txr:36) Temperature: 30 °C
txr: (src/generic-micrometrics-report.txr:36) ^ ^
txr: (src/generic-micrometrics-report.txr:23) spec ran out of data
txr: (source location n/a) function (capture (nil (k . 13) (report . "Bath Tub"))) failed
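I have included the code below. Can you explain why this code doesn't work? Is the colon_position function doing what I intend it to do? If so, why does it fail? How would you write the capture function? Is this the general approach you would take, or is there a better way? Many thanks for your help and advice.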
@; This output format always starts with or ends with atleast 2 blank spaces.
@; Fully blank spaced lines follow each property value pair line.
@(define blank_spaces)
@/[ ]+/@(eol)
@(end)
@; All colons align at the same column position within the body of a report.
@; If that doesn't happen, that means there is nothing to capture,
@; which shouldn't happen.
@; This function should bind the appropriate position without updating
@; the line position.
@; Reports end when there is an empty line, so don't look past that.
@(define colon_position (column))
@(trailer)
@(gather :vars (column))
@(skip)@(chr column):@(skip)
@(until)
@(end)
@(end)
@; Capture values for a property. Values are always given on a single line.
@; If there is error information, it will be indicated by a ± character.#\x00B1
@(define capture (value error units))
@(cases)@value@\ ±@\ @error@\ @units@/[ ]+/@(eol)@\
@(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
@(end)
@(end)
Summary Report
@(collect :vars (report property value error units))
@report
@(forget k)
@(colon_position k)
@(cases)
@property@(chr k): @(capture value error units)@(blank_spaces)
@(ord)
@; Properties can span two lines. I have not seen any that span more.
@property_head@(chr k) @(blank_spaces)
@property_tail@(chr k): @(capture value error units)@(blank_spaces)
@(merge property property_head property_tail)
@(cat property " ")
@(end)
@(blank_spaces)
@(end)
Full Report Set
@(output)
report,property,value,error,units
@(repeat)
@report,@property,@value,@error,@units
@(end)
@(end)
After making some changes here and there, I now get the following output:
report,property,value,error,units
Bath Tub,Temperature,30,,°C
Bath Tub,Water ready volume,200000,,cm³
Bath Room,Floor Area,40,,ft²
Bath Room,Door Height,9,0.1,ft
Code:
@; This output format always starts with or ends with atleast 2 blank spaces.
@; Fully blank spaced lines follow each property value pair line.
@(define blank_spaces)@\
@/[ ]*/@(eol)@\
@(end)
@; All colons align at the same column position within the body of a report.
@; If that doesn't happen, that means there is nothing to capture,
@; which shouldn't happen.
@; This function should bind the appropriate position without updating
@; the line position.
@; Reports end when there is an empty line, so don't look past that.
@(define colon_position (column))
@ (trailer)
@ (gather :vars (column))
@ (skip)@(chr column):@(skip)
@(until)
@(end)
@(end)
@; Capture values for a property. Values are always given on a single line.
@; If there is error information, it will be indicated by a ± character.#\x00B1
@(define capture (value error units))@\
@(cases)@value@\ ±@\ @error@\ @units @(eol)@\
@(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
@(end)@\
@(end)
Summary Report
@(collect :vars (report property value error units))
@report
@ (colon_position k)
@ (collect)
@ (cases)
@property@(chr k): @(capture value error units)@(blank_spaces)
@ (or)
@; Properties can span two lines. I have not seen any that span more.
@property_head@(chr k) @(blank_spaces)
@property_tail@(chr k): @(capture value error units)@(blank_spaces)
@ (merge property property_head property_tail)
@ (cat property " ")
@ (end)
@ (until)


@ (end)
@(until)
Full Report Set
@(end)
@(output)
report,property,value,error,units
@ (repeat)
@ (repeat)
@report,@property,@value,@error,@units
@ (end)
@ (end)
@(end)
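The trick with the colon does work (a nice application of trailer and chr). Where the code gets tripped up is in various small details: the misspelling of @(or) as @(ord); pattern functions that are meant to be horizontal but lack the proper @\ line continuations; the /[ ]+/ regex in @(blank_spaces), which makes it insist on consuming at least one space (the fix uses /[ ]*/); spurious spaces before @(merge); and so on.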
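Beyond that, the main problem is that the data is doubly nested, so we need a collect within a collect. We also need proper @(until) termination patterns. For the inner collect I chose two blank lines; that appears to be what terminates those sections (it works for the data sample). The outer collect terminates at Full Report Set, though that is not strictly necessary.
To go with the nested collects, we use a nested repeat in the output.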
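As a stripped-down illustration of how a collect within a collect pairs with a nested repeat, here is a minimal sketch. The input format (lines reading "group NAME" and "item VALUE", with a single blank line closing each group) and the variable names grp and n are hypothetical, made up for this example; they are not part of the report format above.

```txr
@; Hypothetical input, assumed for illustration only:
@;   group A
@;   item 1
@;   item 2
@;   (blank line)
@;   group B
@;   item 3
@;   (blank line)
@(collect)
group @grp
@  (collect)
item @n
@  (until)

@  (end)
@(end)
@(output)
@  (repeat)
@    (repeat)
@grp,@n
@    (end)
@  (end)
@(end)
```

The blank line after the inner @(until) is the termination pattern. Without it, the inner collect would be free to keep scanning past the next group line and pick up items that belong to a later group.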
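I applied some indentation. Horizontal functions can be indented with whitespace, because leading whitespace after a line continuation is ignored.
That @(forget k) is gone; there is no k in scope at that point. Each iteration of the surrounding collect binds k afresh, in an environment that has no k.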
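Addendum: here is a diff against the code that makes it more robust against unexpected data. As it stands, the inner @(collect) silently skips elements that do not match, which means that if the file contains elements that fit none of the expected cases, they are ignored. This behavior is already being relied on: it is why the blank lines between data items are skipped. We can tighten this up with :gap 0 (the collected regions must be consecutive) and handle blank lines as an explicit case. A fallback case can then diagnose any other input line as unrecognized: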
diff --git a/extract.txr b/extract.txr
index 8c93d89..3d1fac6 100644
--- a/extract.txr
+++ b/extract.txr
@@ -24,6 +24,7 @@
@(or)@value@\ @units@/[ ]+/@(eol)@(bind error "")@\
@(end)@\
@(end)
+@(name file)
Summary Report
@(collect :vars (report property value error units))
@@ -31,7 +32,7 @@
@report
@ (colon_position k)
-@ (collect)
+@ (collect :gap 0)
@ (cases)
@property@(chr k): @(capture value error units)@(blank_spaces)
@ (or)
@@ -40,6 +41,12 @@
@property_tail@(chr k): @(capture value error units)@(blank_spaces)
@ (merge property property_head property_tail)
@ (cat property " ")
+@ (or)
+
+@ (or)
+@ (line ln)
+@ badline
+@ (throw error `@file:@ln unrecognized syntax: @badline`)
@ (end)
@ (until)
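To show the tightened matching in isolation, here is a minimal sketch of the same idea outside the report query. The "key: value" input format and the names key and val are assumptions for illustration; @(name file), @(line ln) and the @(throw error ...) form are used the same way as in the diff above.

```txr
@; Hypothetical input: lines of the form "key: value", separated by
@; empty lines. Any other line is reported as an error.
@(name file)
@(collect :gap 0)
@  (cases)
@key: @val
@  (or)

@  (or)
@    (line ln)
@badline
@    (throw error `@file:@ln unrecognized syntax: @badline`)
@  (end)
@(end)
@(output)
@  (repeat)
@key,@val
@  (end)
@(end)
```

Because :gap 0 forbids skipping, every line has to fall into one of the three cases: a recognized pair, an empty line, or the catch-all that throws with the file name and line number.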