为每个键加载带有行号的 YAML
Loading YAML with line number for each key
假设我有一个如下所示的 YAML 文件:
en:
errors:
# Some comment
format: "%{attribute} %{message}"
# One more comment
messages:
"1": "Message 1"
"2": "Message 2"
long_error_message: |
This is a
multiline message
date:
format: "YYYY-MM-DD"
我怎样才能把它读成这样的 Ruby Hash
?
{
'en': {
'errors': {
'format': { value: '%{attribute} %{message}', line: 4 }
'messages': {
'1': { value: 'Message 1', line: 8 },
'2': { value: 'Message 2', line: 9 }
}
'long_error_message' : { value: "This is a\nmultiline message", line: 11 }
},
'date': {
'format': { value: 'YYYY-MM-DD', line: 16 }
}
}
}
我尝试使用 YAML: Find line number of key? 中提到的技巧作为起点并实现了 Psych::Handler
,但感觉我必须重写 Psych 中的大量代码才能得到这个上班。
有什么办法可以解决这个问题吗?
我们可以通过 Psych 提供的已解析哈希递归并找到每个键的行号来手动添加数字。以下代码将匹配您指定的结果。
require 'psych'
def add_line_numbers(lines, hash)
# Ruby cannot iterate and modify a hash at the same time.
# So we dup the hash and iterate over the dup.
iterator = hash.dup
iterator.each do |key, value|
if value.is_a?(Hash)
add_line_numbers(lines, value)
else
index = lines.index { |line| line =~ /^\s.?*#{key}.?\:/ }
hash[key] = { "value" => value, "line" => (index + 1) }
end
end
end
yaml_file = File.expand_path('../foo.yml', __FILE__)
lines = File.readlines(yaml_file)
data = Psych.load(lines.join("\n"))
add_line_numbers(lines, data)
puts data
您似乎想要获取作为映射值的任何标量值,并将其替换为具有包含原始值的 value
键和包含以下行的 line
键的散列数.
以下几乎可以工作,主要问题是多行字符串,其中给定的行号是 Yaml 中下一个内容的开始。问题是,当处理程序 scalar
方法被调用时,解析器已经超出了感兴趣的标量,因此 mark
在知道标量已经结束时给出了位置的行。在您的示例中的大多数情况下,这无关紧要,但对于多行情况,它会给出错误的值。如果不进入 Psych C 代码,我看不到从 mark
获取标量开头的解析器信息的任何方法。
require 'psych'
# Psych's first step is to parse the Yaml into an AST of Node objects
# so we open the Node class and add a way to track the line.
class Psych::Nodes::Node
attr_accessor :line
end
# We need to provide a handler that will add the line to the node
# as it is parsed. TreeBuilder is the "usual" handler, that
# creates the AST.
class LineNumberHandler < Psych::TreeBuilder
# The handler needs access to the parser in order to call mark
attr_accessor :parser
# We are only interested in scalars, so here we override
# the method so that it calls mark and adds the line info
# to the node.
def scalar value, anchor, tag, plain, quoted, style
mark = parser.mark
s = super
s.line = mark.line
s
end
end
# The next step is to convert the AST to a Ruby object.
# Psych does this using the visitor pattern with the ToRuby
# visitor. Here we patch ToRuby rather than inherit from it
# as it makes the last step a little easier.
class Psych::Visitors::ToRuby
# This is the method for creating hashes. There may be problems
# with Yaml mappings that have tags.
def revive_hash hash, o
o.children.each_slice(2) { |k,v|
key = accept(k)
val = accept(v)
# This is the important bit. If the value is a scalar,
# we replace it with the desired hash.
if v.is_a? ::Psych::Nodes::Scalar
val = { "value" => val, "line" => v.line + 1} # line is 0 based, so + 1
end
# Code dealing with << (for merging hashes) omitted.
# If you need this you will probably need to copy it
# in here. See the method:
# https://github.com/tenderlove/psych/blob/v2.0.13/lib/psych/visitors/to_ruby.rb#L333-L365
hash[key] = val
}
hash
end
end
yaml = get_yaml_from_wherever
# Put it all together
handler = LineNumberHandler.new
parser = Psych::Parser.new(handler)
# Provide the handler with a reference to the parser
handler.parser = parser
# The actual parsing
parser.parse yaml
# We patched ToRuby rather than inherit so we can use to_ruby here
puts handler.root.to_ruby
我建议您选择@matt 的解决方案。除了更谨慎之外,它还能正确处理标量。
技巧可能是 monkeypatch TreeBuilder#scalar
方法:
y='
en:
errors:
# Some comment
format: "%{attribute} %{message}"
# One more comment
messages:
"1": "Message 1"
"2": "Message 2"
long_error_message: |
This is a
multiline message
date:
format: "YYYY-MM-DD"'
require 'yaml'
yphc = Class.new(YAML.parser.handler.class) do
def scalar value, anchor, tag, plain, quoted, style
value = { value: value, line: $line } if style > 1
$line = $parser.mark.line + 1 # handle multilines properly
super value, anchor, tag, plain, quoted, style
end
end
$parser = Psych::Parser.new(yphc.new)
# more careful handling required for multidocs
result = $parser.parse(y).handler.root.to_ruby[0]
实际上,我们快完成了。唯一剩下的就是留下带有行号 的补丁值,只留下 。我故意没有把这个逻辑放在解析内容中。
def unmark_keys hash
hash.map do |k,v|
[k.is_a?(Hash) ? k[:value] : k, v.is_a?(Hash) ? unmark_keys(v) : v]
end.to_h
end
p unmark_keys result
#⇒ {"en"=>
#⇒ {"errors"=>
#⇒ {
#⇒ "format"=>{:value=>"%{attribute} %{message}", :line=>4},
#⇒ "messages"=>
#⇒ {
#⇒ "1"=>{:value=>"Message 1", :line=>8},
#⇒ "2"=>{:value=>"Message 2", :line=>9}
#⇒ }
#⇒ },
#⇒ "long_error_message"=>{
#⇒ :value=>"This is a\nmultiline message\n", :line=>11
#⇒ },
#⇒ "date"=>{"format"=>{:value=>"YYYY-MM-DD", :line=>16}}
#⇒ }
#⇒ }
当然有人可能想要摆脱全局变量等。我试图保持核心实现尽可能干净。
我们开始吧。希望对你有帮助。
UPD 感谢@matt,上面的代码在标量上失败了:
key1:
val1
key2: val2
YAML 允许使用此语法,但上述方法无法正确处理它。不会为此返回任何行。除了无理取闹的标量支持之外,还可以正确报告其他任何行,请参阅对此答案的评论以获取更多详细信息。
我采用了@matt 的解决方案并创建了一个不需要手动修补的版本。它还处理跨越多行的值和 YAML 的 <<
运算符。
require "psych"
require "pp"
ValueWithLineNumbers = Struct.new(:value, :lines)
class Psych::Nodes::ScalarWithLineNumber < Psych::Nodes::Scalar
attr_reader :line_number
def initialize(*args, line_number)
super(*args)
@line_number = line_number
end
end
class Psych::TreeWithLineNumbersBuilder < Psych::TreeBuilder
attr_accessor :parser
def scalar(*args)
node = Psych::Nodes::ScalarWithLineNumber.new(*args, parser.mark.line)
@last.children << node
node
end
end
class Psych::Visitors::ToRubyWithLineNumbers < Psych::Visitors::ToRuby
def visit_Psych_Nodes_ScalarWithLineNumber(node)
visit_Psych_Nodes_Scalar(node)
end
private
def revive_hash(hash, node)
node.children.each_slice(2) do |k, v|
key = accept(k)
val = accept(v)
if v.is_a? Psych::Nodes::ScalarWithLineNumber
start_line = end_line = v.line_number + 1
if k.is_a? Psych::Nodes::ScalarWithLineNumber
start_line = k.line_number + 1
end
val = ValueWithLineNumbers.new(val, start_line..end_line)
end
if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
case v
when Psych::Nodes::Alias, Psych::Nodes::Mapping
begin
hash.merge! val
rescue TypeError
hash[key] = val
end
when Psych::Nodes::Sequence
begin
h = {}
val.reverse_each do |value|
h.merge! value
end
hash.merge! h
rescue TypeError
hash[key] = val
end
else
hash[key] = val
end
else
hash[key] = val
end
end
hash
end
end
# Usage:
handler = Psych::TreeWithLineNumbersBuilder.new
handler.parser = Psych::Parser.new(handler)
handler.parser.parse(yaml)
ruby_with_line_numbers =
Psych::Visitors::ToRubyWithLineNumbers.create.accept(handler.root)
pp ruby_with_line_numbers
我发布了 gist of the above 以及一些评论和示例
假设我有一个如下所示的 YAML 文件:
en: errors: # Some comment format: "%{attribute} %{message}" # One more comment messages: "1": "Message 1" "2": "Message 2" long_error_message: | This is a multiline message date: format: "YYYY-MM-DD"
我怎样才能把它读成这样的 Ruby Hash
?
{
'en': {
'errors': {
'format': { value: '%{attribute} %{message}', line: 4 }
'messages': {
'1': { value: 'Message 1', line: 8 },
'2': { value: 'Message 2', line: 9 }
}
'long_error_message' : { value: "This is a\nmultiline message", line: 11 }
},
'date': {
'format': { value: 'YYYY-MM-DD', line: 16 }
}
}
}
我尝试使用 YAML: Find line number of key? 中提到的技巧作为起点并实现了 Psych::Handler
,但感觉我必须重写 Psych 中的大量代码才能得到这个上班。
有什么办法可以解决这个问题吗?
我们可以通过 Psych 提供的已解析哈希递归并找到每个键的行号来手动添加数字。以下代码将匹配您指定的结果。
require 'psych'
def add_line_numbers(lines, hash)
# Ruby cannot iterate and modify a hash at the same time.
# So we dup the hash and iterate over the dup.
iterator = hash.dup
iterator.each do |key, value|
if value.is_a?(Hash)
add_line_numbers(lines, value)
else
index = lines.index { |line| line =~ /^\s.?*#{key}.?\:/ }
hash[key] = { "value" => value, "line" => (index + 1) }
end
end
end
yaml_file = File.expand_path('../foo.yml', __FILE__)
lines = File.readlines(yaml_file)
data = Psych.load(lines.join("\n"))
add_line_numbers(lines, data)
puts data
您似乎想要获取作为映射值的任何标量值,并将其替换为具有包含原始值的 value
键和包含以下行的 line
键的散列数.
以下几乎可以工作,主要问题是多行字符串,其中给定的行号是 Yaml 中下一个内容的开始。问题是,当处理程序 scalar
方法被调用时,解析器已经超出了感兴趣的标量,因此 mark
在知道标量已经结束时给出了位置的行。在您的示例中的大多数情况下,这无关紧要,但对于多行情况,它会给出错误的值。如果不进入 Psych C 代码,我看不到从 mark
获取标量开头的解析器信息的任何方法。
require 'psych'
# Psych's first step is to parse the Yaml into an AST of Node objects
# so we open the Node class and add a way to track the line.
class Psych::Nodes::Node
attr_accessor :line
end
# We need to provide a handler that will add the line to the node
# as it is parsed. TreeBuilder is the "usual" handler, that
# creates the AST.
class LineNumberHandler < Psych::TreeBuilder
# The handler needs access to the parser in order to call mark
attr_accessor :parser
# We are only interested in scalars, so here we override
# the method so that it calls mark and adds the line info
# to the node.
def scalar value, anchor, tag, plain, quoted, style
mark = parser.mark
s = super
s.line = mark.line
s
end
end
# The next step is to convert the AST to a Ruby object.
# Psych does this using the visitor pattern with the ToRuby
# visitor. Here we patch ToRuby rather than inherit from it
# as it makes the last step a little easier.
class Psych::Visitors::ToRuby
# This is the method for creating hashes. There may be problems
# with Yaml mappings that have tags.
def revive_hash hash, o
o.children.each_slice(2) { |k,v|
key = accept(k)
val = accept(v)
# This is the important bit. If the value is a scalar,
# we replace it with the desired hash.
if v.is_a? ::Psych::Nodes::Scalar
val = { "value" => val, "line" => v.line + 1} # line is 0 based, so + 1
end
# Code dealing with << (for merging hashes) omitted.
# If you need this you will probably need to copy it
# in here. See the method:
# https://github.com/tenderlove/psych/blob/v2.0.13/lib/psych/visitors/to_ruby.rb#L333-L365
hash[key] = val
}
hash
end
end
yaml = get_yaml_from_wherever
# Put it all together
handler = LineNumberHandler.new
parser = Psych::Parser.new(handler)
# Provide the handler with a reference to the parser
handler.parser = parser
# The actual parsing
parser.parse yaml
# We patched ToRuby rather than inherit so we can use to_ruby here
puts handler.root.to_ruby
我建议您选择@matt 的解决方案。除了更谨慎之外,它还能正确处理标量。
技巧可能是 monkeypatch TreeBuilder#scalar
方法:
y='
en:
errors:
# Some comment
format: "%{attribute} %{message}"
# One more comment
messages:
"1": "Message 1"
"2": "Message 2"
long_error_message: |
This is a
multiline message
date:
format: "YYYY-MM-DD"'
require 'yaml'
yphc = Class.new(YAML.parser.handler.class) do
def scalar value, anchor, tag, plain, quoted, style
value = { value: value, line: $line } if style > 1
$line = $parser.mark.line + 1 # handle multilines properly
super value, anchor, tag, plain, quoted, style
end
end
$parser = Psych::Parser.new(yphc.new)
# more careful handling required for multidocs
result = $parser.parse(y).handler.root.to_ruby[0]
实际上,我们快完成了。唯一剩下的就是留下带有行号 的补丁值,只留下 。我故意没有把这个逻辑放在解析内容中。
def unmark_keys hash
hash.map do |k,v|
[k.is_a?(Hash) ? k[:value] : k, v.is_a?(Hash) ? unmark_keys(v) : v]
end.to_h
end
p unmark_keys result
#⇒ {"en"=>
#⇒ {"errors"=>
#⇒ {
#⇒ "format"=>{:value=>"%{attribute} %{message}", :line=>4},
#⇒ "messages"=>
#⇒ {
#⇒ "1"=>{:value=>"Message 1", :line=>8},
#⇒ "2"=>{:value=>"Message 2", :line=>9}
#⇒ }
#⇒ },
#⇒ "long_error_message"=>{
#⇒ :value=>"This is a\nmultiline message\n", :line=>11
#⇒ },
#⇒ "date"=>{"format"=>{:value=>"YYYY-MM-DD", :line=>16}}
#⇒ }
#⇒ }
当然有人可能想要摆脱全局变量等。我试图保持核心实现尽可能干净。
我们开始吧。希望对你有帮助。
UPD 感谢@matt,上面的代码在标量上失败了:
key1:
val1
key2: val2
YAML 允许使用此语法,但上述方法无法正确处理它。不会为此返回任何行。除了无理取闹的标量支持之外,还可以正确报告其他任何行,请参阅对此答案的评论以获取更多详细信息。
我采用了@matt 的解决方案并创建了一个不需要手动修补的版本。它还处理跨越多行的值和 YAML 的 <<
运算符。
require "psych"
require "pp"
ValueWithLineNumbers = Struct.new(:value, :lines)
class Psych::Nodes::ScalarWithLineNumber < Psych::Nodes::Scalar
attr_reader :line_number
def initialize(*args, line_number)
super(*args)
@line_number = line_number
end
end
class Psych::TreeWithLineNumbersBuilder < Psych::TreeBuilder
attr_accessor :parser
def scalar(*args)
node = Psych::Nodes::ScalarWithLineNumber.new(*args, parser.mark.line)
@last.children << node
node
end
end
class Psych::Visitors::ToRubyWithLineNumbers < Psych::Visitors::ToRuby
def visit_Psych_Nodes_ScalarWithLineNumber(node)
visit_Psych_Nodes_Scalar(node)
end
private
def revive_hash(hash, node)
node.children.each_slice(2) do |k, v|
key = accept(k)
val = accept(v)
if v.is_a? Psych::Nodes::ScalarWithLineNumber
start_line = end_line = v.line_number + 1
if k.is_a? Psych::Nodes::ScalarWithLineNumber
start_line = k.line_number + 1
end
val = ValueWithLineNumbers.new(val, start_line..end_line)
end
if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
case v
when Psych::Nodes::Alias, Psych::Nodes::Mapping
begin
hash.merge! val
rescue TypeError
hash[key] = val
end
when Psych::Nodes::Sequence
begin
h = {}
val.reverse_each do |value|
h.merge! value
end
hash.merge! h
rescue TypeError
hash[key] = val
end
else
hash[key] = val
end
else
hash[key] = val
end
end
hash
end
end
# Usage:
handler = Psych::TreeWithLineNumbersBuilder.new
handler.parser = Psych::Parser.new(handler)
handler.parser.parse(yaml)
ruby_with_line_numbers =
Psych::Visitors::ToRubyWithLineNumbers.create.accept(handler.root)
pp ruby_with_line_numbers
我发布了 gist of the above 以及一些评论和示例