XML to CSV in Ruby
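I have several sample XML files that I want to convert to CSV. Different XML files have different attributes/nodes, so I don't want to hardcode them. I'd like the output to have the column headers as the first row, with each node/record below it, one per row, like a traditional spreadsheet.
Here is a sample XML: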
<?xml version="1.0" encoding="UTF-8"?>
<sd:root xmlns:wd="urn:com.sample/bsvc" sd:version="v31.0">
  <sd:Put_Job_Profile_Request sd:Add_Only="0">
    <sd:Job_Profile_Data>
      <sd:Job_Code>30000</sd:Job_Code>
      <sd:Effective_Date>1900-01-01</sd:Effective_Date>
      <sd:Job_Profile_Basic_Data>
        <sd:Job_Title>Chief Executive Officer</sd:Job_Title>
      </sd:Job_Profile_Basic_Data>
    </sd:Job_Profile_Data>
  </sd:Put_Job_Profile_Request>
  <sd:Put_Job_Profile_Request sd:Add_Only="0">
    <sd:Job_Profile_Data>
      <sd:Job_Code>30100</sd:Job_Code>
      <sd:Effective_Date>1900-01-01</sd:Effective_Date>
      <sd:Job_Profile_Basic_Data>
        <sd:Job_Title>Administrator Job Profile</sd:Job_Title>
      </sd:Job_Profile_Basic_Data>
    </sd:Job_Profile_Data>
  </sd:Put_Job_Profile_Request>
  <sd:Put_Job_Profile_Request sd:Add_Only="0">
    <sd:Job_Profile_Data>
      <sd:Job_Code>30200</sd:Job_Code>
      <sd:Effective_Date>1900-01-01</sd:Effective_Date>
      <sd:Job_Profile_Basic_Data>
        <sd:Inactive>0</sd:Inactive>
        <sd:Job_Title>Facilities &amp; Grounds Maintenance Attendant</sd:Job_Title>
        <sd:Include_Job_Code_in_Name>0</sd:Include_Job_Code_in_Name>
        <sd:Job_Profile_Private_Title>Maintenance Job Title</sd:Job_Profile_Private_Title>
        <sd:Job_Profile_Summary>Maintain cleanliness of the campus building throughout the day and fulfill special requests as needed.</sd:Job_Profile_Summary>
        <sd:Job_Description>&lt;p&gt;Job Description&lt;b&gt; rich text!&lt;/b&gt;&lt;/p&gt;</sd:Job_Description>
        <sd:Additional_Job_Description>&lt;p&gt;&lt;b&gt;&lt;i&gt;&lt;span class="emphasis-2"&gt;&lt;u&gt;Additional&lt;/u&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt; Job Description&lt;b&gt; rich text!&lt;/b&gt;&lt;/p&gt;</sd:Additional_Job_Description>
        <sd:Work_Shift_Required>0</sd:Work_Shift_Required>
        <sd:Public_Job>1</sd:Public_Job>
      </sd:Job_Profile_Basic_Data>
    </sd:Job_Profile_Data>
  </sd:Put_Job_Profile_Request>
  <sd:Put_Job_Profile_Request sd:Add_Only="0">
    <sd:Job_Profile_Data>
      <sd:Job_Code>30300</sd:Job_Code>
      <sd:Effective_Date>1900-01-01</sd:Effective_Date>
      <sd:Job_Profile_Basic_Data>
        <sd:Inactive>0</sd:Inactive>
        <sd:Job_Title>Sample_Job_Title</sd:Job_Title>
        <sd:Include_Job_Code_in_Name>0</sd:Include_Job_Code_in_Name>
        <sd:Job_Profile_Summary>Sample Job Profile Summary</sd:Job_Profile_Summary>
        <sd:Job_Description>Sample Job Description</sd:Job_Description>
        <sd:Additional_Job_Description>Sample Additional Job Description</sd:Additional_Job_Description>
        <sd:Work_Shift_Required>1</sd:Work_Shift_Required>
      </sd:Job_Profile_Basic_Data>
    </sd:Job_Profile_Data>
  </sd:Put_Job_Profile_Request>
</sd:root>
This is the code I'm using, but it gives the wrong result:
require 'csv'
require 'nokogiri'
file = File.read('jobProfile.xml')
doc = Nokogiri::XML(file)
a = []
CSV.open('xmloutput.csv', 'wb') do |csv|
  csv << doc.at('.').search('*').map(&:name)
  doc.search('.').each do |x|
    csv << x.search('*').map(&:text)
  end
end
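The column headers and the data for every record are laid out horizontally. What I want is to iterate over the data and keep a single row of column headers. I'm not sure how to do that without hardcoding each attribute :/
Please help, I'm still new to programming and I've spent a week trying to find a solution :(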
[Screenshot showing the CSV output]
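You first need to build an array of hashes and extract the keys as headers, then put each value in the correct column. All nodes are flattened into columns, ignoring the root and record keys.
Like this: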
require 'nokogiri'
require 'set'

file = File.read('jobProfile.xml')
doc = Nokogiri::XML(file)

record = {}      # fields of the record currently being collected
keys = Set.new   # every column name seen so far, in first-seen order
records = []     # one hash per Job_Profile_Data element

# traverse visits children before their parent, so a Job_Profile_Data
# node is reached only after all of its fields have been collected
doc.traverse do |node|
  next if node.name == 'text'   # skip bare text nodes
  value = node.text.gsub(/\n +/, '')
  next if value.empty?          # skip empty nodes

  key = node.name.gsub(/sd:/, '').to_sym
  if key == :Job_Profile_Data && !record.empty?
    # end of a non-empty record: store it and start a new one
    records << record
    record = {}
  elsif key[/Job_Profile|^root$|^document$/]
    # ignore the root and record wrapper keys; note that this pattern also
    # drops Job_Profile_Private_Title and Job_Profile_Summary
  else
    # in case our value is HTML instead of text
    record[key] = Nokogiri::HTML.parse(value).text
    # a Set ignores keys it already contains
    keys << key
  end
end

# build our csv
File.open('./xmloutput.csv', 'w') do |out|
  out.puts %Q{"#{keys.to_a.join('","')}"}
  records.each do |row|
    keys.each do |key|
      out.write %Q{"#{row[key]}",}
    end
    out.write "\n"
  end
end
This gives the following in our CSV file:
"Job_Code","Effective_Date","Job_Title","Inactive","Include_Job_Code_in_Name","Job_Description","Additional_Job_Description","Work_Shift_Required","Public_Job"
"30000","1900-01-01","Chief Executive Officer","","","","","","",
"30100","1900-01-01","Administrator Job Profile","","","","","","",
"30200","1900-01-01","Facilities & Grounds Maintenance Attendant","0","0","Job Description rich text!","Additional Job Description rich text!","0","1",
"30300","1900-01-01","Sample_Job_Title","0","0","Sample Job Description","Sample Additional Job Description","1","",
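Building the lines by hand leaves a trailing comma on every data row and would produce a broken file if a value ever contained a double quote. As a rough sketch of the same "array of hashes to CSV" step using Ruby's csv library instead: the helper name `records_to_csv` and the sample `records` below are made up for illustration and are not part of the code above.

```ruby
require 'csv'

# Hypothetical helper (an assumption, not from the answer above): turns an
# array of hashes into CSV text. The header row is the union of all keys in
# order of first appearance; a key missing from a record becomes an empty cell.
def records_to_csv(records)
  headers = records.flat_map(&:keys).uniq
  CSV.generate(force_quotes: true) do |csv|
    csv << headers.map(&:to_s)
    records.each do |record|
      csv << headers.map { |key| record.fetch(key, '') }
    end
  end
end

# Made-up sample data; the second record has an extra key and a value
# containing both quotes and a comma.
records = [
  { Job_Code: '30000', Job_Title: 'Chief Executive Officer' },
  { Job_Code: '30200', Job_Title: 'Say "hi", then leave', Inactive: '0' }
]

puts records_to_csv(records)
```

With the `records` array collected by the traversal above, something like `File.write('xmloutput.csv', records_to_csv(records))` should be able to replace the manual `File.open` block, and the separate `keys` set would no longer be needed for the header row.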