Compare changes in XML using C++

Question

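I have two large XML files that share the same schema but contain different entries. The entries change daily, and I want to be able to find:

Entries that appear in file A but not in file B
Entries that appear in file B but not in file A
Entries that appear in both file A and file B

I'm new to programming and I'm struggling to work out an efficient way to solve this. Is using (trillions of) loops really the way to solve it?

A shortened example of the XML file: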

<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[946757316]]></id>
<url><![CDATA[http://www.site.co.uk/cgi-bin/tr.cgi?tid=752276]]></url>
<content><![CDATA[Specialized Dolce Sport 27 Speed]]></content>
<title><![CDATA[Bike]]></title>
<price><![CDATA[£600]]></price>
<date><![CDATA[01-AUG-13]]></date>
<display_reference><![CDATA[214683-50142933_370647]]></display_reference>
<location><![CDATA[City of London]]></location>
<category><![CDATA[Bike]]></category>
</entry>
<entry>
<id><![CDATA[90007316]]></id>
<url><![CDATA[http://www.site.co.uk/cgi-bin/tr.cgi?tid=70952276]]></url>
<content><![CDATA[Giant Sport Offroad Bike]]></content>
<title><![CDATA[Bike]]></title>
<price><![CDATA[£100]]></price>
<date><![CDATA[11-AUG-15]]></date>
<display_reference><![CDATA[2146433-50142933_370647]]></display_reference>
<location><![CDATA[City of London]]></location>
<category><![CDATA[Bike]]></category>
</entry>
</site_entries>

Edit: I can't rely on the entries being in the correct order throughout the file.

Answer 1

Here is an example using pugixml.

For testing purposes, the XML files are stored in std::istringstream objects; these can be replaced with std::ifstream objects to read from files.
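As a rough sketch of why that swap is painless: any function that takes a std::istream& accepts both stream types. The helpers read_all and read_file below are hypothetical names made up for illustration, and pugixml is deliberately left out so the snippet depends only on the standard library.

```cpp
#include <fstream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads everything from any input stream.
// Taking std::istream& lets callers pass a std::istringstream in tests
// and a std::ifstream in real code without changing this function.
std::string read_all(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: opens a file and reuses read_all on it.
std::string read_file(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if(!file)
        throw std::runtime_error("cannot open " + path);
    return read_all(file);
}
```

The same idea applies to the doc.load(...) calls in the program that follows: they are handed a stream, so the in-memory test streams can be exchanged for file streams.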

#include <set>
#include <string>
#include <sstream>
#include <iostream>
#include <iterator>  // std::inserter
#include <algorithm>

#include "pugixml.hpp"

#define con(m) std::cout << m << '\n'
#define err(m) std::cerr << m << std::endl

std::istringstream iss_a(R"~(<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[1]]></id>
</entry>
<entry>
<id><![CDATA[2]]></id>
</entry>
</site_entries>)~");

std::istringstream iss_b(R"~(<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[2]]></id>
</entry>
<entry>
<id><![CDATA[3]]></id>
</entry>
</site_entries>)~");

using str_set = std::set<std::string>;

int main()
{
    pugi::xml_document doc;

    str_set a;
    doc.load(iss_a); // use doc.load_file() in real code

    // fill set a with just the ids from file a
    for(auto&& node: doc.child("site_entries").children("entry"))
        a.emplace(node.child("id").text().as_string());

    str_set b;
    doc.load(iss_b);

    // fill set b with just the ids from file b
    for(auto&& node: doc.child("site_entries").children("entry"))
        b.emplace(node.child("id").text().as_string());

    // now use the <algorithm> library
    // (std::set keeps its elements sorted, so the order of entries
    // in the files does not matter)

    // ids in a but not in b
    str_set b_from_a;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(b_from_a, b_from_a.begin()));

    // ids in b but not in a
    str_set a_from_b;
    std::set_difference(b.begin(), b.end(), a.begin(), a.end()
        , std::inserter(a_from_b, a_from_b.begin()));

    // ids in both a and b
    str_set a_and_b;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(a_and_b, a_and_b.begin()));

    for(auto&& v: a)
        con("a       : " << v);

    con("");

    for(auto&& v: b)
        con("b       : " << v);

    con("");

    for(auto&& v: b_from_a)
        con("b_from_a: " << v);

    con("");

    for(auto&& v: a_from_b)
        con("a_from_b: " << v);

    con("");

    for(auto&& v: a_and_b)
        con("a_and_b : " << v);

    con("");
}

Output:

a       : 1
a       : 2

b       : 2
b       : 3

b_from_a: 1

a_from_b: 3

a_and_b : 2
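The program above makes three separate passes over the two sets, one per algorithm call. If that ever matters, the same three groups can be collected in a single merge-style walk, since std::set iterates in sorted order. This is only a sketch under that assumption; diff_result and classify are hypothetical names, not part of pugixml or the standard library.

```cpp
#include <set>
#include <string>

using str_set = std::set<std::string>;

// Hypothetical result type: one bucket per category from the question.
struct diff_result
{
    str_set only_a; // ids in A but not in B
    str_set only_b; // ids in B but not in A
    str_set both;   // ids in both A and B
};

// Walks both sorted sets once, like the merge step of merge sort.
diff_result classify(const str_set& a, const str_set& b)
{
    diff_result r;
    auto ia = a.begin();
    auto ib = b.begin();

    while(ia != a.end() && ib != b.end())
    {
        if(*ia < *ib)
            r.only_a.insert(r.only_a.end(), *ia++); // smaller id is only in A
        else if(*ib < *ia)
            r.only_b.insert(r.only_b.end(), *ib++); // smaller id is only in B
        else
        {
            r.both.insert(r.both.end(), *ia);       // same id in both sets
            ++ia;
            ++ib;
        }
    }

    // Whatever is left in either set has no counterpart in the other.
    r.only_a.insert(ia, a.end());
    r.only_b.insert(ib, b.end());
    return r;
}
```

Called with the sets a and b from the program above, classify(a, b) would put "1" in only_a, "3" in only_b and "2" in both. The ids are compared as strings, so the ordering is lexicographic rather than numeric; that does not affect which ids are considered equal.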

References:

std::set_difference

std::set_intersection
