Major Performance Issues with yaml-cpp

So I'm using yaml-cpp so that I can use YAML for my game data files in C++, but I'm running into some major performance issues.

I wanted to test a somewhat large file, so I created some dummy data to write out:

Player newPlayer = Player();
newPlayer.name = "new player";
newPlayer.maximumHealth = 1000;
newPlayer.currentHealth = 1;

Inventory newInventory;
newInventory.maximumWeight = 10.9f;

for (int z = 0; z < 10000; z++) {
  InventoryItem* newItem = new InventoryItem();
  newItem->name = "Stone";
  newItem->baseValue = 1;
  newItem->weight = 0.1f;

  newInventory.items.push_back(newItem);
}

YAML::Node newSavedGame;
newSavedGame["player"] = newPlayer;
newSavedGame["inventory"] = newInventory;

I then wrote this function to take the data and write it to a file:

void YamlUtility::saveAsFile(YAML::Node node, std::string filePath) {
  std::ofstream myfile;

  myfile.open(filePath);
  myfile << node << std::endl;

  myfile.close();
}

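Before I added this code, my game's memory usage was about 22MB. After I added newPlayer, newInventory and the InventoryItems, it went up to about 23MB. Then, when I added the YAML::Node newSavedGame, memory jumped to 108MB. On top of that, the file that gets written out is only 570KB, so I can't see why it would increase memory by 85MB.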
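To make the gap concrete, the figures above can be turned into per-item numbers. This is only a rough sketch: it assumes binary units (1 MB = 1024 × 1024 bytes, 1 KB = 1024 bytes), since the question does not say which convention was used, and it attributes the whole 85MB increase to the 10,000 items. Under those assumptions the in-memory data costs roughly 8.9 KB per item, while the file holds about 58 bytes per item, a factor of roughly 150.

```cpp
#include <cstdint>

// Figures taken from the question; binary units are an assumption.
constexpr std::uint64_t kItems = 10'000;
constexpr std::uint64_t kMiB = 1024 * 1024;
constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kGrowthBytes = (108 - 23) * kMiB;  // growth after building YAML::Node newSavedGame
constexpr std::uint64_t kFileBytes = 570 * kKiB;           // size of the written file

// Integer division: both results are rounded down to whole bytes.
constexpr std::uint64_t bytesPerItemInMemory() { return kGrowthBytes / kItems; }
constexpr std::uint64_t bytesPerItemOnDisk() { return kFileBytes / kItems; }
```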

The second issue is that this code takes about 8 seconds to write the file, which also seemed off to me.

I decided to rewrite the save function using YAML::Emitter; that code looks like this:

static void buildYamlManually(std::ofstream& file, YAML::Node node) {
  YAML::Emitter out;
  out << YAML::BeginMap << YAML::Key << "player" << YAML::Value << YAML::BeginMap << YAML::Key << "name" << YAML::Value
      << node["player"]["name"].as<std::string>() << YAML::Key << "maximumHealth" << YAML::Value
      << node["player"]["maximumHealth"].as<int>() << YAML::Key << "currentHealth" << YAML::Value
      << node["player"]["currentHealth"].as<int>() << YAML::EndMap;

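  // Introduce the inventory with its own key, so the item sequence is stored as the value of
  // "inventory" -> "items" rather than sitting where the top-level map expects its next key.
  out << YAML::Key << "inventory" << YAML::Value << YAML::BeginMap << YAML::Key << "items" << YAML::Value
      << YAML::BeginSeq;

  std::vector<InventoryItem*> items = node["inventory"]["items"].as<std::vector<InventoryItem*>>();

  for (InventoryItem* const value : items) {
    out << YAML::BeginMap << YAML::Key << "name" << YAML::Value << value->name << YAML::Key << "baseValue"
        << YAML::Value << value->baseValue << YAML::Key << "weight" << YAML::Value << value->weight << YAML::EndMap;
  }

  out << YAML::EndSeq << YAML::EndMap;  // close "items", then "inventory"

  out << YAML::EndMap;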

  file << out.c_str() << std::endl;
}

This had only a small effect on performance: saving the file still took close to 7 seconds (instead of 8).

I then decided to see what would happen if I wrote the file manually, without using yaml-cpp at all. That code looks like this:

static void buildYamlManually(std::ofstream& file, SavedGame savedGame) {
  file << "player: \n"
       << "  name: " << savedGame.player.name << "\n  maximumHealth: " << savedGame.player.maximumHealth
       << "\n  currentHealth: " << savedGame.player.currentHealth << "\ninventory:"
       << "\n  maximumWeight: " << savedGame.inventory.maximumWeight << "\n  items:";

  for (InventoryItem* const value : savedGame.inventory.items) {
    file << "\n    - name: " << value->name << "\n      baseValue: " << value->baseValue
         << "\n      weight: " << value->weight;
  }
}

With this code, and all of the yaml-cpp code removed, memory went from 23MB to 24MB and writing the file took about 0.15 seconds.
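For comparison, the hand-written approach can target any std::ostream, which also makes it easy to inspect the output in memory. The Item/Save structs and the writeSaveManually/saveToString names are hypothetical, and the sketch assumes every name is a plain YAML scalar (no ':', '#', leading spaces or quotes), because nothing is escaped; quoting such values is part of what yaml-cpp's emitter does for you. Unlike the version above, it also ends the file with a newline.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the question's SavedGame types.
struct Item {
  std::string name;
  int baseValue;
  float weight;
};

struct Save {
  std::string playerName;
  int maximumHealth;
  int currentHealth;
  float maximumWeight;
  std::vector<Item> items;
};

// Writes the same block-style layout as the hand-written version.
// Assumes names are plain scalars, so no quoting or escaping is done.
void writeSaveManually(std::ostream& os, const Save& save) {
  os << "player:\n"
     << "  name: " << save.playerName << '\n'
     << "  maximumHealth: " << save.maximumHealth << '\n'
     << "  currentHealth: " << save.currentHealth << '\n'
     << "inventory:\n"
     << "  maximumWeight: " << save.maximumWeight << '\n'
     << "  items:\n";
  for (const Item& item : save.items) {
    os << "    - name: " << item.name << '\n'
       << "      baseValue: " << item.baseValue << '\n'
       << "      weight: " << item.weight << '\n';
  }
}

// Convenience wrapper that renders the save into a string (e.g. for checking the output).
std::string saveToString(const Save& save) {
  std::ostringstream os;
  writeSaveManually(os, save);
  return os.str();
}
```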

While I understand that using yaml-cpp carries some overhead compared with writing the file manually as plain text, a difference this large seems wrong.

I suspect I'm doing something wrong, but going by the yaml-cpp documentation I can't see what it could be.

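You need to provide a complete example that actually demonstrates the problem. I've been meaning to try yaml-cpp, so this morning I tried to reproduce your problem, but I couldn't. Using the code below, which is very close to the snippets you provided, writing the file took about 0.06 seconds in my VM. It looks like the problem is not inherent to yaml-cpp but lies somewhere in your code.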

#include <string>
#include <vector>
#include <iostream>
#include <yaml-cpp/yaml.h>
#include <fstream>
#include <chrono>

class Player
{
    public:
        Player(const std::string& name, int maxHealth, int curHealth) : 
          m_name(name),
          m_maxHealth(maxHealth),
          m_currentHealth(curHealth) 
        {
        }

        const std::string& name() const     { return m_name;}
        int maxHealth() const               { return m_maxHealth; }
        int currentHealth() const           { return m_currentHealth; }

    private:
        const std::string m_name;
        int m_maxHealth;
        int m_currentHealth;
};

class Item
{
    public:
        Item(const std::string& name, int value, double weight) :
          m_name(name),
          m_value(value),
          m_weight(weight)
        {
        }

        const std::string& name() const     { return m_name; }
        int value() const                   { return m_value; }
        double maxWeight() const            { return m_weight; }

    private:
        const std::string m_name;
        int m_value;
        double m_weight;
};

class Inventory
{
    public:
        Inventory(double maxWeight) :
          m_maxWeight(maxWeight) 
        {
            m_items.reserve(10'000);
        }

        std::vector<Item>& items()              { return m_items;}
        const std::vector<Item>& items() const  { return m_items;}

        double maxWeight() const                { return m_maxWeight; };

    private:
        double m_maxWeight;
        std::vector<Item> m_items;
};

namespace YAML
{

    template<>
    struct convert<Inventory> 
    {
        static Node encode(const Inventory& rhs)
        {
            Node node;
            node.push_back(rhs.maxWeight());
            for(const auto& item : rhs.items())
            {
                node.push_back(item.name());
                node.push_back(item.value());
                node.push_back(item.maxWeight());
            }
            return node;
        }

        // TODO decode Inventory
    };


    template<>
    struct convert<Player> 
    {
        static Node encode(const Player& rhs)
        {
            Node node;
            node.push_back(rhs.name());
            node.push_back(rhs.maxHealth());
            node.push_back(rhs.currentHealth());
            return node;
        }

        //TODO Decode Player
    };

}

void saveAsFile(const YAML::Node& node, const std::string& filePath)
{
    std::ofstream myFile(filePath);

    myFile << node << std::endl;
}

int main(int arg, char **argv)
{
    Player newPlayer("new player", 1'000, 1);

    Inventory newInventory(10.9f);

    for(int z = 0; z < 10'000; z++)
    {
        newInventory.items().emplace_back("Stone", 1, 0.1f);
    }

    std::cout << "Inventory has " << newInventory.items().size() << " items\n";

    YAML::Node newSavedGame;
    newSavedGame["player"] = newPlayer;
    newSavedGame["inventory"] = newInventory;


    //Measure it 
    auto start = std::chrono::high_resolution_clock::now();

    saveAsFile(newSavedGame, "/tmp/save.yaml");

    auto end = std::chrono::high_resolution_clock::now();

    std::cout << "Wrote to file in " 
              << std::chrono::duration_cast<std::chrono::duration<double>>(end - start).count() 
              << " seconds\n";

    return 0;
}

Output:

user@mintvm ~/Desktop/yaml $ g++ -std=c++14 -o test main.cpp -lyaml-cpp
user@mintvm ~/Desktop/yaml $ ./test 
Inventory has 10000 items
Wrote to file in 0.0628495 seconds

Update (edit from Michael Goldshteyn):

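I wanted to run this on bare metal rather than in a VM, to show that the code above is in fact even faster when it is built with proper optimization, timed properly and run natively (i.e., not in a VM):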

$ # yaml-cpp built from source commit: * c90c08c Thu Aug 9 10:05:07 2018 -0500 
$ #   (HEAD -> master, origin/master, origin/HEAD)
$ #   Revert "Improvements to CMake buildsystem (#563)"
$ #  - Lib was built Release with flags: -std=c++17 -O3 -march=native -mtune=native
$ # Benchmark hardware info
$ # -----------------------
$ # CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
$ # Kernel: 4.4.0-131-generic #157-Ubuntu SMP
$ # gcc: gcc (Debian 8.1.0-9) 8.1.0
$
$ # And away we go:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ g++ -std=c++17 -O3 -march=native -mtune=native -o yamltest yamltest.cpp -lyaml-cpp
$ ./yamltest
Inventory has 10000 items    
After 100 saveAsFile() iterations, the average execution time
per iteration was 0.0521697 seconds.
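Averaging over 100 iterations, as in the update, smooths out the noise of a single measurement. A small helper along these lines can do the repetition; the name averageSeconds is hypothetical. It uses std::chrono::steady_clock, which is guaranteed to be monotonic, whereas std::chrono::high_resolution_clock (used in the answer's code) may be an alias for system_clock, depending on the standard library.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Runs `work` `iterations` times and returns the mean wall-clock time per call, in seconds.
// Returns 0.0 when asked for zero iterations instead of dividing by zero.
double averageSeconds(const std::function<void()>& work, std::size_t iterations) {
  if (iterations == 0) {
    return 0.0;
  }
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    work();
  }
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count() / static_cast<double>(iterations);
}
```

With the answer's code it could be called as `averageSeconds([&] { saveAsFile(newSavedGame, "/tmp/save.yaml"); }, 100)`.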