Is a nested std::transform inefficient?
If I have a std::string:
std::string s{"hello"};
and a loop that modifies it in place, like this:
for (auto &c: s)
    c = std::toupper(c);
I can replace it with the equivalent transform:
std::transform(s.begin(), s.end(), s.begin(),
    [](unsigned char c) -> unsigned char
    { return std::toupper(c); });
These snippets generate identical assembly. They also have similar performance.
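For anyone who wants to try the two single-string versions side by side, here is a minimal sketch. The function names upper_loop and upper_transform are made up for illustration. It also adds a cast to unsigned char in the loop version, which the snippet above does not have, because passing a negative char to std::toupper is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: uppercase via a range-based for loop over a copy.
std::string upper_loop(std::string s) {
    for (auto& c : s) {
        // Cast first: std::toupper expects a value representable as unsigned char.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

// Hypothetical helper: uppercase via std::transform writing back into the same range.
std::string upper_transform(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) -> unsigned char {
                       return static_cast<unsigned char>(std::toupper(c));
                   });
    return s;
}
```

Both helpers take their argument by value on purpose, so the caller's string is left alone and the two results can be compared directly.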
However, if I have a std::vector<std::string>:
std::vector<std::string> v {"hello", "how", "are", "you"};
and modify it in place like this:
for (auto & s : v)
    for (auto &c: s)
        c = std::toupper(c);
the equivalent transform would be:
std::transform(std::begin(v), std::end(v), std::begin(v),
    [](auto s) {
        std::transform(std::begin(s), std::end(s), std::begin(s),
            [](unsigned char c) -> unsigned char { return std::toupper(c); });
        return s;
    });
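However, the transform version generates substantially more assembly and performs correspondingly poorly, which surprised me.
Is std::transform not a zero-cost abstraction in this case, or am I just using it incorrectly?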
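Pass and return everything by reference. Otherwise, you end up making multiple copies of the string. Note the change to the lambda signature, [](auto& s) -> std::string&: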
std::transform(std::begin(v), std::end(v), std::begin(v),
    [](auto& s) -> std::string& {
        std::transform(std::begin(s), std::end(s), std::begin(s),
            [](unsigned char c) -> unsigned char { return std::toupper(c); });
        return s;
    });
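To see where the copies come from, here is a rough sketch that counts them. Everything in it is hypothetical scaffolding rather than anything from the benchmark: Tracked is a made-up wrapper around std::string, g_copies and g_self_assigns are made-up counters, and measure, by_value, ref_in_value_out and ref_in_ref_out are illustrative names. It assumes std::transform performs `*out = op(*in)` for each element, which is how the algorithm is specified.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

int g_copies = 0;        // real copies (copy construction or copy assignment)
int g_self_assigns = 0;  // copy assignments where source and target are the same object

// Hypothetical string wrapper that records how it is copied.
struct Tracked {
    std::string text;
    Tracked(const char* t) : text(t) {}
    Tracked(const Tracked& o) : text(o.text) { ++g_copies; }
    Tracked(Tracked&& o) noexcept : text(std::move(o.text)) {}
    Tracked& operator=(const Tracked& o) {
        if (this == &o) { ++g_self_assigns; return *this; }  // nothing to copy
        text = o.text;
        ++g_copies;
        return *this;
    }
    Tracked& operator=(Tracked&& o) noexcept { text = std::move(o.text); return *this; }
};

struct Counts { int copies; int self_assigns; };

void upper_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) -> unsigned char {
                       return static_cast<unsigned char>(std::toupper(c));
                   });
}

// Runs the outer transform with the given lambda and reports what was counted.
template <class Op>
Counts measure(Op op) {
    std::vector<Tracked> v{"hello", "how", "are", "you"};
    g_copies = 0;         // ignore the copies made while building v
    g_self_assigns = 0;
    std::transform(v.begin(), v.end(), v.begin(), op);
    return {g_copies, g_self_assigns};
}

// Parameter by value: one copy per element on the way in, moves on the way out.
Counts by_value() {
    return measure([](auto s) { upper_in_place(s.text); return s; });
}

// Parameter by reference, but the deduced return type is still a value: one copy on return.
Counts ref_in_value_out() {
    return measure([](auto& s) { upper_in_place(s.text); return s; });
}

// Reference in and out: only a self-assignment per element remains.
Counts ref_in_ref_out() {
    return measure([](auto& s) -> Tracked& { upper_in_place(s.text); return s; });
}
```

With the four-element vector used here, by_value() and ref_in_value_out() should each report 4 copies, while ref_in_ref_out() should report 0 copies and 4 self-assignments. That leftover self-assignment is the one thing the transform still does that a plain nested loop does not.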
I added two new benchmark functions to your Quick Bench link. One takes the input string by reference. The other also returns by reference. That is:
static void Transform2(benchmark::State& state) {
  // Code before the loop is not measured
  std::vector<std::string> v {"hello", "how", "are", "you"};
  for (auto _ : state) {
    std::transform(std::begin(v), std::end(v), std::begin(v),
      [](auto& s) {
        std::transform(std::begin(s), std::end(s), std::begin(s),
          [](unsigned char c) -> unsigned char { return std::toupper(c); });
        return s;
      });
  }
}
BENCHMARK(Transform2);
static void Transform3(benchmark::State& state) {
  // Code before the loop is not measured
  std::vector<std::string> v {"hello", "how", "are", "you"};
  for (auto _ : state) {
    std::transform(std::begin(v), std::end(v), std::begin(v),
      [](auto& s) -> std::string& {
        std::transform(std::begin(s), std::end(s), std::begin(s),
          [](unsigned char c) -> unsigned char { return std::toupper(c); });
        return s;
      });
  }
}
BENCHMARK(Transform3);
Depending on how lucky I get when running the benchmark, Transform3 performs almost as well as (and sometimes equal to) the InPlace test implementation.
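As a quick sanity check that the reference-returning version does the same work as the plain loop, here is a small sketch. It assumes InPlace refers to the nested range-based for loop from the question. The function names upper_nested_loop and upper_transform_ref are invented for this example, and the loop adds a cast to unsigned char that the question's code does not have.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

// Assumed shape of the InPlace baseline: nested range-based for loops.
std::vector<std::string> upper_nested_loop(std::vector<std::string> v) {
    for (auto& s : v)
        for (auto& c : s)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return v;
}

// Transform3-style: the outer lambda takes and returns std::string by reference.
std::vector<std::string> upper_transform_ref(std::vector<std::string> v) {
    std::transform(std::begin(v), std::end(v), std::begin(v),
        [](auto& s) -> std::string& {
            std::transform(std::begin(s), std::end(s), std::begin(s),
                [](unsigned char c) -> unsigned char {
                    return static_cast<unsigned char>(std::toupper(c));
                });
            return s;
        });
    return v;
}
```

Both functions work on a by-value copy of the vector so the same input can be fed to each and the outputs compared with operator==.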