Efficient extraction of elements from a C++ unordered set

Question

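In C++, suppose you have an unordered set (https://en.cppreference.com/w/cpp/container/unordered_set) of strings. Is there a way to efficiently extract all strings from that set that meet a certain criterion (e.g. find all strings in the set that begin with the letter "a"), other than using a for loop that iterates through the entire set and checks the first character of every string?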

Answer 1

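Instead of an unordered set, change your data structure to something like a trie. It is likely to serve you better in this case.

For more details, see: https://en.wikipedia.org/wiki/Trie

Implementation: https://www.geeksforgeeks.org/trie-insert-and-search/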
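To make the trie suggestion concrete, here is a minimal sketch of a prefix lookup. The class name `Trie` and the member functions `insert` and `with_prefix` are hypothetical names chosen for illustration (they are not taken from the linked pages), and the sketch assumes C++17 and plain `char` keys.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal trie; all names here are illustrative only.
class Trie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        bool is_word = false;  // true if a stored string ends at this node
    };
    Node root_;

    // Depth-first walk that appends every stored word below `node` to `out`.
    static void collect(const Node& node, std::string& current,
                        std::vector<std::string>& out) {
        if (node.is_word) out.push_back(current);
        for (const auto& [ch, child] : node.children) {
            current.push_back(ch);
            collect(*child, current, out);
            current.pop_back();
        }
    }

public:
    void insert(const std::string& word) {
        Node* node = &root_;
        for (char ch : word) {
            auto& child = node->children[ch];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->is_word = true;
    }

    // Returns every stored word that starts with `prefix`.
    std::vector<std::string> with_prefix(const std::string& prefix) const {
        const Node* node = &root_;
        for (char ch : prefix) {
            auto it = node->children.find(ch);
            if (it == node->children.end()) return {};  // no word has this prefix
            node = it->second.get();
        }
        std::vector<std::string> out;
        std::string current = prefix;
        collect(*node, current, out);
        return out;
    }
};
```

Reaching the subtree for a prefix costs time proportional to the prefix length, and collecting the matches costs time proportional to the size of that subtree, so strings that do not match are never visited.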

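Depending on your needs, you might also consider other approaches such as Aho-Corasick or suffix arrays. You may need to do some research on which data structure fits, based on the amount of data you have, how much recomputation you need, and the number of queries you run.

Hope this helps.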

Answer 2

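For an arbitrary condition, this is not possible.

Depending on your other needs, a sorted std::vector is quite likely the most efficient choice for the extraction part alone. Use algorithms such as std::lower_bound to work with the sorted std::vector. In the end, your actual use case determines which container performs best overall, although std::vector comes close to a one-size-fits-all answer where performance is concerned (because of all the optimizations that contiguous storage makes possible).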
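As a rough sketch of the sorted std::vector approach, the helper below finds the contiguous run of strings that begin with a given character. The function name `starting_with` is a hypothetical name introduced here, and the sketch assumes the vector was sorted with the default `operator<`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns copies of all strings in `sorted` that begin
// with `first_char`. Precondition: `sorted` is sorted with the default operator<.
std::vector<std::string> starting_with(const std::vector<std::string>& sorted,
                                       char first_char) {
    // The one-character string is the least string that begins with first_char.
    const std::string lo(1, first_char);
    auto first = std::lower_bound(sorted.begin(), sorted.end(), lo);

    // From `first` onward, the matching strings form a prefix of the range,
    // so a second binary search finds where they stop.
    auto last = std::partition_point(first, sorted.end(),
        [first_char](const std::string& s) {
            return !s.empty() && s[0] == first_char;
        });

    return std::vector<std::string>(first, last);
}
```

Both searches are logarithmic in the size of the vector; only the final copy is proportional to the number of matches.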

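That said, it is generally advisable to use the container that seems best suited to the problem at hand, and to reach for clever optimizations only when there is an actual performance bottleneck.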

Answer 3

For the general case of any condition, there is no better way than visiting every element.
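For reference, the general linear scan can be written without a hand-rolled loop. This is only a sketch: `select_if` is a hypothetical helper name, and it still inspects every element, so it is a matter of style rather than speed.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: copies every element satisfying `pred` into a vector.
// Cost is O(strings.size()) regardless of how many elements match.
template <typename Pred>
std::vector<std::string> select_if(const std::unordered_set<std::string>& strings,
                                   Pred pred) {
    std::vector<std::string> out;
    std::copy_if(strings.begin(), strings.end(), std::back_inserter(out), pred);
    return out;
}
```

The order of the result follows the iteration order of the unordered set, which is unspecified.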

Each container has specific criteria for which it can do better, for example:

std::set<std::string> strings = /* something */;
auto first = strings.lower_bound("a"); // O(log(strings.size())), "a" is the least string that starts with 'a'
auto last = strings.lower_bound("b"); // O(log(strings.size())), "b" is the least string after those that start with 'a'
strings.erase(first, last); // O(log(strings.size()) + distance(first, last)), all the strings starting with 'a' are removed

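Here we remove the elements that start with 'a' with a complexity of O(log(strings.size()) + distance(first, last)). If the strings are spread evenly over their first letters, that improves on iterating over all elements by a factor of roughly the alphabet size.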
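If the matching strings should be kept rather than discarded, the same range lookup can be combined with the C++17 node-handle member `std::set::extract`. The sketch below is one possible way to do that; `extract_range` is a hypothetical helper name.

```cpp
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: removes every string in the half-open range [lo, hi)
// from `strings` and returns the removed strings in sorted order.
std::vector<std::string> extract_range(std::set<std::string>& strings,
                                       const std::string& lo,
                                       const std::string& hi) {
    std::vector<std::string> out;
    auto it = strings.lower_bound(lo);
    const auto last = strings.lower_bound(hi);
    while (it != last) {
        auto next = std::next(it);          // save the successor before unlinking
        auto node = strings.extract(it);    // unlinks the node; only `it` is invalidated
        out.push_back(std::move(node.value()));  // move the string out, no copy
        it = next;
    }
    return out;
}
```

Calling `extract_range(strings, "a", "b")` would pull out the strings that start with 'a' and leave the rest of the set untouched.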

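Or, more contrived:

std::unordered_set<std::string> strings = /* something */;
auto bucket = strings.bucket("Any collision will do"); // O(1); begin(n) and end(n) take a bucket index, not a hash value
std::vector<std::string> colliding(strings.begin(bucket), strings.end(bucket)); // O(bucket_size(bucket)); copy first, because erase() does not accept local iterators
for (const auto& s : colliding) strings.erase(s); // average O(1) per key

Here we remove the elements that land in the same bucket as "Any collision will do", with an average complexity of O(strings.bucket_size(bucket)).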
