Given an array of strings, how do I remove duplicates?

Question

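I'd like to know how to remove duplicate strings from a container while ignoring differences between words that are due only to trailing punctuation.

For example, given these strings: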

Why do do we we here here?

I want to get this output:

Why do we here?

Answer 1

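Algorithm:

While reading a word succeeds, do:
  If end of file, quit.
  If the word list is empty, push back the word.
  Else begin
    Search the word list for the word.
    If the word doesn't exist, push back the word.
  End else (step 4)
End (while reading a word)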

Use std::string for your words. This lets you do the following:

std::string word;
while (data_file >> word)
{
}

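Use std::vector to contain your words (although you could use std::list as well). A std::vector grows dynamically, so you don't have to worry about reallocation if you picked the wrong size.
To append to a std::vector, use the push_back method.

To compare std::string objects, use operator==: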

std::string new_word;
std::vector<std::string> word_list;
//...
if (word_list[index] == new_word)
{
  continue;
}
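
Putting the pieces above together, here is a minimal sketch of the whole loop. It is only one way to do it. The helper names strip_trailing_punct and unique_words are made up for illustration, and treating "trailing punctuation" as whatever std::ispunct accepts at the end of a word is an assumption about what the question wants.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: drop trailing punctuation, so "here?" becomes "here".
// Assumption: "punctuation" means whatever std::ispunct accepts.
std::string strip_trailing_punct(std::string word)
{
    while (!word.empty() && std::ispunct(static_cast<unsigned char>(word.back())))
    {
        word.pop_back();
    }
    return word;
}

// Hypothetical helper: read whitespace-separated words and keep the first
// occurrence of each, comparing words with trailing punctuation removed.
std::vector<std::string> unique_words(std::istream& input)
{
    std::vector<std::string> word_list;
    std::string word;
    while (input >> word)
    {
        const std::string key = strip_trailing_punct(word);
        // Linear search of the word list, as in the algorithm above.
        const bool found = std::any_of(word_list.begin(), word_list.end(),
            [&](const std::string& seen) { return strip_trailing_punct(seen) == key; });
        if (!found)
        {
            word_list.push_back(word);
        }
    }
    return word_list;
}
```

Called on a std::istringstream holding "Why do do we we here here?", unique_words keeps the first spelling of each word, so the list is Why, do, we, here, and the trailing "?" is lost. If the last spelling should win instead, replace the stored element when a match is found rather than skipping the new word.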

Answer 2

So you know how to tokenize a string. (If you don't, spend some time here: ) So I'll assume that we're given a vector<string> foo that contains words which may have trailing punctuation.

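for(auto it = cbegin(foo); it != cend(foo); ++it) {
    // Print *it only if no later element is the same word, ignoring what follows the word.
    if(none_of(next(it), cend(foo), [&](const auto& i) {
        // Find the first position where the two strings differ.
        const auto finish = mismatch(cbegin(*it), cend(*it), cbegin(i), cend(i));
        // Equal if each string is either exhausted or continues with a non-alphanumeric character.
        return (finish.first == cend(*it) || !isalnum(static_cast<unsigned char>(*finish.first))) && (finish.second == cend(i) || !isalnum(static_cast<unsigned char>(*finish.second)));
    })) {
        cout << *it << ' ';
    }
}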

Live Example

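It's worth noting here that you haven't given us rules for how to handle words like "down", "down-vote", and "downvote"; this algorithm assumes that the first 2 are equal. You also haven't given us rules for how to handle "Why do, do we we here, here?" This algorithm always returns the last duplicate, so the output will be "Why do we here?"

If the assumptions made by this algorithm aren't exactly to your liking, leave me a comment and we'll work on getting you comfortable with this code so you can adapt it as needed.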
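
To make those assumptions easier to test and adjust, the same comparison can be pulled out into a named function. This is a sketch only: same_word and keep_last_occurrences are hypothetical names, and the result is collected in a vector instead of being written to cout. The four-iterator std::mismatch overload needs C++14 or later.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: true if a and b agree up to the point where each one
// either ends or continues with a non-alphanumeric character.
bool same_word(const std::string& a, const std::string& b)
{
    const auto finish = std::mismatch(a.cbegin(), a.cend(), b.cbegin(), b.cend());
    const auto is_boundary = [](std::string::const_iterator pos,
                                std::string::const_iterator end) {
        return pos == end || !std::isalnum(static_cast<unsigned char>(*pos));
    };
    return is_boundary(finish.first, a.cend()) && is_boundary(finish.second, b.cend());
}

// Hypothetical helper: keep a word only if no later word matches it,
// so the last occurrence of each duplicate survives.
std::vector<std::string> keep_last_occurrences(const std::vector<std::string>& words)
{
    std::vector<std::string> result;
    for (auto it = words.cbegin(); it != words.cend(); ++it)
    {
        if (std::none_of(std::next(it), words.cend(),
                         [&](const std::string& later) { return same_word(*it, later); }))
        {
            result.push_back(*it);
        }
    }
    return result;
}
```

With this split, same_word("down", "down-vote") is true and same_word("down", "downvote") is false, which matches the behaviour described above. To change the rule, for example to ignore only trailing punctuation and not everything after a hyphen, only same_word needs to change.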

c++

string

containers

duplicates

punctuation