A specific test case somehow does not pass

Question

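The code is basically meant to remove, from each line of a text file, the words that appear in the useless array. I've run into a very strange problem: the code won't remove the word 'the' from the phrase 'waiting on the shelf', yet every other test case (and there are many) passes. Any ideas?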

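#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main(){
    string useless[20] = { "an", "the" , "of", "to", "and", "but", "nor", "or", "some", "any", "very", "in", "on", "at", "before", "after", "into", "over", "through", "along"};

    ifstream fin("input.txt");
    if(fin.fail()){
        cout << "Input failed to open" << endl;
        exit(-1);
    }

    // Skip the first four lines of the input file
    string line;
    getline(fin, line);
    getline(fin, line);
    getline(fin, line);
    getline(fin, line);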

    ofstream fout("output.txt");

    while(getline(fin, line)){
        vector<string> vec;
        istringstream iss(line);
        while (iss) {
            string word;
            iss >> word;
            transform(word.begin(), word.end(), word.begin(), ::tolower);
            vec.push_back(word);
        }

        for(int i = 0; i < vec.size(); i++){
            for(int j = 0; j < 20; j++){
                if(vec[i] == useless[j]){
                    vec.erase(remove(vec.begin(), vec.end(), vec[i]), vec.end());
                }
            }
            fout << vec[i] << " ";
        }
        fout << endl;
    }
}

Answer 1

You are iterating incorrectly here:

for(int i = 0; i < vec.size(); i++){
    for(int j = 0; j < 20; j++){
        if(vec[i] == useless[j]){
            vec.erase(remove(vec.begin(), vec.end(), vec[i]), vec.end());
        }
    }
    fout << vec[i] << " ";
}
fout << endl;

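Before this loop runs, the vector holds [waiting][on][the][shelf]. When i == 1 you erase "on", which leaves [waiting][the][shelf], but i is still 1. The erase shifted "the" into the slot that "on" used to occupy, so vec[1] is now "the". The inner loop simply continues from j == 13, and because "the" sits at index 1 of useless, it is never compared again: "the" gets printed and i moves on to "shelf". In other words, the word right after a removed word is never checked. (If the removed word is the last one in the line, vec[i] even reads past the end of the vector.)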

Instead, you can use remove_if, for example:

vec.erase(remove_if(vec.begin(), vec.end(), [&](const string& str)
{
    // true if str is one of the words in the useless array
    return std::find(begin(useless), end(useless), str) != end(useless);
}), vec.end());

After that, vec is filtered and contains none of the words from the useless array.

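By the way, this can be optimized. The algorithm above has complexity O(vec_size * useless_size). We can bring that down to O(vec_size) by storing the useless words in a hash set (unordered_set) instead of an array, since it provides average constant-time lookup.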

unordered_set<string> useless = { "an", "the" , "of", "to", "and", "but", "nor", "or", "some", "any", "very", "in", "on", "at", "before", "after", "into", "over", "through", "along" };
...
vec.erase(remove_if(vec.begin(), vec.end(), [&](const string& str)
{
    // average O(1) hash lookup instead of scanning the whole array
    return useless.find(str) != useless.end();
}), vec.end());

Tags: c++, string, vector, token, erase