Find a specific string from a line through all the file

Question

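I have a file of 5000 lines. I want to take a specific string from one line and then compare it against every line in the file. All lines that contain that string should be written to another text file. So far I have managed to extract the specific string from the first line and write out all the lines that match it. Below is the code, which solves the problem for the first line only.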

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

// Gets the specific string from the first line
string FirstLineSplitedString()
{
    ifstream infile;
    infile.open("test.txt");
    string fline;
    string splited;
    if (infile.good())
    {
        // Read the first line
        string sLine;
        getline(infile, sLine);
        fline = sLine;

        // String starting from Cap900 till before .waa (specific string)
        int first = fline.find('_');
        int last = fline.find_last_of('.');
        splited = fline.substr(first + 1, last - first);
    }
    return splited;
}

int main()
{
    string SString = FirstLineSplitedString();
    ifstream stream1("test.txt");
    string line;
    ofstream stream2("output.txt");

    // Write every line that contains the string taken from the first line
    while (std::getline(stream1, line))
    {
        if (line.find(SString) != string::npos)
            stream2 << line << endl;
    }

    stream1.close();
    stream2.close();
    return 0;
}

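What I can't figure out is how to do this for the whole document. That is, once I have found the specific string from the first line and written all the lines that match it, how do I move on to the next line, repeat the same steps, and write out all the lines that match that line's string? Also, if a line has no matches, only the line itself should be written to the file.

For example, suppose I have a file test.txt with the following contents: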

aaaaaa _men here. there. etc
bbbb _men here. there. etc
aaaabbbbbbaa _from from. there. etc
zzzzzzzz _from from. there. etc
aaabbbbbbaaa _men here. there. etc
aabbbbaaaa _men here. there. etc
nnnnnnn _from from. there. etc

When I run the code, I get the following lines in output.txt:

aaaaaa _men here. there. etc
bbbb _men here. there. etc
aaabbbbbbaaa _men here. there. etc
aabbbbaaaa _men here. there. etc

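This is correct, because I want to split out the specific string from the first (_) up to the last (.). Now I want to move on to the next line that differs from the first one and get its result as well. Below is the output.txt I want to produce from test.txt: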

aaaaaa _men here. there. etc
bbbb _men here. there. etc
aaabbbbbbaaa _men here. there. etc
aabbbbaaaa _men here. there. etc

aaaabbbbbbaa _from from. there. etc
zzzzzzzz _from from. there. etc
nnnnnnn _from from. there. etc

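This pattern should continue until the last line of the file.

Sorry for the long post, but I wanted to be as clear as possible. Any help would be greatly appreciated.
Also keep in mind that the lines matching a specific string may be directly below each other, or may be 2000 lines apart.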

Answer 1

I made some new changes; I think this version works and is the simplest:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <set>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string line, splited;
    ifstream stream1("test.txt");
    ofstream stream2("output.txt");

    // Set in which the distinct keys (the split-out strings) are saved
    set<string> insertedKeys;
    // Every line of the file, in its original order
    vector<string> my_array;

    while (std::getline(stream1, line))
    {
        size_t first = line.find('_');
        size_t last = line.find_last_of('.');
        // Without '_' the key starts at the beginning of the line;
        // without a usable '.' it runs to the end of the line
        size_t start = (first == string::npos) ? 0 : first + 1;
        size_t len = (last == string::npos || last < start) ? string::npos
                                                            : last - start + 1;
        splited = line.substr(start, len);
        insertedKeys.insert(splited);
        my_array.push_back(line);
    }

    // Then, for each key in insertedKeys, find the lines that match the
    // current key and save them in output.txt
    for (set<string>::const_iterator it = insertedKeys.begin(); it != insertedKeys.end(); ++it)
    {
        const string &current_key = *it;
        for (size_t i = 0; i < my_array.size(); i++)
        {
            if (my_array[i].find(current_key) != string::npos)
                stream2 << my_array[i] << endl;
        }
        stream2 << " ----------------------------------------------------------------------- " << endl;
    }
    return 0;
}

Answer 2

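So, as I understand it, you need to group the lines of the input file by a key that is a substring of each line.

The simplest way is to fill in-memory collections of line groups while reading the file, and then flush the groups to the output file once the whole input has been processed: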

#include <iostream>
#include <string>
#include <fstream>
#include <deque>

using namespace std;

string findGroupKey(const string &line)
{
    size_t first = line.find('_');
    if (first == string::npos)
        first = 0;
    size_t last = line.find_last_of('.');
    size_t len = (last == string::npos ? string::npos : last - first + 1);
    // The key includes the start and end chars themselves in order to
    // distinguish lines like "x_test." and "xtest"
    return line.substr(first,len);
}

int main()
{
    // *** Var defs
    // Read the input file as a stream,
    ifstream inStream("test.txt");
    // line by line, placing each line just read into inLine
    string inLine;
    // Each inLine is placed into its own group (groups are held by value,
    // so nothing has to be freed manually)
    deque<deque<string> > linesGrouped;
    // according to its grouping key; keys[i] is the key of linesGrouped[i]
    deque<string> keys;

    // *** Read the input file and group the lines in memory collections
    while (getline(inStream, inLine)) {
        string groupKey = findGroupKey(inLine);

        // Find the key in our keys-met-so-far collection
        int keyIndex = -1;
        for (int i = 0, keyCount = (int)keys.size(); i < keyCount; i++)
            if (keys.at(i) == groupKey) {
                keyIndex = i;
                break;
            }

        if (keyIndex == -1) {
            // If the key has not been seen so far, add it to the key collection
            keys.push_back(groupKey);
            // and add a new group holding just this line
            linesGrouped.push_back(deque<string>(1, inLine));
        } else {
            // Otherwise just add the line to its respective group
            linesGrouped.at(keyIndex).push_back(inLine);
        }
    }

    // *** Write the groups into the output file
    ofstream outStream("output.txt");
    for (int i = 0, groupCount = (int)linesGrouped.size(); i < groupCount; i++) {
        for (int j = 0, lineCount = (int)linesGrouped.at(i).size(); j < lineCount; j++)
            outStream << linesGrouped.at(i).at(j) << endl;
        // Add a delimiter line (uncomment if you need one)
        //if (i < groupCount - 1)
        //  outStream << "-------------------" << endl;
    }
    return 0;
}
