C++ Using getline() inside loop to read in CSV file

Question

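I'm trying to read a CSV file of people/patients with 3 rows, where column 1 is the user ID, column 2 is fname, column 3 is lname, column 4 is insurance, and column 5 is version. It looks like the following.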

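Edit: Sorry, I had just copy/pasted my CSV spreadsheet here, so it didn't show the commas before. Wouldn't it look more like the following? John below also pointed out that there is no comma after the version, which seems to solve the problem! Thanks so much, John! (Trying to figure out how to accept your answer :))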

nm92,Nate,Matthews,Aetna,1
sc91,Steve,Combs,Cigna,2
ml94,Morgan,Lands,BCBS,3

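I'm trying to use getline() inside a loop to read everything in. It works fine on the first iteration, but getline() seems to make it skip a value on the next iteration. Any idea how I can fix this?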

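I'm also not sure why the output looks like the following, since I don't see anything in the code that prints lines with "sc91" and "ml94". This is the output of the current code: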

userid is: nm92
fname is: Nate
lname is: Matthews
insurance is: Aetna
version is: 1
sc91
userid is: Steve
fname is: Combs
lname is: Cigna
insurance is: 2
ml94
version is: Morgan
userid is: Lands
fname is: BCBS
lname is: 3

insurance is:
version is:

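I've done a lot of research on the differences between getline() and the >> stream operator, but most of the getline() material seems to revolve around taking input from cin rather than reading from a file like here, so I think something is going on with getline() and how it reads files that I don't understand. Unfortunately, when I tried the >> operator, it forced me to use the strtok() function, and I had a lot of trouble handling C strings and assigning them to an array of C++ strings.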
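The three ways of reading mentioned here behave the same on std::cin and on a file, because both are a std::istream. A minimal sketch, using std::istringstream as a stand-in for the ifstream so it runs without input.csv; the enum ReadMode and the helper readOnce are hypothetical names for illustration:

```cpp
#include <sstream>
#include <string>

// Hypothetical selector for the three kinds of read being compared.
enum class ReadMode { Extraction, GetlineDefault, GetlineComma };

// Returns what a single read produces from the start of `text`.
std::string readOnce(const std::string& text, ReadMode mode)
{
    std::istringstream in(text);   // stands in for the ifstream; both are std::istream
    std::string s;
    switch (mode) {
    case ReadMode::Extraction:
        in >> s;                    // skips leading whitespace, stops at the next whitespace
        break;
    case ReadMode::GetlineDefault:
        std::getline(in, s);        // stops at '\n' (and discards it)
        break;
    case ReadMode::GetlineComma:
        std::getline(in, s, ',');   // stops at ',' only; '\n' is an ordinary character
        break;
    }
    return s;
}
```

On the sample data, both `>>` and plain `getline` return the whole first line `nm92,Nate,Matthews,Aetna,1` (the line contains no spaces), while `getline(in, s, ',')` returns just `nm92`. So `>>` by itself does not split on commas; tokenizing still has to happen somewhere.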

#include <iostream>
#include <string>                               // for strings
#include <cstring>                              // for strtok()
#include <fstream>                              // for file streams

using namespace std;

struct enrollee
{
    string userid = "";
    string fname = "";
    string lname = "";
    string insurance = "";
    string version = "";
};

int main()
{
    const int ENROLL_SIZE = 1000;               // used const instead of #define since the performance diff is negligible,
    const int numCols = 5;                    // while const allows for greater utility/debugging bc it is known to the compiler ,
                                                // while #define is a preprocessor directive
    ifstream inputFile;                         // create input file stream for reading only
    struct enrollee enrollArray[ENROLL_SIZE];   // array of structs to store each enrollee and their respective data
    int arrayPos = 0;

    // open the input file to read
    inputFile.open("input.csv");
    // read the file until we reach the end
    while(!inputFile.eof())
    {
        //string inputBuffer;                         // buffer to store input, which will hold an entire excel row w/ cells delimited by commas
                                                    // must be a c string since strtok() only takes c string as input
        string tokensArray[numCols];
        string userid = "";
        string fname = "";
        string lname = "";
        string insurance = "";
        string sversion = "";
        //int version = -1;

        //getline(inputFile,inputBuffer,',');
        //cout << inputBuffer << endl;

        getline(inputFile,userid,',');
        getline(inputFile,fname,',');
        getline(inputFile,lname,',');
        getline(inputFile,insurance,',');
        getline(inputFile,sversion,',');

        enrollArray[0].userid = userid;
        enrollArray[0].fname = fname;
        enrollArray[0].lname = lname;
        enrollArray[0].insurance = insurance;
        enrollArray[0].version = sversion;

        cout << "userid is: " << enrollArray[0].userid << endl;
        cout << "fname is: " << enrollArray[0].fname << endl;
        cout << "lname is: " << enrollArray[0].lname << endl;
        cout << "insurance is: " << enrollArray[0].insurance << endl;
        cout << "version is: " << enrollArray[0].version << endl;
    }
}

Answer 1

This is just an idea, but it may help you. Here is a piece of code from a project I'm working on:

std::vector<std::string> ARDatabase::split(const std::string& line, char delimiter)
{
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream tokenStream(line);
    while (std::getline(tokenStream, token, delimiter))
    {
        tokens.push_back(token);
    }
    return tokens;
}

void ARDatabase::read_csv_map(std::string root_csv_map)
{
    qDebug() << "Starting to read the people database...";
    std::ifstream file(root_csv_map);
    std::string str;
    while (std::getline(file, str))
    {
        std::vector<std::string> tokens = split(str, ' ');
        std::vector<std::string> splitnames = split(tokens.at(1), '_');

        std::string name_w_spaces;
        for(auto i: splitnames) name_w_spaces = name_w_spaces + i + " ";

        people_names.insert(std::make_pair(stoi(tokens.at(0)), name_w_spaces));
        people_images.insert(std::make_pair(stoi(tokens.at(0)), std::string("database/images/" + tokens.at(2))));

    }
}

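Instead of std::vector, you may want to use some other container that better suits your case. The last function was written for the input format in my project; you can easily modify it to fit your code.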
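Adapting the split() idea above to the CSV in the question could look roughly like this. A minimal sketch, assuming a plain Enrollee struct similar to the question's enrollee and a hypothetical free function readEnrollees (neither is part of Answer 1's project); it reads whole lines, splits each on ',' and keeps only lines with exactly five fields:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Enrollee
{
    std::string userid, fname, lname, insurance, version;
};

// Same approach as ARDatabase::split above, written as a free function.
std::vector<std::string> split(const std::string& line, char delimiter)
{
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream tokenStream(line);
    while (std::getline(tokenStream, token, delimiter))
        tokens.push_back(token);
    return tokens;
}

// Hypothetical helper: one record per line, fields separated by ','.
std::vector<Enrollee> readEnrollees(std::istream& in)
{
    std::vector<Enrollee> result;
    std::string line;
    while (std::getline(in, line)) {                 // the read itself is the loop condition
        if (!line.empty() && line.back() == '\r')    // tolerate Windows line endings
            line.pop_back();
        std::vector<std::string> t = split(line, ',');
        if (t.size() != 5)
            continue;                                // skip blank or malformed lines
        result.push_back(Enrollee{t[0], t[1], t[2], t[3], t[4]});
    }
    return result;
}
```

It works with any std::istream, e.g. `std::ifstream file("input.csv"); std::vector<Enrollee> people = readEnrollees(file);`. Silently skipping lines that do not have five fields is a design choice; a stricter reader might report them instead.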

Answer 2

Your problem is that there is no comma after the last data item on each line, so

 getline(inputFile,sversion,',');

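is incorrect, because it reads up to the next comma, which is actually on the next line, after the next patient's user ID. This explains the output you are seeing, where the next patient's user ID is printed together with the version.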
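This explanation can be checked by replaying the question's loop on the sample data. A minimal sketch: traceOriginalLoop is a hypothetical helper, std::istringstream stands in for the file, and the input is assumed to end with a newline (the blank line after "lname is: 3" in the question's output suggests it does):

```cpp
#include <array>
#include <sstream>
#include <string>
#include <vector>

// Replays the question's loop: while (!eof()), five getline calls with ','.
// Each record holds {userid, fname, lname, insurance, version} as read.
std::vector<std::array<std::string, 5>> traceOriginalLoop(const std::string& csv)
{
    std::istringstream in(csv);
    std::vector<std::array<std::string, 5>> records;
    while (!in.eof()) {
        std::array<std::string, 5> r;       // five empty strings
        for (std::string& field : r)
            std::getline(in, field, ',');   // '\n' is not a stop character here
        records.push_back(r);
    }
    return records;
}
```

On the three sample lines this yields three records: the first "version" is `"1\nsc91"`, the second record starts at `Steve` and its "insurance" is `"2\nml94"`, and the third holds `Lands`, `BCBS`, `"3\n"` followed by two empty fields. That matches the question's output line for line, including the stray `sc91` and `ml94`.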

To fix this, simply replace the line above with

 getline(inputFile,sversion);

This reads up to the end of the line, as required.
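One caveat worth checking when applying this fix: the question's loop is still guarded by `while(!inputFile.eof())`. With the last field read by `getline(inputFile,sversion)`, the final `'\n'` is consumed without reaching end of file, so the loop runs once more and prints an empty record. A minimal sketch that applies the fix and tests the reads themselves in the loop condition; the Enrollee struct and the helper readEnrollees are assumptions for illustration:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Enrollee
{
    std::string userid, fname, lname, insurance, version;
};

// Four fields end at ',', the last one at the end of the line.
// The loop stops as soon as any read fails, so no empty record is added.
std::vector<Enrollee> readEnrollees(std::istream& in)
{
    std::vector<Enrollee> result;
    Enrollee e;
    while (std::getline(in, e.userid, ',') &&
           std::getline(in, e.fname, ',') &&
           std::getline(in, e.lname, ',') &&
           std::getline(in, e.insurance, ',') &&
           std::getline(in, e.version))        // no delimiter: read to end of line
    {
        result.push_back(e);
    }
    return result;
}
```

In the question's main, this would be called as `std::vector<Enrollee> people = readEnrollees(inputFile);` after opening the file, and a vector also removes the fixed 1000-element array limit.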

Answer 3

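Regarding your function: if you look at the structure of the source file, you'll see that each line contains 5 strings separated by ','. So, a typical CSV file.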

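A call to std::getline (without a delimiter) reads a complete line containing all 5 strings. In your code, you call std::getline for each string with a comma as the delimiter, but there is no comma after the last string, so that can't work. You should use getline to get the complete line instead.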

You need to read the whole line and then tokenize it.

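I'll show you an example of how to do this with std::sregex_token_iterator. It's quite simple. In addition, we overload the inserter and extractor operators. With them, you can easily read and write "enrollee" data, e.g. Enrollee e{}; std::cout << e;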

I also use the C++ algorithms, which makes life very easy: input and output are each a one-liner in main.

Please see:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <regex>


struct Enrollee
{
    // Data
    std::string userid{};
    std::string fname{};
    std::string lname{};
    std::string insurance{};
    std::string version{};

    // Overload Extractor Operator to read data from somewhere
    friend std::istream& operator >> (std::istream &is, Enrollee& e) {
        std::vector<std::string> wordsInLine{};       // Here we will store all words that we read in one line
        std::string wholeLine;                        // Temporary storage for the complete line that we will get by getline
        std::regex separator("[ ;,]");                // Separator for a CSV file: space, ';' or ','
        std::getline(is, wholeLine);                  // Read one complete line and split it into parts
        std::copy(std::sregex_token_iterator(wholeLine.begin(), wholeLine.end(), separator, -1), std::sregex_token_iterator(), std::back_inserter(wordsInLine));
        // If we have read all expected strings, then store them in our struct
        if (wordsInLine.size() == 5) {
            e.userid = wordsInLine[0];
            e.fname = wordsInLine[1];
            e.lname = wordsInLine[2];
            e.insurance = wordsInLine[3];
            e.version = wordsInLine[4];
        }
        return is;
    }

    // Overload Inserter operator. Insert data into output stream
    friend std::ostream& operator << (std::ostream& os, const Enrollee& e) {
        return os << "userid is:    " << e.userid << "\nfname is:     " << e.fname << "\nlname is:     " << e.lname << "\ninsurance is: " << e.insurance << "\nversion is:   " << e.version << '\n';
    }
};


int main()
{
    // Here we will store all Enrollee data in a dynamically growing vector
    std::vector<Enrollee> enrollmentData{};

    // Define inputFileStream and open the csv file
    std::ifstream inputFileStream("input.csv");   // backslashes in Windows paths must be escaped, e.g. "r:\\input.csv"

    // If we could open the file
    if (inputFileStream) {

        // Then read all csv data
        std::copy(std::istream_iterator<Enrollee>(inputFileStream), std::istream_iterator<Enrollee>(), std::back_inserter(enrollmentData));

        // For Debug Purposes: Print all data to cout
        std::copy(enrollmentData.begin(), enrollmentData.end(), std::ostream_iterator<Enrollee>(std::cout, "\n"));
    }
    else {
        std::cerr << "Could not open file 'input.csv'\n";
    }
}

This reads the input file "input.csv", which contains

nm92,Nate,Matthews,Aetna,1
sc91,Steve,Combs,Cigna,2
ml94,Morgan,Lands,BCBS,3

and displays the following output:

userid is:    nm92
fname is:     Nate
lname is:     Matthews
insurance is: Aetna
version is:   1

userid is:    sc91
fname is:     Steve
lname is:     Combs
insurance is: Cigna
version is:   2

userid is:    ml94
fname is:     Morgan
lname is:     Lands
insurance is: BCBS
version is:   3
