Is there a way to get a specific column from a CSV file?

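Hi everyone, I have just started learning how to work with CSV files in C++, and this code currently works. It prints the 'math' column.

But that only works because I assign every column with getline(ss, #any column variable#, ',') and then print the column I want. What if I use it on a big table, say a CSV file with about 100 columns? How can I simplify it? Or is there a way to get only a specific column, without assigning/parsing every column into its own variable? Say that out of 100 columns I only want column 47, whatever its name might be. Or could I get the column by its name?

Thanks.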

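Here is a quick [working] example.

  • The first part reads in the table.
  • The second part (after fin.close()) lets you choose what to print (or whatever else you want to do with the data).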
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>  //std::find
using namespace std;
int main(int argc, char** argv)
{
   ifstream fin("filename");
   string line;
   int rowCount=0;
   int rowIdx=0; //keep track of inserted rows

   //count the total nb of lines in your file
   while(getline(fin,line)){
      rowCount++;
   }

   //this will be your table. A row is represented by data[row_number].
   //If you want to access the name of the column #47, you would
   //cout << data[0][46]. 0 being the first row(assuming headers)
   //and 46 is the 47 column.
   //But first you have to input the data. See below.
   vector<vector<string> > data(rowCount); //one vector of cells per row (a variable-length array is not standard C++)

   fin.clear(); //remove failbit (ie: continue using fin.)
   fin.seekg(0, fin.beg); //rewind stream to start (offset 0 from the beginning)

   while(getline(fin,line)) //for every line in input file
   {
      stringstream ss(line);  //copy line to stringstream
      string value;
      while(getline(ss,value,',')){       //for every value in that stream (ie: every cell on that row)
         data[rowIdx].push_back(value);//add that value at the end of the current row in our table
      }
      rowIdx++;   //increment row number before reading in next line
   }
   fin.close();


   //Now you can choose to access the data however you like.
   //If you want to printout only column 47...

   int colNum=46;  //zero-based index of the column to print (46 is the 47th column)

   for(int row=0; row<rowCount; row++)
   {
      if(colNum < (int)data[row].size())  //skip rows that are too short
         cout << data[row][colNum] << "\t";  //print only the value in that column
   }
   cout << endl;


   return 0;
}

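Edit: adding this for a more complete answer.

To look up a column by name, replace everything from the "int colNum" declaration to the end of main with this snippet: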


   //If you want to look up a column by name instead of by column number,
   //use find on the header row to get its column number.
   //Then you can print just that column.
   string colName = "computer science";

   if(rowCount == 0)
   {
      cerr << "File is empty" << endl;
      return 1;
   }

   //1. Find the column name "computer science" on the first row, using an iterator
   //note: if "it == data[0].end()", the column name was not found
   vector<string>::iterator it = find(data[0].begin(), data[0].end(),colName);
   if(it == data[0].end())
   {
      cerr << "Column not found: " << colName << endl;
      return 1;
   }

   //calculate its index (ie: column number integer)
   int colNum = std::distance(data[0].begin(), it);

   //2. Print the column with the header "computer science"
   for(int row=0; row<rowCount; row++)
   {
      if(colNum < (int)data[row].size())  //skip rows that are too short
         cout << data[row][colNum] << "\t";  //print only the value in that column
   }
   cout << endl;

   return 0;
}


Or is there a way to get only a specific column, without assigning/parsing every column into its own variable?

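With the CSV format it is not practical to avoid reading every column, so what you really want to do is basically just discard the columns you don't want, much as you are already doing.

To make this work with an unknown number of columns, you can read into a std::vector, which is basically a dynamically sized array and very useful for cases like this.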

std::vector<std::string> read_csv_line(const std::string &line)
{
    std::vector<std::string> ret;
    std::string val;
    std::stringstream ss(line);
    while (std::getline(ss, val, ','))
        ret.push_back(std::move(val));
    return ret;
}

...
std::getline(is, line);
auto row = read_csv_line(line);
if (row.size() > 10) // Check each row is expected size!
  std::cout << row[0] << ", " << row[10] << std::endl;
else std::cerr << "Row too short" << std::endl;

Then you can access the specific columns you need.
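
If only a single column is ever needed, the discarding can also be done without building a vector at all. The following is a minimal sketch of that idea, not part of the answer above; nth_field is a hypothetical helper name, and it assumes plain comma-separated values with no quoting.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: copies the field at zero-based position `index`
// into `out` and returns true, or returns false if the line has fewer fields.
// `out` is only meaningful when the function returns true.
// Assumes plain comma-separated values with no quoting.
bool nth_field(const std::string &line, std::size_t index, std::string &out)
{
    std::stringstream ss(line);
    // Every field before `index` is read and then overwritten, i.e. discarded.
    for (std::size_t i = 0; std::getline(ss, out, ','); ++i)
        if (i == index)
            return true;
    return false;
}
```

Every field up to the requested one is still read, which matches the point above: the unwanted columns are skipped, not avoided.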

Or could I get the column by its name?

Assuming your CSV file has a header row, you can read it into, say, a std::unordered_map<std::string, size_t> where the value is the column index. Or use something like a std::vector with std::find.
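
As a rough sketch of the map approach (read_csv_header is a hypothetical name, and the header is assumed to be unquoted and comma-separated):

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: maps each header name to its zero-based column index.
// Assumes unquoted, comma-separated header names.
// If a name appears more than once, the first occurrence wins.
std::unordered_map<std::string, std::size_t> read_csv_header(const std::string &header_line)
{
    std::unordered_map<std::string, std::size_t> index_of;
    std::stringstream ss(header_line);
    std::string name;
    std::size_t col = 0;
    while (std::getline(ss, name, ','))
        index_of.emplace(name, col++);  // emplace leaves an existing key untouched
    return index_of;
}
```

A lookup such as index_of.find("math") then gives the index to use on each row returned by read_csv_line, after checking that the name was found and that the row is long enough.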


Note that a single std::getline cannot handle quoted values and some other possible CSV features.
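
For illustration only, here is a minimal sketch of splitting a line by hand so that double-quoted fields are respected. split_quoted_csv_line is a hypothetical name, and the quoting rules are an assumption based on the common CSV convention (a comma inside quotes is literal, and "" inside quotes stands for one quote character). It does not handle fields that span several lines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line, honouring double-quoted fields.
// Inside quotes, a comma is literal and "" stands for one quote character.
// Assumption: a field never spans multiple lines.
std::vector<std::string> split_quoted_csv_line(const std::string &line)
{
    std::vector<std::string> fields;
    std::string cur;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_quotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                cur += '"';         // escaped quote inside a quoted field
                ++i;                // skip the second quote of the pair
            } else if (c == '"') {
                in_quotes = false;  // closing quote
            } else {
                cur += c;           // commas are literal here
            }
        } else if (c == '"') {
            in_quotes = true;       // opening quote
        } else if (c == ',') {
            fields.push_back(cur);  // end of the current field
            cur.clear();
        } else {
            cur += c;
        }
    }
    fields.push_back(cur);  // last field; a trailing comma yields an empty one
    return fields;
}
```

Unlike the std::getline loop, this version keeps a trailing empty field and returns one empty field for an empty line. For real-world files a dedicated CSV library is likely the safer choice.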