Is there any way to get a specific column from a CSV file?
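Hi everyone, I have just started learning how to manage CSV files in C++, and this code currently works: it prints the 'math' column.
But that is only because I assign every column to its own variable with getline(ss, #any column variable#, ',') and then print the column I want. What if I use it on a big list, say a CSV file with about 100 columns? How can I simplify it? Or is there a way to get only a specific column without assigning/parsing every column into its own variable? Say that out of 100 columns I only want column 47, whatever its name may be. Or could I get the column by its name?
Thanks.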
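Here is a quick [working] example.
- The first part reads in the table.
- The second part (after fin.close()) lets you choose what to print (or whatever else you choose to do with it).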
    #include <iostream>
    #include <string>
    #include <fstream>
    #include <sstream>
    #include <vector>
    #include <algorithm> //std::find
    #include <iterator>  //std::distance
    using namespace std;

    int main(int argc, char** argv)
    {
        ifstream fin("filename");
        string line;
        int rowCount = 0;
        int rowIdx = 0; //keep track of inserted rows

        //count the total number of lines in your file
        while (getline(fin, line)) {
            rowCount++;
        }

        //this will be your table. A row is represented by data[row_number].
        //If you want to access the name of column #47, you would
        //cout << data[0][46]. 0 being the first row (assuming headers)
        //and 46 being the 47th column.
        //But first you have to input the data. See below.
        vector<vector<string>> data(rowCount); //a plain array sized at run time is not standard C++

        fin.clear();            //clear eofbit/failbit (ie: continue using fin)
        fin.seekg(0, ios::beg); //rewind stream to start

        while (getline(fin, line)) //for every line in input file
        {
            stringstream ss(line); //copy line to stringstream
            string value;
            while (getline(ss, value, ',')) { //for every value in that stream (ie: every cell on that row)
                data[rowIdx].push_back(value); //add that value at the end of the current row in our table
            }
            rowIdx++; //increment row number before reading in next line
        }
        fin.close();

        //Now you can choose to access the data however you like.
        //If you want to print out only column 47...
        size_t colNum = 46; //zero-based index of the column to print (46 is column 47)
        for (int row = 0; row < rowCount; row++)
        {
            if (colNum < data[row].size()) //skip rows that are too short
                cout << data[row][colNum] << "\t"; //print the value in that column only
        }
        cout << endl;
        return 0;
    }
Edit: adding this for a more complete answer.
To look up a column by name, replace the end of main (from the colNum declaration onward) with this snippet:

    //if you want to look up a column by name, instead of by column number...
    //Use find on the header row to get its column number.
    //Then you can print out just that column.
    string colName = "computer science";
    if (rowCount == 0) { //no header row to search
        cerr << "empty file" << endl;
        return 1;
    }

    //1. Find the column named "computer science" on the first row, using an iterator
    vector<string>::iterator it = find(data[0].begin(), data[0].end(), colName);
    if (it == data[0].end()) { //the column name was not found
        cerr << "column not found: " << colName << endl;
        return 1;
    }
    //calculate its index (ie: column number integer)
    size_t colNum = std::distance(data[0].begin(), it);

    //2. Print the column with the header "computer science"
    for (int row = 0; row < rowCount; row++)
    {
        if (colNum < data[row].size()) //skip rows that are too short
            cout << data[row][colNum] << "\t"; //print every value in that column only
    }
    cout << endl;
    return 0;
    }
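If you would rather have the by-name lookup as a reusable function than as inline code in main, something along these lines might work. This is only a sketch: the `Table` alias and the `column_by_name` helper are names made up for illustration, it assumes the first row of the table is the header, and it needs C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

using Table = std::vector<std::vector<std::string>>;

// Hypothetical helper: returns every data cell under the header `name`,
// or std::nullopt when the table is empty or has no such header.
std::optional<std::vector<std::string>> column_by_name(const Table& table,
                                                       const std::string& name)
{
    if (table.empty())
        return std::nullopt;

    const auto& header = table.front(); // assumption: row 0 holds the headers
    const auto it = std::find(header.begin(), header.end(), name);
    if (it == header.end())
        return std::nullopt; // header not present

    const auto col = static_cast<std::size_t>(std::distance(header.begin(), it));

    std::vector<std::string> out;
    for (std::size_t row = 1; row < table.size(); ++row) {
        // A row shorter than the header yields an empty cell instead of
        // an out-of-range access.
        out.push_back(col < table[row].size() ? table[row][col] : std::string());
    }
    assert(out.size() == table.size() - 1); // one cell per data row
    return out;
}
```

The caller checks the returned optional before using it, so a misspelled column name is reported instead of reading past the end of a row.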
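Or is there a way to get only a specific column without assigning/parsing every column into its own variable?
With the CSV format it is not really practical to avoid reading every column, so what you really want to do is basically just discard the columns you do not want, much like you are already doing.
To make it work with an unknown number of columns, you can read into a std::vector, which is basically a dynamically sized array and is really useful for cases like this.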
    std::vector<std::string> read_csv_line(const std::string &line)
    {
        std::vector<std::string> ret;
        std::string val;
        std::stringstream ss(line);
        while (std::getline(ss, val, ','))
            ret.push_back(std::move(val));
        return ret;
    }
    ...
    std::getline(is, line);
    auto row = read_csv_line(line);
    if (row.size() > 10) // Check each row is expected size!
        std::cout << row[0] << ", " << row[10] << std::endl;
    else
        std::cerr << "Row too short" << std::endl;
Then you can access the specific columns you need.
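If you only ever need one column per line, you may not need to keep the other fields at all. The sketch below still has to read past the earlier fields (the format gives no way around that), but it stores nothing and stops as soon as it reaches the one you asked for. `nth_field` is a hypothetical helper name, the index is zero-based, and `std::optional` requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the field at zero-based position `index`,
// or std::nullopt when the line has fewer fields than that.
// Assumption: plain comma-separated values without quoting.
std::optional<std::string> nth_field(const std::string& line, std::size_t index)
{
    std::stringstream ss(line);
    std::string field;
    for (std::size_t i = 0; std::getline(ss, field, ','); ++i) {
        if (i == index)
            return field; // stop early; later fields are never read
    }
    return std::nullopt; // ran out of fields before reaching `index`
}
```

For column 47 you would call `nth_field(line, 46)` on each line and skip or report the lines where the result is empty.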
Or could I get the column by its name?
Assuming your CSV file has a header row, you can read it into a std::unordered_map<std::string, size_t>, where the value is the column index. Or use something like a std::vector with std::find.
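As a rough illustration of the map idea, the header line can be split once and each name stored with its position. After that, every lookup by name is a single map access. `make_header_index` and `cell_by_name` are hypothetical names used only for this sketch, and the splitting is the same plain comma split as above (no quoting).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each header name to its zero-based column index.
std::unordered_map<std::string, std::size_t>
make_header_index(const std::string& header_line)
{
    std::unordered_map<std::string, std::size_t> index;
    std::stringstream ss(header_line);
    std::string name;
    for (std::size_t col = 0; std::getline(ss, name, ','); ++col)
        index.emplace(name, col); // emplace keeps the first of any duplicate names
    return index;
}

// Hypothetical helper: returns a pointer to the cell under `name` in `row`,
// or nullptr when the name is unknown or the row is too short.
const std::string* cell_by_name(
    const std::unordered_map<std::string, std::size_t>& index,
    const std::vector<std::string>& row,
    const std::string& name)
{
    const auto it = index.find(name);
    if (it == index.end() || it->second >= row.size())
        return nullptr;
    return &row[it->second];
}
```

The map pays off when you look up several names or process many rows. For a single lookup, std::find on the header vector is just as reasonable.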
Note that a single std::getline cannot handle quoted values and some other possible CSV features.
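To give an idea of what handling quoted values involves, here is a minimal character-by-character splitter. It follows the common convention (also described in RFC 4180) that a field may be wrapped in double quotes and that `""` inside such a field stands for one literal quote. `split_csv_record` is a hypothetical name, and the sketch assumes one record per line, so quoted fields containing line breaks are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits one CSV record into fields, honouring
// double-quoted fields and "" as an escaped quote inside them.
// Assumption: the whole record is on a single line.
std::vector<std::string> split_csv_record(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"'; // "" inside quotes is a literal quote
                    ++i;          // skip the second quote of the pair
                } else {
                    in_quotes = false; // closing quote
                }
            } else {
                field += c; // commas are ordinary characters here
            }
        } else if (c == '"') {
            in_quotes = true; // opening quote
        } else if (c == ',') {
            fields.push_back(std::move(field)); // field separator
            field.clear();                      // reset the moved-from string
        } else {
            field += c;
        }
    }
    fields.push_back(std::move(field)); // last field, possibly empty
    return fields;
}
```

Unlike the std::getline loop, this version keeps a trailing empty field and returns one empty field for an empty line. For real-world files a dedicated CSV library is probably the safer choice.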