std::getline partially reads first and last line and sets eof-bit

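I need to read CSV files in C++: the first line of each file contains all the column headers, and the remaining lines contain floating-point data (examples below, files shortened).

Several files are causing problems. I am using the code below: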

#include <iostream>
#include <fstream>
#include <string>

// Compiled and tested with Clang++ on Ubuntu 14.04
int main(int argc, char** argv) {
    std::ifstream in;
    in.open(argv[1]);

    if(!in.is_open()) {
        std::cerr << "Cannot open file: " << argv[1] << "\n";
        return 1;
    }

    std::string buff;
    std::getline(in, buff);
    while(!in.eof()) {
        std::cout << buff << "\n";
        getline(in, buff);
    }

    in.close();
    return 0;
}

For most files this works fine and each iteration reads one line; example of a 'good' file:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RAU14,AU05,AU17,AU26,Forward,Backward
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20.0
0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,16.667,0.0
58.3,50.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
62.4,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20.0

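For some files it goes wrong and the eof-bit is set after the first getline. After that first read, buff contains part of the first line and part of the last line; example of a 'bad' file: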

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Occlusion,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RAU14,AU05,Au17,AU57,AU58
0,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0.3,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1.3,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
57.9,66.667,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
60.3,33.333,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Contents of buff after one call to getline:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Occlusion,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RA60.3,33.333,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

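As you can see, the first line is mixed with the last line. I have no idea what is going wrong. Every line ends with \n, and the file ends with an empty \n.

I guess my question is: why does getline jump to end-of-file and mix the first and last lines for some files, while other files work fine?

Edit: I need to convert a large dataset to a new, more consistent format. The current format is full of inconsistencies (0 vs. 0.0, AU17 vs. Au17). Still, these formatting issues should not affect simply reading the file, should they?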

Edit 2:

cat -v -e -t on a good file:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,AU05,AU17,AU26,Forward,Backward^M$
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,66.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0^M$
0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0^M$
etc...

cat -v -e -t on a bad file:

Time,Smile,AU04,AU02,AU15,Trackerfail,AU18,AU09,negAU12,AU10,Occlusion,Expressive,Unilateral_LAU12,Unilateral_RAU12,AU14,Unilateral_LAU14,Unilateral_RAU14,AU05,Au17,AU57,AU58^M0,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M0.3,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M1.3,0,0,0,0,16.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M1.4,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M1.8,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0,0,0,0,25,0^M2.8,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M3,0,0,0,0,33.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M31,0,0,0,0,33.333,0,0,0,0,25,0,0,0,0,0,0,0,0,0,0^M31.1,0,0,0,0,50,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0^M31.2,0,0,0,0,66.667,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0^M31.4,0,0,33.333,0,66.667,0,0,0,0,50,0,0,0,0,0,0,0,0,0,0^M31.5,0,0,33.333,0,66.667,0,0,0,0,50,25,0,0,0,0,0,0,0,0,0^M32,0,0,33.333,0,66.667,0,0,0,0,50,25,0,0,0,0,0,0,0,0,25^M32.1,0,0,33.333,0,83.333,0,0,0,0,50,25,0,0,0,0,0,0,0,0,25^M32.2,0,0,33.333,0,83.333,0,0,0,0,25,25,0,0,0,0,0,0,0,0,25^M32.4,0,0,33.333,0,83.333,0,0,0,0,25,0,0,0,0,0,0,0,0,0,25^M32.7,0,0,33.333,0,83.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25^M33,0,0,33.333,0,83.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M33.5,0,0,0,0,83.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M33.9,0,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M55,33.333,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M55.2,66.667,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M55.8,100,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M56.8,100,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,25^M57.4,66.667,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,25^M57.8,66.667,0,0,0,66.667,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0^M57.9,66.667,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0^M60.3,33.333,0,0,0,66.667,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

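That looks like a big difference. How can I fix this?

The file appears to contain no newline characters at all, only carriage-return characters (shown by cat -v as ^M, i.e. CTRL-M). Since std::getline splits on \n only, the first call reads the whole file into buff and hits end-of-file, which sets the eof-bit. If buff is then printed to a terminal, each \r moves the cursor back to the start of the line and later rows overwrite earlier ones, which is most likely why the first and last lines look mixed together.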
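To check which terminators a given file really uses without relying on cat, something along these lines could be used. It is only a sketch: EolCounts, count_line_endings and demo_count_line_endings are hypothetical names introduced for illustration, and the sample data in the demo is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper type: tallies of each line-terminator style.
struct EolCounts {
    std::size_t lf = 0;    // "\n"   (Unix, OS X and later)
    std::size_t crlf = 0;  // "\r\n" (Windows)
    std::size_t cr = 0;    // "\r"   (classic Mac OS)
};

// Hypothetical helper: reads the stream byte by byte and counts terminators.
// For files, open the std::ifstream with std::ios::binary so that no
// platform-specific translation hides the raw bytes.
EolCounts count_line_endings(std::istream& in) {
    EolCounts counts;
    char c;
    bool prev_cr = false;  // true if the previous byte was '\r'
    while (in.get(c)) {
        if (c == '\n') {
            if (prev_cr) ++counts.crlf; else ++counts.lf;
            prev_cr = false;
        } else {
            if (prev_cr) ++counts.cr;  // the previous '\r' stood alone
            prev_cr = (c == '\r');
        }
    }
    if (prev_cr) ++counts.cr;  // a '\r' as the very last byte
    return counts;
}

// In-memory demonstration on made-up data shaped like the 'bad' file.
void demo_count_line_endings() {
    std::istringstream bad("h1,h2\r0,0\r1,1");
    EolCounts counts = count_line_endings(bad);
    assert(counts.cr == 2 && counts.lf == 0 && counts.crlf == 0);
}
```

A file that reports only cr counts is in the same situation as the 'bad' file above; one that reports only crlf counts matches the 'good' file.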

You can fix it by running the file through cat and piping the output to tr, which converts the carriage returns to newlines:

$ cat your-file | tr '\r' '\n' > your-file-fixed
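One thing to watch: tr maps every \r to \n, so a file that already uses \r\n (like the 'good' file in the cat output above) would end up with an empty line after every row. If the conversion is done inside the C++ program instead, both cases can be handled in one pass. A minimal sketch, assuming the whole file fits in memory as a std::string; normalize_newlines and demo_normalize_newlines are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites "\r\n" and a lone "\r" as a single "\n".
std::string normalize_newlines(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            out.push_back('\n');
            // Swallow the '\n' of a "\r\n" pair so it is not emitted twice.
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
        } else {
            out.push_back(text[i]);
        }
    }
    return out;
}

// In-memory demonstration on made-up data with both problem styles.
void demo_normalize_newlines() {
    assert(normalize_newlines("h1,h2\r0,0\r1,1") == "h1,h2\n0,0\n1,1");
    assert(normalize_newlines("h1,h2\r\n0,0\r\n") == "h1,h2\n0,0\n");
}
```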

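After seeing your comment that the files come from Mac OS, I think they were written by an old pre-OS X version, from the time when the line ending on the Mac was just a single carriage return.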
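If converting the files beforehand is not an option, the reading loop itself can be made tolerant of all three conventions. The following is a sketch of one way to do that, not a standard library facility: getline_any_eol and demo_getline_any_eol are hypothetical names, and the function treats \n, \r\n and a lone \r as end of line.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: behaves like std::getline, but accepts "\n", "\r\n"
// and a lone "\r" as the line terminator. The terminator is not stored.
std::istream& getline_any_eol(std::istream& in, std::string& line) {
    line.clear();
    char c;
    while (in.get(c)) {
        if (c == '\n') return in;
        if (c == '\r') {
            // Treat "\r\n" as one terminator by consuming the '\n' too.
            if (in.peek() == '\n') in.get(c);
            return in;
        }
        line.push_back(c);
    }
    // End of input: a final line without a terminator still counts as read,
    // so drop the failbit and keep only the eofbit.
    if (!line.empty()) in.clear(std::ios::eofbit);
    return in;
}

// In-memory demonstration on made-up data mixing all three conventions.
void demo_getline_any_eol() {
    std::istringstream in("h1,h2\r0,0\r\n1,1\n2,2");
    std::string line;
    int rows = 0;
    while (getline_any_eol(in, line)) ++rows;
    assert(rows == 4);
}
```

Testing the stream returned by the read, as in `while (getline_any_eol(in, buff))`, also avoids the separate in.eof() check used in the question's loop.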