lzma totalread 比 header 未压缩大小大 1

Question

我正在尝试使用 easylzma 库使用 lzma 解压缩文件，一些文件工作正常但随机文件不解压缩。
经过一些调试后，我发现 totalread 数比 header uncompressedSize 大 1，而且 header istreamed 为 0。
代码说没有页脚但是当我从总读取中减去 1 以跳过错误时，文件被正确解压缩但在文件末尾添加了一行，其中有几个字段为 0 和单个字段有值。
这些文件是杜高斯贝的.bi5。
我想确定错误是否是由于我使用的库中的某些错误逻辑引起的，或者是错误的文件，在这种情况下应该做什么。
使用的库是来自 github 的 easylzma-master 和 dukascopy-master，文件是从杜高斯贝服务器下载的。
13h_ticks.bi5 和 21_ticks.bi5 on 30 september 2020 "september is 8 " 的文件正好显示了这个问题。

更新：
我没有放代码，因为我现在正在询问指南，代码存在并且它显示 problem.but 它是库 code.so 我想知道是否有人对杜高斯贝 bi5 的这个特定文件有同样的问题type 和这个 lzma library.I 我现在只是在寻找一般规则“在 lzma 解压中我们什么时候得到 totalread 大于 header 未压缩大小重复 1 的行为？这是否意味着有页脚但在 header 字节中没有提及？？"

更新：
这就是我打开文件的方式

int HTTPRequest::read_bi5_main(boost::filesystem::path p, ptime epoch)
{
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock(mBOOST_LOGMutex,boost::defer_lock);
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock2(m_read_bi5_to_binMutex, boost::defer_lock);

    unsigned char *buffer;
    size_t buffer_size;

    int counter;

    size_t raw_size = 0;

    std::string filename_string = p.generic_string();
    path p2 = p;
    p2.replace_extension(".bin");
    std::string filename_string_to_bin =p2.generic_string() ;

    path p3 = p;
    p3.replace_extension(".csv");
    std::string filename_string_to_csv = p3.generic_string();

    const char *filename = filename_string.c_str();
    const char *filename_to_bin = filename_string_to_bin.c_str();
    const char *filename_to_csv = filename_string_to_csv.c_str();

    //22-9-2020 here I open the downloaded file if possible
    if (fs::exists(p) && fs::is_regular(p))
    {
        buffer_size = fs::file_size(p);
        buffer = new unsigned char[buffer_size];
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data file. |"
            << filename << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    //22-9-2020 here I read the downloaded file into filestream
    std::ifstream fin(filename, std::ifstream::binary);
    fin.read(reinterpret_cast<char*>(buffer), buffer_size);
    fin.close();

    //22-9-2020 here I check if file is related to japanese yen so that I determine how to write its value
    /*
    if symbols_xxx has mHTTPRequest_Symbol_str then PV=0.001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.0001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.00001
    */
    //28-9-2020 I will make 3 vectors in utils.h for 3,4,5 point value ,then I find symbol in vector,
    //std::size_t pos = mHTTPRequest_Symbol_str.find("JPY");

    double PV;

    std::vector<std::string>::iterator it3 = std::find(point_value_xxx.begin(), point_value_xxx.end(), mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it4 = std::find(point_value_xxxx.begin(), point_value_xxxx.end(), mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it5 = std::find(point_value_xxxxx.begin(), point_value_xxxxx.end(), mHTTPRequest_Symbol_str);
    if (it3 != point_value_xxx.end())
    {
        PV = 0.001;
    }
    else if (it4 != point_value_xxxx.end())
    {
        PV = 0.0001;
    }
    else if (it5 != point_value_xxxxx.end())
    {
        PV = 0.00001;
    }
    else
    {
        //10-1-2020throw;
        PV = 0.001;

    }
    read_bi5_to_bin_lock2.lock();
    unsigned char *data_bin_buffer = 0 ;
    n47::tick_data *data = n47::read_bi5_to_bin(
            buffer, buffer_size, epoch, PV, &raw_size, &data_bin_buffer);

    //5-11-2020 here i will save binary file
    std::string file_name_path_string=output_compressed_file_2(&data_bin_buffer, raw_size, filename_to_bin);
    read_bi5_to_bin_lock2.unlock();

    path file_name_path_2{ file_name_path_string };
    buffer_size = 0;
    if (fs::exists(file_name_path_2) && fs::is_regular(file_name_path_2))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << boost::this_thread::get_id() <<"\t we can access the data .bin file. |"
            << filename_to_bin << "| with size ="<< fs::file_size(file_name_path_2) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data .bin file. |"
            << filename_to_bin << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    n47::tick_data_iterator iter;

    //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks
    if (data == 0)
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Failed to load the data!" << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //5-15-2020 take care that without else ,error happens with empty files because data is pointer to vector of pointers to ticks .so when data is made inside read_bi5 ,it is made as null pointer and later it is assigned to vector if file has ticks.if file does not have ticks ,then it is just returned as null pointer .so when dereferencing null pointer we got error
    else if (data->size() != (raw_size / n47::ROW_SIZE))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Loaded " << data->size()
            << " ticks but file size indicates we should have loaded "
            << (raw_size / n47::ROW_SIZE) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //22-9-2020 in last if and if else I checked if file is either empty or has error of data size So now I have good clean file to work with
    //read_bi5_to_bin_lock.lock();
    //BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "time, bid, bid_vol, ask, ask_vol" << std::endl;
    //read_bi5_to_bin_lock.unlock();

    counter = 0;

    std::ofstream out_csv(filename_string_to_csv);
    if (data == 0)
    {

    }
    else if (data != 0)
    {
        for (iter = data->begin(); iter != data->end(); iter++) {
            //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks>>>>>>>here i should open file stream for output and save data to it
            out_csv
            //<< std::setfill('0')<<std::setw(sizeof((*iter)->epoch + (*iter)->td))<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            //<< std::setfill('0')<<std::setw(27)<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<<std::fixed << (*iter)->bid << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<<std::fixed << (*iter)->bidv << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<<std::fixed << (*iter)->ask << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<<std::fixed << (*iter)->askv << std::endl;
            //??5-17-2020 isolate multithreaded error
            /*
            read_bi5_to_bin_lock.lock();
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                boost::this_thread::get_id() << "\t"<<((*iter)->epoch + (*iter)->td) << ", "
                << (*iter)->bid << ", " << (*iter)->bidv << ", "
                << (*iter)->ask << ", " << (*iter)->askv << std::endl;
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                            boost::this_thread::get_id() << "\t"<< std::setfill('0')<< std::setw(sizeof((*iter)->epoch + (*iter)->td))<<((*iter)->epoch + (*iter)->td) << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<< (*iter)->bid << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<< (*iter)->bidv << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<< (*iter)->ask << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<< (*iter)->askv << std::endl;
            read_bi5_to_bin_lock.unlock();
            */
            counter++;
        }
        ////read_bi5_to_bin_lock.unlock();

    }
    out_csv.close();
    //5-13-2020

    //??5-17-2020 isolate multithreaded error
    read_bi5_to_bin_lock.lock();

    BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << ".end." << std::endl << std::endl
        << "From " << raw_size << " bytes we read " << counter
        << " records." << std::endl
        << raw_size << " / " << n47::ROW_SIZE << " = "
        << (raw_size / n47::ROW_SIZE) << std::endl;
    read_bi5_to_bin_lock.unlock();


    delete data;
    delete[] buffer;
    delete [] data_bin_buffer;
    return 0;
}

这是我的dukascopy修改文件

//#include "stdafx.h"

/*
Copyright 2013 Michael O'Keeffe (a.k.a. ninety47).

This file is part of ninety47 Dukascopy toolbox.

The "ninety47 Dukascopy toolbox" is free software: you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of the License,
or any later version.

"ninety47 Dukascopy toolbox" is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along with
"ninety47 Dukascopy toolbox".  If not, see <http://www.gnu.org/licenses/>.
*/

#include "ninety47/dukascopy.h"
#include <boost/date_time/posix_time/posix_time.hpp>
#include <algorithm>
#include <vector>
#include "ninety47/dukascopy/defs.h"
#include "ninety47/dukascopy/io.hpp"
#include "ninety47/dukascopy/lzma.h"



namespace n47 {

namespace pt = boost::posix_time;


tick *tickFromBuffer(
        unsigned char *buffer, pt::ptime epoch, float digits, size_t offset) {
    bytesTo<unsigned int, n47::BigEndian> bytesTo_unsigned;
    bytesTo<float, n47::BigEndian> bytesTo_float;

    unsigned int ts = bytesTo_unsigned(buffer + offset);
    pt::time_duration ms = pt::millisec(ts);
    unsigned int ofs = offset + sizeof(ts);
    float ask = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    float bid = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    //28-9-2020 convert volume to million
    float askv = bytesTo_float(buffer + ofs) *1000000;
    ofs += sizeof(ts);
    float bidv = bytesTo_float(buffer + ofs) *1000000;

    return new tick(epoch, ms, ask, bid, askv, bidv);
}


tick_data* read_bin(
        unsigned char *buffer, size_t buffer_size, pt::ptime epoch, float point_value) {
    std::vector<tick*> *data = new std::vector<tick*>();
    std::vector<tick*>::iterator iter;

    std::size_t offset = 0;

    while ( offset < buffer_size ) {
        data->push_back(tickFromBuffer(buffer, epoch, point_value, offset));
        offset += ROW_SIZE;
    }

    return data;
}


tick_data* read_bi5(
        unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
        float point_value, size_t *bytes_read) {
    tick_data *result = 0;

    // decompress
    int status;
    unsigned char *buffer = n47::lzma::decompress(lzma_buffer,
            lzma_buffer_size, &status, bytes_read);

    //5-11-2020 here i will save binary file


    if (status != N47_E_OK) {
        bytes_read = 0;
    } else {
        // convert to tick data (with read_bin).
        result = read_bin(buffer, *bytes_read, epoch, point_value);
        delete [] buffer;
    }

    return result;
}

//5-11-2020
tick_data* read_bi5_to_bin(
    unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
    float point_value, size_t *bytes_read, unsigned char** buffer_decompressed) {
    tick_data *result = 0;

    // decompress
    int status;
    *buffer_decompressed = n47::lzma::decompress(lzma_buffer,
        lzma_buffer_size, &status, bytes_read);

    if (status != N47_E_OK) 
    {
        bytes_read = 0;
    }
    else {
        // convert to tick data (with read_bin).
        result = read_bin(*buffer_decompressed, *bytes_read, epoch, point_value);
        //delete[] buffer;
    }

    return result;
}


tick_data* read(
        const char *filename, pt::ptime epoch, float point_value, size_t *bytes_read) {
    tick_data *result = 0;
    size_t buffer_size = 0;
    unsigned char *buffer = n47::io::loadToBuffer<unsigned char>(filename, &buffer_size);

    if ( buffer != 0 ) {
        if ( n47::lzma::bufferIsLZMA(buffer, buffer_size) ) {
            result = read_bi5(buffer, buffer_size, epoch, point_value, bytes_read);
            // Reading in as bi5 failed lets double check its not binary
            // data in the buffer.
            if (result == 0) {
                result = read_bin(buffer, buffer_size, epoch, point_value);
            }
        } else {
            result = read_bin(buffer, buffer_size, epoch, point_value);
            *bytes_read = buffer_size;
        }
        delete [] buffer;

        if (result != 0 && result->size() != (*bytes_read / n47::ROW_SIZE)) {
            delete result;
            result = 0;
        }
    }
    return result;
}

}  // namespace n47

Answer 1

我深入挖掘了 lzma 工作的细节，这对我来说很沉重，所以我更改了库并使用了 7z cpp lzma 规范文件。
它有效。
我认为这个问题与在 cpp 程序中使用 c 代码有关。
该库还声明它已针对 bsd 进行了测试谢谢你的帮助。任何有相同情况的人下载 7zip 并使用 cpp 文件

lzma totalread 比 header 未压缩大小大 1

lzma totalread is greater than header uncompressed size by 1

c++

lzma