Reading a text file from the client and on the client that exceeds the maximum size of a single string in JavaScript

Question

I'd like to reverse the following steps, which were performed on the client in JavaScript, but I am having trouble with the blob.

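In an indexedDB database, over an open cursor on an object store index:

1. Extracted the data object from the database.
2. Converted the object to a string using JSON.stringify.
3. Made a new blob { type: 'text/csv' } of the JSON string.
4. Wrote the blob to an array.
5. Moved the cursor down one and repeated from step 1.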

After the transaction completed successfully, a new blob of the same type was made from the array of blobs.

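The reason for doing it this way is that the concatenation of the JSON strings exceeded the maximum permitted size for a single string, so it was not possible to concatenate first and make one blob of that large string. However, the array of blobs could be made into a single larger blob, approximately 350MB, and downloaded to the client disk.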

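To reverse this process, I thought I could read the blob in, slice it into its component blobs, and then read each of those as a string; but I can't figure out how to do it.

If the file is read as text with FileReader, the result is one large block of text that cannot be written to a single variable, because it exceeds the maximum size and throws an allocation size overflow error.

Reading the file as an array buffer appears to be an approach that allows the blob to be sliced into pieces, but there seems to be an encoding issue of some kind.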

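Is there a way to reverse the original process as it is, or is there an encoding step that can be added to allow the array buffer to be converted back to the original strings?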
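On the encoding concern, one possible approach is the standard `TextDecoder` in streaming mode. The sketch below assumes the file is UTF-8, which is what the `Blob` constructor produces from strings. The helper name `decodeChunks` is made up for illustration; in a browser the chunks would typically come from `await blob.slice(start, end).arrayBuffer()`.

```javascript
// Decode a sequence of byte chunks into string pieces without corrupting
// multi-byte UTF-8 characters that straddle a chunk boundary.
// `decodeChunks` is a hypothetical helper name, not part of any library.
function decodeChunks(chunks) {
  const decoder = new TextDecoder('utf-8');
  const pieces = [];
  for (const chunk of chunks) {
    // { stream: true } makes the decoder hold back an incomplete trailing
    // byte sequence and complete it with the start of the next chunk.
    pieces.push(decoder.decode(chunk, { stream: true }));
  }
  // A final call with no arguments flushes anything still buffered.
  pieces.push(decoder.decode());
  return pieces;
}

// 'é' takes two bytes in UTF-8; cut the byte array in the middle of it.
const bytes = new TextEncoder().encode('café!');
const parts = decodeChunks([bytes.subarray(0, 4), bytes.subarray(4)]);
console.log(parts.join('')); // café!
```

Each piece stays a separate, small string, so nothing here requires building one string larger than the engine's limit.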

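I tried reading over some questions that appeared to be related, but at this point I don't understand the encoding issues they discuss. Recovering a string seems to be fairly complicated.

Thank you for any guidance you can provide.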

Additional information after employing the accepted answer

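There is certainly nothing special about the code posted below, but I figured I'd share it for those who may be as new to this as I am. It is the accepted answer integrated into the async function used to read the blobs, parse them, and write them to the database.

This method uses very little memory. It is a pity that there is no comparable way to write the data to disk. When the database is written to disk, memory usage increases as the large blob is generated and is released shortly after the download completes. Using this method to upload the file from the local disk appears to work without ever loading the entire blob into memory before slicing. It is as if the file is read from disk in slices. So it is very efficient in terms of memory usage.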

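In my specific case there is still work to be done, because using this to write the 50,000 JSON strings totaling 350MB back to the database is rather slow and takes about 7:30 to complete.

Right now each individual string is separately sliced, read as text, and written to the database in its own transaction. Whether slicing the blob into larger pieces made up of a set of JSON strings, reading each piece as one block of text, and writing the whole set to the database in a single transaction would be faster while still not using a large amount of memory is something I need to experiment with, and a topic for a separate question.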
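The `read_file` helper is not shown in this post. The following is only a guess at its shape, inferred from the way the code below uses `result.value` and from the comment describing it as a FileReader wrapped in a promise. The fallback to `blob.text()` is an addition so the sketch also runs where `FileReader` is unavailable.

```javascript
// Hypothetical stand-in for read_file: resolves to { value: <text> }.
function read_file(blob) {
  // In browsers, wrap FileReader in a promise.
  if (typeof FileReader !== 'undefined') {
    return new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = () => resolve({ value: reader.result });
      reader.onerror = () => reject(reader.error);
      reader.readAsText(blob); // decodes as UTF-8 by default
    });
  }
  // Without FileReader (for example in Node.js), Blob.prototype.text()
  // performs the same UTF-8 decoding and already returns a promise.
  return blob.text().then((value) => ({ value }));
}
```

Called as `const result = await read_file(f.slice(0, 10));`, it would give the ten-character length field in `result.value`.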

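If an alternative loop is used to determine how many JSON strings are needed to fill the size const c, and a blob of that size is then sliced, read as text, and split up so that each individual JSON string can be parsed, the time to complete is about 1:30 for c = 250,000 through 1,000,000. It appears that parsing a large number of JSON strings still slows things down regardless. Large blob slices do not translate into large amounts of text being parsed as a single block, and each of the 50,000 strings still has to be parsed individually.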
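The grouping step described above can be isolated into a small planning function. This is a simplified sketch, not the loop used in this post: `planBatches` and the shape of its result are made-up names, and it takes a plain array of lengths rather than the `map` objects.

```javascript
// Group consecutive record lengths into batches of at most `limit` units.
// A record longer than `limit` ends up in a batch of its own.
// `planBatches` is a hypothetical helper used only for illustration.
function planBatches(lengths, limit) {
  const batches = [];
  let current = null;
  lengths.forEach((len, index) => {
    if (current && current.size + len <= limit) {
      // The record still fits, so extend the current batch.
      current.size += len;
      current.count += 1;
    } else {
      // Start a new batch at this record.
      current = { first: index, count: 1, size: len };
      batches.push(current);
    }
  });
  return batches;
}

const plan = planBatches([400, 300, 500, 1200, 100], 1000);
console.log(plan.length); // 4
console.log(plan[0]);     // { first: 0, count: 2, size: 700 }
```

Each batch then corresponds to one slice of `size` starting at the running offset, one read, and one database transaction for `count` records.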

   try
     {
       let i, l, b, result, map, p;
       const c = 1000000;

       // First get the file map from front of blob/file.
       // Read first ten characters to get length of map JSON string.
       b = new Blob( [ f.slice(0,10) ], { type: 'text/csv' } );
       result = await read_file( b );
       l = parseInt( result.value, 10 ); // Explicit radix so the length is always read as decimal.

       // Read the map string and parse to array of objects.
       b = new Blob( [ f.slice( 10, 10 + l) ], { type: 'text/csv' } );
       result = await read_file( b );
       map = JSON.parse(result.value);

       l = map.length;
       p = 10 + result.value.length;

       // Using this loop takes about 7:30 to complete.
       for ( i = 1; i < l; i++ )
         {
           b = new Blob( [ f.slice( p, p + map[i].l ) ], { type: 'text/csv' } );
           result = await read_file( b ); // FileReader wrapped in a promise.
           result = await write_qst( JSON.parse( result.value ) ); // Database transaction wrapped in a promise.
           p = p + map[i].l;
           $("#msg").text( result );
         }; // next i

       $("#msg").text( "Successfully wrote all data to the database." );

       i = l = b = result = map = p = null;
     }
   catch(e)
     {
       alert( "error " + e );
     }
   finally
     {
       f = null;
     }



/*
  // Alternative loop that completes in about 1:30 versus 7:30 for above loop.

       for ( i = 1; i < l; i++ )
         {
           let status = false,
               k, j, n = 0, x = 0,
               L = map[i].l,
               a_parse = [];

           if ( L < c ) status = true;

           while ( status )
             {
               if ( i+1 < l && L + map[i+1].l <= c )
                 {
                   L = L + map[i+1].l;
                   i = i + 1;
                   n = n + 1;
                 }
               else
                 {
                   status = false;
                 };
             }; // loop while

           b = new Blob( [ f.slice( p, p + L ) ], { type: 'text/csv' } );
           result = await read_file( b );
           j = i - n;

           for ( k = j; k <= i; k++ )
             {
                a_parse.push( JSON.parse( result.value.substring( x, x + map[k].l ) ) );
                x = x + map[k].l;
             }; // next k

           result = await write_qst_grp( a_parse, i + ' of ' + l );
           p = p + L;
           $("#msg").text( result );
         }; // next i
*/



/*
// Was using this loop when thought the concern may be that the JSON strings were too large,
// but then realized the issue in my case is the opposite one of having 50,000 JSON strings of smaller size.

       for ( i = 1; i < l; i++ )
         {
           let x,
               m = map[i].l,
               str = [];

           while ( m > 0 )
             {
               x = Math.min( m, c );
               m = m - c;
               b = new Blob( [ f.slice( p, p + x ) ], { type: 'text/csv' } );
               result = await read_file( b );
               str.push( result.value );
               p = p + x;
             }; // loop while

            result = await write_qst( JSON.parse( str.join("") ) );
            $("#msg").text( result );
            str = null;
         }; // next i
*/

Answer 1

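Funnily enough, you already said in your question what should be done:

Slice your Blob.

The Blob interface does have a .slice() method.
But to use it, you should keep track of the positions where your merging occurred (this could be in another field of your database, or even in a header of your file):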

function readChunks({blob, chunk_size}) {
  console.log('full Blob size', blob.size);
  const strings = [];  
  const reader = new FileReader();
  var cursor = 0;
  reader.onload = onsingleprocessed;
  
  readNext();
  
  function readNext() {
    // here is the magic
    const nextChunk = blob.slice(cursor, (cursor + chunk_size));
    cursor += chunk_size;
    reader.readAsText(nextChunk);
  }
  function onsingleprocessed() {
    strings.push(reader.result);
    if(cursor < blob.size) readNext();
    else {
      console.log('read %s chunks', strings.length);
      console.log('excerpt content of the first chunk',
        strings[0].substring(0, 30));
    }
  }
}



// we will do the demo in a Worker to not kill visitors page
function worker_script() {
  self.onmessage = e => {
    const blobs = [];
    const chunk_size = 1024*1024; // 1MB per chunk
    for(let i=0; i<500; i++) {
      let arr = new Uint8Array(chunk_size);
      arr.fill(97); // only 'a'
      blobs.push(new Blob([arr], {type:'text/plain'}));
    }
    const merged = new Blob(blobs, {type: 'text/plain'});
    self.postMessage({blob: merged, chunk_size: chunk_size});
  }
}
const worker_url = URL.createObjectURL(
  new Blob([`(${worker_script.toString()})()`],
    {type: 'application/javascript'}
  )
);
const worker = new Worker(worker_url);
worker.onmessage = e => readChunks(e.data);
worker.postMessage('do it');
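The "header of the file" idea mentioned above could look roughly like the following. This is a sketch under assumptions: the layout (a 10-character decimal length field, then a JSON map of record sizes, then the records) is only one possibility, and `buildExport`, `readExport` and `onRecord` are made-up names. Sizes are taken from `Blob.size`, because `slice()` offsets are counted in bytes, not in string characters.

```javascript
// Build one Blob laid out as: 10-character length field, map JSON, records.
// The layout and all names here are illustrative assumptions.
function buildExport(records) {
  const parts = [];
  const map = [];
  for (const record of records) {
    const part = new Blob([JSON.stringify(record)]);
    // Blob.size is the UTF-8 byte length, which is what slice() needs.
    map.push({ l: part.size });
    parts.push(part);
  }
  const mapBlob = new Blob([JSON.stringify(map)]);
  const lengthField = String(mapBlob.size).padStart(10, '0');
  return new Blob([lengthField, mapBlob].concat(parts), { type: 'text/csv' });
}

// Read the records back one slice at a time, never as one large string.
async function readExport(file, onRecord) {
  const mapLength = parseInt(await file.slice(0, 10).text(), 10);
  const map = JSON.parse(await file.slice(10, 10 + mapLength).text());
  let p = 10 + mapLength; // byte offset of the first record
  for (const entry of map) {
    const text = await file.slice(p, p + entry.l).text();
    await onRecord(JSON.parse(text));
    p += entry.l;
  }
  return map.length;
}
```

For example, `await readExport(file, (obj) => console.log(obj))` would log each stored object in order. With very many records the map itself becomes a sizeable string, so this layout trades some header size for simple random access.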
