Optimised EmEditor macro to populate column based on another column for a large file

Question

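I have a very large file, approximately 10 million lines, and I am trying to populate one column based on a condition on another column with a jsee macro. The macro is quick on small files but takes a long time on the large one.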

//pseudocode
//No sorting on Col1, which can have empty cells too
For all lines in file
     IF (cell in Col2 IS empty) AND (cell in Col1 IS NOT empty) AND (cell in Col1 = previous cell in Col1)
          THEN cell in Col2 = previous cell in Col2

//jsee code
document.CellMode = true;   // Must be cell selection mode
totalLines = document.GetLines();
    
for( i = 1; i < totalLines; i++ ) {

     nref = document.GetCell( i, 1, eeCellIncludeNone );
     gsize = document.GetCell( i, 2, eeCellIncludeNone );

     if (gsize == "" && nref != "" && nref == document.GetCell( i-1, 1, eeCellIncludeNone ) ) {
          document.SetCell( i, 2, document.GetCell( i-1, 2, eeCellIncludeNone ) , eeAutoQuote);
      }
 }

Input file:

Reference	Group Size
14/12/01819	1
14/12/01820	1
15/01/00191	4
15/01/00191
15/01/00191
15/01/00198
15/01/00292	3
15/01/00292
15/01/00292
15/01/00401	5
15/01/00401
15/01/00402
	1
15/01/00403	2
15/01/00403
15/01/00403
15/01/00403
15/01/00404
20/01/01400	1

Output file:

Reference	Group Size
14/12/01819	1
14/12/01820	1
15/01/00191	4
15/01/00191	4
15/01/00191	4
15/01/00198
15/01/00292	3
15/01/00292	3
15/01/00292	3
15/01/00401	5
15/01/00401	5
15/01/00402
	1
15/01/00403	2
15/01/00403	2
15/01/00403	2
15/01/00403	2
15/01/00404
20/01/01400	1

Any ideas on how to optimise it and make it run faster?

Answer 1

I wrote a JavaScript macro for EmEditor for you. You might need to set the correct numbers for iColReference and iColGroupSize in the first two lines.
iColReference = 1;   // the column index of "Reference"
iColGroupSize = 2;   // the column index of "Group Size"

document.CellMode = true;   // Must be cell selection mode
sDelimiter = document.Csv.Delimiter;   // retrieve the delimiter
nOldHeadingLines = document.HeadingLines;   // retrieve old headings
document.HeadingLines = 0;   // set No Headings
yBottom = document.GetLines();   // retrieve the number of lines
if( document.GetLine( yBottom ).length == 0 ) {   // -1 if the last line is empty
    --yBottom;
}

str = document.GetColumn( iColReference, sDelimiter, eeCellIncludeQuotes, 1, yBottom );   // get whole 1st column from top to bottom, separated by TAB
sCol1 = str.split( sDelimiter );
str = document.GetColumn( iColGroupSize, sDelimiter, eeCellIncludeQuotes, 1, yBottom );   // get whole 2nd column from top to bottom, separated by TAB
sCol2 = str.split( sDelimiter );

s1 = "";
s2 = "";
for( i = 0; i < yBottom; ++i ) {   // loop through all lines
    if( sCol2[i].length != 0 ) {
        s1 = sCol1[i];
        s2 = sCol2[i];
    }
    else {
        if( s1.length != 0 && sCol1[i] == s1 ) {   // same value as previous line, copy s2
            if( s2.length != 0 ) {
                sCol2[i] = s2;
            }
        }
        else {   // different value, empty s1 and s2
            s1 = "";
            s2 = "";
        }
    }
}

str = sCol2.join( sDelimiter );
document.SetColumn( iColGroupSize, str, sDelimiter, eeDontQuote );   // set whole 2nd column from top to bottom with the new values
document.HeadingLines = nOldHeadingLines;   // restore the original number of headings

To run this, save the code as, for instance, Macro.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Macro.jsee in the Macros menu.
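
The macro above reads each column once with GetColumn and writes the result back once with SetColumn, instead of calling GetCell and SetCell for every row. That is presumably where most of the speed-up comes from. To check the fill logic on a small sample outside EmEditor, the loop can be pulled out into a plain function over two arrays. This is only a sketch: fillGroupSize is a hypothetical helper name, not part of EmEditor's API, and it assumes both arrays have the same length.

```javascript
// Hypothetical helper (not an EmEditor API): the same fill rule as the macro,
// applied to two plain arrays of cell strings of equal length.
function fillGroupSize(col1, col2) {
  const out = col2.slice();  // work on a copy so the input array is untouched
  let lastRef = "";          // Reference of the latest row that had a Group Size
  let lastSize = "";         // Group Size of that row
  for (let i = 0; i < out.length; ++i) {
    if (out[i].length !== 0) {
      // Row already has a Group Size: remember it for the rows that follow.
      lastRef = col1[i];
      lastSize = out[i];
    } else if (lastRef.length !== 0 && col1[i] === lastRef) {
      // Empty Group Size and same non-empty Reference: copy the value down.
      out[i] = lastSize;
    } else {
      // Different (or empty) Reference: forget the remembered values.
      lastRef = "";
      lastSize = "";
    }
  }
  return out;
}

// A few rows taken from the sample input in the question.
const refs  = ["15/01/00191", "15/01/00191", "15/01/00198", "15/01/00402", "", "15/01/00403", "15/01/00403"];
const sizes = ["4", "", "", "", "1", "2", ""];
console.log(fillGroupSize(refs, sizes));
// -> [ '4', '4', '', '', '1', '2', '2' ]
```

Rows whose Reference has no Group Size anywhere in its run (15/01/00198, 15/01/00402) stay empty, which matches the expected output file in the question.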

performance

emeditor