优化 EmEditor 宏以根据大文件的另一列填充列

Optimised EmEditor macro to populate column based on another column for a large file

我有一个非常大的文件,大约 1000 万行,我试图通过 jsee 宏根据另一列的条件填充一列。虽然对于小文件来说很快,但是对于大文件来说确实需要一些时间。

//pseudocode
//No sorting on Col1, which can have empty cells too
For all lines in file
     IF (cell in Col2 IS empty) AND (cell in Col1 IS NOT empty) AND (cell in Col1 = previous cell in Col1)
          THEN cell in Col2 = previous cell in Col2

//jsee code
document.CellMode = true;   // Must be cell selection mode
totalLines = document.GetLines();
    
for( i = 1; i < totalLines; i++ ) {

     nref = document.GetCell( i, 1, eeCellIncludeNone );
     gsize = document.GetCell( i, 2, eeCellIncludeNone );

     if (gsize == "" && nref != "" && nref == document.GetCell( i-1, 1, eeCellIncludeNone ) ) {
          document.SetCell( i, 2, document.GetCell( i-1, 2, eeCellIncludeNone ) , eeAutoQuote);
      }
 }

输入文件:

Reference Group Size
14/12/01819 1
14/12/01820 1
15/01/00191 4
15/01/00191
15/01/00191
15/01/00198
15/01/00292 3
15/01/00292
15/01/00292
15/01/00401 5
15/01/00401
15/01/00402
1
15/01/00403 2
15/01/00403
15/01/00403
15/01/00403
15/01/00404
20/01/01400 1

输出文件:

Reference Group Size
14/12/01819 1
14/12/01820 1
15/01/00191 4
15/01/00191 4
15/01/00191 4
15/01/00198
15/01/00292 3
15/01/00292 3
15/01/00292 3
15/01/00401 5
15/01/00401 5
15/01/00402
1
15/01/00403 2
15/01/00403 2
15/01/00403 2
15/01/00403 2
15/01/00404
20/01/01400 1

关于如何优化它并使其 运行 更快的任何想法?

我为 EmEditor 写了一个 JavaScript 宏给你。您可能需要在 iColReferenceiColGroupSize.

的前两行中设置正确的数字
iColReference = 1;   // the column index of "Reference"
iColGroupSize = 2;   // the column index of "Group Size"
document.CellMode = true;   // Must be cell selection mode
sDelimiter = document.Csv.Delimiter;  // retrieve the delimiter
nOldHeadingLines = document.HeadingLines;  // retrieve old headings
document.HeadingLines = 0;   // set No Headings
yBottom = document.GetLines();   // retrieve the number of lines
if( document.GetLine( yBottom ).length == 0 ) {  // -1 if the last line is empty
    --yBottom;
}
str = document.GetColumn( iColReference, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get whole 1st column from top to bottom, separated by TAB
sCol1 = str.split( sDelimiter );
str = document.GetColumn( iColGroupSize, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get whole 2nd column from top to bottom, separated by TAB
sCol2 = str.split( sDelimiter );

s1 = "";
s2 = "";
for( i = 0; i < yBottom; ++i ) {  // loop through all lines
    if( sCol2[i].length != 0 ) {
        s1 = sCol1[i];
        s2 = sCol2[i];
    }
    else {
        if( s1.length != 0 && sCol1[i] == s1 ) {  // same value as previous line, copy s2
            if( s2.length != 0 ) {
                sCol2[i] = s2;
            }
        }
        else {  // different value, empty s1 and s2
            s1 = "";
            s2 = "";
        }
    }
}

str = sCol2.join( sDelimiter );
document.SetColumn( iColGroupSize, str, sDelimiter, eeDontQuote );  // set whole 2nd column from top to bottom with the new values
document.HeadingLines = nOldHeadingLines;   // restore the original number of headings

为了运行这个,将这个代码保存为,例如Macro.jsee,然后select这个文件来自Select... 菜单中。最后,select 运行 Macro.jseeMacros 菜单中。