Optimised EmEditor macro to populate column based on another column for a large file
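I have a very large file, about 10 million lines, and I am trying to populate one column based on a condition on another column using a jsee macro. The macro is quick on small files, but it takes a long time on large ones.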
//pseudocode
//No sorting on Col1, which can have empty cells too
For all lines in file
    IF (cell in Col2 IS empty) AND (cell in Col1 IS NOT empty) AND (cell in Col1 = previous cell in Col1)
    THEN cell in Col2 = previous cell in Col2
//jsee code
document.CellMode = true; // Must be cell selection mode
totalLines = document.GetLines();
for( i = 1; i < totalLines; i++ ) {
    nref = document.GetCell( i, 1, eeCellIncludeNone );
    gsize = document.GetCell( i, 2, eeCellIncludeNone );
    if (gsize == "" && nref != "" && nref == document.GetCell( i-1, 1, eeCellIncludeNone ) ) {
        document.SetCell( i, 2, document.GetCell( i-1, 2, eeCellIncludeNone ) , eeAutoQuote);
    }
}
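To make the intended rule easy to check outside the editor, here is a minimal sketch of the pseudocode on plain arrays. It is ordinary JavaScript (for example for Node.js), not an EmEditor macro. The function name `fillFromPreviousRow` and the two parallel arrays are assumptions made for illustration only.

```javascript
// Hypothetical helper: applies the pseudocode rule to two parallel arrays.
// col1 holds the "Reference" cells, col2 the "Group Size" cells ("" = empty cell).
function fillFromPreviousRow(col1, col2) {
    const out = col2.slice(); // copy so the caller's array is left untouched
    for (let i = 1; i < col1.length; i++) {
        if (out[i] === "" && col1[i] !== "" && col1[i] === col1[i - 1]) {
            out[i] = out[i - 1]; // the previous cell may itself have just been filled
        }
    }
    return out;
}

const refs  = ["15/01/00191", "15/01/00191", "15/01/00191", "15/01/00198", "15/01/00292", "15/01/00292"];
const sizes = ["4", "", "", "", "3", ""];
console.log(fillFromPreviousRow(refs, sizes).join(","));
// prints "4,4,4,,3,3"
```

Because each row reads the already updated previous row, a value propagates down a whole run of identical references, which matches the sample output.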
Input file:
Reference | Group Size |
---|---|
14/12/01819 | 1 |
14/12/01820 | 1 |
15/01/00191 | 4 |
15/01/00191 | |
15/01/00191 | |
15/01/00198 | |
15/01/00292 | 3 |
15/01/00292 | |
15/01/00292 | |
15/01/00401 | 5 |
15/01/00401 | |
15/01/00402 | 1 |
15/01/00403 | 2 |
15/01/00403 | |
15/01/00403 | |
15/01/00403 | |
15/01/00404 | |
20/01/01400 | 1 |
Output file:
Reference | Group Size |
---|---|
14/12/01819 | 1 |
14/12/01820 | 1 |
15/01/00191 | 4 |
15/01/00191 | 4 |
15/01/00191 | 4 |
15/01/00198 | |
15/01/00292 | 3 |
15/01/00292 | 3 |
15/01/00292 | 3 |
15/01/00401 | 5 |
15/01/00401 | 5 |
15/01/00402 | 1 |
15/01/00403 | 2 |
15/01/00403 | 2 |
15/01/00403 | 2 |
15/01/00403 | 2 |
15/01/00404 | |
20/01/01400 | 1 |
Any ideas on how to optimise it and make it run faster?
I wrote a JavaScript macro for EmEditor for you. You might need to set the correct numbers for iColReference and iColGroupSize in the first two lines.
iColReference = 1;  // the column index of "Reference"
iColGroupSize = 2;  // the column index of "Group Size"
document.CellMode = true;  // Must be cell selection mode
sDelimiter = document.Csv.Delimiter;  // retrieve the delimiter
nOldHeadingLines = document.HeadingLines;  // remember the current number of heading lines
document.HeadingLines = 0;  // set No Headings
yBottom = document.GetLines();  // retrieve the number of lines
if( document.GetLine( yBottom ).length == 0 ) {  // -1 if the last line is empty
    --yBottom;
}
str = document.GetColumn( iColReference, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get the whole "Reference" column from top to bottom, cells separated by sDelimiter
sCol1 = str.split( sDelimiter );
str = document.GetColumn( iColGroupSize, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get the whole "Group Size" column from top to bottom, cells separated by sDelimiter
sCol2 = str.split( sDelimiter );
s1 = "";  // "Reference" of the last line that had a "Group Size"
s2 = "";  // "Group Size" of that line
for( i = 0; i < yBottom; ++i ) {  // loop through all lines
    if( sCol2[i].length != 0 ) {
        s1 = sCol1[i];
        s2 = sCol2[i];
    }
    else {
        if( s1.length != 0 && sCol1[i] == s1 ) {  // same value as the remembered line, copy s2
            if( s2.length != 0 ) {
                sCol2[i] = s2;
            }
        }
        else {  // different value, empty s1 and s2
            s1 = "";
            s2 = "";
        }
    }
}
str = sCol2.join( sDelimiter );
document.SetColumn( iColGroupSize, str, sDelimiter, eeDontQuote );  // set the whole "Group Size" column from top to bottom with the new values
document.HeadingLines = nOldHeadingLines;  // restore the original number of headings
To run this, save the code as, for instance, Macro.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Macro.jsee in the Macros menu.
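The loop in the macro works on two delimiter-joined strings rather than on individual cells, so its logic can be tried without EmEditor. The following is a rough sketch in plain JavaScript (for example for Node.js). The function name `fillColumnString` and the sample strings are assumptions for illustration; in the macro, `GetColumn` and `SetColumn` play the role of the string input and output here.

```javascript
// Hypothetical stand-alone version of the macro's loop. Each input string holds
// one column, with the cells of successive lines joined by `delimiter`.
function fillColumnString(refStr, sizeStr, delimiter) {
    const col1 = refStr.split(delimiter);
    const col2 = sizeStr.split(delimiter);
    let lastRef = "";  // reference of the most recent line that had a group size
    let lastSize = ""; // group size of that line
    for (let i = 0; i < col2.length; i++) {
        if (col2[i].length !== 0) {
            lastRef = col1[i];
            lastSize = col2[i];
        } else if (lastRef.length !== 0 && col1[i] === lastRef) {
            col2[i] = lastSize; // same reference as the remembered line: copy its size
        } else {
            lastRef = "";       // reference changed: forget the remembered line
            lastSize = "";
        }
    }
    return col2.join(delimiter);
}

const refColumn  = "15/01/00401\t15/01/00401\t15/01/00402\t15/01/00403\t15/01/00403\t15/01/00404";
const sizeColumn = "5\t\t1\t2\t\t";
console.log(JSON.stringify(fillColumnString(refColumn, sizeColumn, "\t")));
// prints "5\t5\t1\t2\t2\t"
```

All the work happens on in-memory arrays, and the column is written back as one joined string, which is the same shape as the single `SetColumn` call at the end of the macro.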