Split each row in a file into delimited substrings based on fixed lengths
I need some help converting a file into a new file that meets the following requirements:
- Split each row (a long string) into substrings based on fixed lengths
- Use the pipe delimiter "|" between substrings
- Leave the last, undefined column (substring) as-is, but add "|" before it
Here is an example. Suppose a file (test.dat) has 2 rows:
PG123ABCD A 000{000
MK789HJKL32H00
Column 1: length(2)
Column 2: length(3)
Column 3: length(4)
Column 4: length(3)
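Column 5: undefined, use all remaining characters
Below is the final output I need. The example has only 2 rows, but assume I have a file with 1k+ similar rows that I need to convert into a new file according to the requirements above.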
PG|123|ABCD| A |000{000
MK|789|HJKL|32H|00
cut -b 1-2,3-5,6-9,10-12,13-500 --output-delimiter='|' test.dat > 1.dat
I wrote the command above, and it outputs exactly what I need.
My only question is about the last column. I used 13-500 as the range for the undefined column, but the length of the remaining string varies from row to row. Is there a generic way to specify the last column's length, e.g. something like 13-max_length_of_the_row?
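One possible approach, sketched here under the assumption that GNU cut is in use (the `--output-delimiter` option in the command above is a GNU extension): cut accepts an open-ended range of the form `N-`, which selects everything from byte N to the end of the line, so no upper bound has to be guessed. The function name `split_fixed` is only an illustrative helper, not part of the original command.

```shell
#!/bin/sh
# Sketch: same byte ranges as the original command, but the last range is
# left open-ended. Assumes GNU cut (for --output-delimiter).
split_fixed() {
    # "13-" means "byte 13 through the end of the line", whatever its length.
    # Any arguments (e.g. a file name) are passed through; with none, cut
    # reads standard input.
    cut -b 1-2,3-5,6-9,10-12,13- --output-delimiter='|' "$@"
}

# The two sample rows from the question, used here in place of test.dat.
printf '%s\n' 'PG123ABCD A 000{000' 'MK789HJKL32H00' | split_fixed
```

For the two sample rows this should print the desired output shown above. With a real file, the equivalent call would be `split_fixed test.dat > 1.dat`.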