Perl - Changing file name in the middle of write
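I'm trying to take a very large txt file (over a million lines) that I create in Perl and run it through a different statement in Perl that would basically look like this (note: the following is shell):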
a=0
b=1
while read line;
do
echo -n "" > "Write file"${b}
a=($a + 1)
while ( $a <= 5000)
do
echo $line >> "Write file"${b}
a=($a + 1)
done
a=0
b=($b + 1)
done < "read file"
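The goal is to cut it down to 5k lines per file and increment the file name each time (filename1.txt, filename2.txt, filename3.txt, etc.).
This doesn't seem to work in the shell, probably because of the size of the input file, and for the life of me I can't figure out how to change the file I'm writing to in the middle of the loop.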
You can do this in the shell with split.
For example:
split -l 5000 filename.txt filename.txt.
will split filename.txt into files of at most 5,000 lines each. The output files will be named filename.txt.aa, filename.txt.ab, filename.txt.ac, etc.
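If you want the filename1.txt, filename2.txt, ... names from the question rather than split's letter suffixes, one option is to split under a temporary prefix and rename the pieces afterwards. This is a minimal sketch; the function name split_numbered and the ".chunk." prefix are hypothetical, and it relies only on the plain -l option shown above.

```sh
#!/bin/sh
# Hypothetical helper: split $1 into 5000-line pieces, then rename them
# to $2 followed by 1.txt, 2.txt, 3.txt, ... in input order.
split_numbered() {
    input=$1        # file to split, e.g. filename.txt
    base=$2         # output name stem, e.g. filename
    split -l 5000 "$input" "$base.chunk." || return 1
    n=1
    # Glob expansion is sorted, and split's aa, ab, ac... suffixes sort in
    # the same order, so the pieces are numbered in input order.
    for piece in "$base".chunk.*; do
        [ -e "$piece" ] || continue   # no pieces at all (empty input)
        mv "$piece" "$base$n.txt"
        n=$(( n + 1 ))
    done
}
```

Usage: split_numbered filename.txt filename produces filename1.txt, filename2.txt, and so on.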
From my split man page:
NAME
split -- split a file into pieces
SYNOPSIS
split [-a suffix_length] [-b byte_count[k|m]] [-l line_count] [-p pattern] [file [name]]
DESCRIPTION
The split utility reads the given file and breaks it up into files of 1000 lines each. If file is a single dash (`-') or absent, split reads from the standard input.
The options are as follows:
-a suffix_length
Use suffix_length letters to form the suffix of the file name.
-b byte_count[k|m]
Create smaller files byte_count bytes in length. If ``k'' is appended to the number, the file is split into byte_count kilobyte pieces. If ``m'' is
appended to the number, the file is split into byte_count megabyte pieces.
-l line_count
Create smaller files n lines in length.
-p pattern
The file is split whenever an input line matches pattern, which is interpreted as an extended regular expression. The matching line will be the
first line of the next output file. This option is incompatible with the -b and -l options.
If additional arguments are specified, the first is used as the name of the input file which is to be split. If a second additional argument is specified,
it is used as a prefix for the names of the files into which the file is split. In this case, each file into which the file is split is named by the prefix
followed by a lexically ordered suffix using suffix_length characters in the range ``a-z''. If -a is not specified, two letters are used as the suffix.
If the name argument is not specified, the file is split into lexically ordered files named with the prefix ``x'' and with suffixes as above.
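With the default two-letter suffix there are only 26 × 26 = 676 possible names. A million lines at 5,000 lines per piece is 1,000,000 / 5,000 = 200 pieces, so the default is enough here, but a larger input would need a bigger -a value. The sketch below computes one; the function name suffix_length_for is hypothetical, and it never returns less than split's default of 2 so the usual aa, ab, ... names are kept.

```sh
#!/bin/sh
# Hypothetical helper: smallest -a value (at least 2) whose a-z suffixes
# can name every piece when $1 lines are split $2 lines at a time.
suffix_length_for() {
    lines=$1                                 # total number of input lines
    per=$2                                   # lines per piece
    pieces=$(( (lines + per - 1) / per ))    # number of pieces, rounded up
    len=2                                    # split's default suffix length
    cap=676                                  # 26^2 names: aa .. zz
    while [ "$cap" -lt "$pieces" ]; do
        len=$(( len + 1 ))
        cap=$(( cap * 26 ))
    done
    echo "$len"
}
```

Usage: split -a "$(suffix_length_for "$(wc -l < filename.txt)" 5000)" -l 5000 filename.txt filename.txt.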
By the way, here's your script, fixed:
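#!/bin/sh
# Split in-file into 5000-line pieces named out-file-1, out-file-2, ...
a=0   # lines written to the current output file
b=1   # number of the current output file
while IFS= read -r line; do
    if [ "$a" -eq 0 ]; then
        : > "out-file-${b}"           # start a new, empty output file
    fi
    printf '%s\n' "$line" >> "out-file-${b}"
    a=$(( a + 1 ))
    if [ "$a" -eq 5000 ]; then        # current file is full, move to the next
        a=0
        b=$(( b + 1 ))
    fi
done < in-file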
Tested with bash and dash.
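A shell loop that reads one line at a time and reopens the output file for every line tends to be slow on a million-line input. As an alternative technique (not the one used above), the same split can be done in a single awk process. This is a sketch; the function name split_with_awk and the stem argument are hypothetical, and it writes names in the question's filename1.txt style.

```sh
#!/bin/sh
# Hypothetical helper: split $1 into 5000-line pieces named $2 followed by
# 1.txt, 2.txt, ... using one awk process instead of a per-line shell loop.
split_with_awk() {
    awk -v per=5000 -v stem="$2" '
        (NR - 1) % per == 0 {                 # first line of a new piece
            if (out != "") close(out)         # keep only one output file open
            out = stem (int((NR - 1) / per) + 1) ".txt"
        }
        { print > out }                       # first write truncates, later ones append
    ' "$1"
}
```

Usage: split_with_awk filename.txt filename. Closing each finished piece matters because some awk implementations limit how many files can be open at once.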