SAS - Importing variable-length binary records without delimiters
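I have a binary dataset with no delimiters and no fixed-length records. I know that each record contains 22 bytes of data followed by an unknown number of 23-byte blocks, up to a maximum of 50 blocks. The problem is that SAS reads only 1 line of 32767 bytes, for a total of 728 obs. I am expecting 2.7MM output obs. How can I get it to read the input file to the end? I have already tried adding the "OBS=" and "LRECL=" options to the INFILE statement. Adding the "END=" option had no effect on the results.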
DATA INFILE.MYDATA (drop= i);
  INFILE "&Path./UGLYDATA" end=eof;
  INPUT
    MY_KEY s370fPD9.
    ...
    OCCURS s370fPD2.
    @
  ;
  ARRAY MyData{50} MyData1-MyData50;
  ...
  ARRAY Filler{50} $ Filler1-Filler50;
  DO I = 1 TO min(50,OCCURS);
    INPUT
      MyData{I} s370fPD4.
      ...
      Filler{I} $ebcdic10.
      @@
    ;
  End;
RUN;
Relevant log:
NOTE: 1 record was read from the infile "UGLYDATA".
The minimum record length was 32767.
The maximum record length was 32767.
One or more lines were truncated.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set INFILE.MYDATA has 728 observations and 356 variables.
NOTE: Compressing data set INFILE.MYDATA decreased size by 47.06 percent.
Compressed is 9 pages; un-compressed would require 17 pages.
NOTE: DATA statement used (Total process time):
real time 2.69 seconds
user cpu time 0.02 seconds
system cpu time 0.11 seconds
memory 1890.40k
OS Memory 10408.00k
Timestamp 12/07/2021 05:17:34 PM
Step Count 1 Switch Count 0
Page Faults 3
Page Reclaims 1028
Page Swaps 0
Voluntary Context Switches 272
Involuntary Context Switches 1226
Block Input Operations 309648
Block Output Operations 2312
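It sounds like the file is not made up of lines of text. So try using RECFM=N on your INFILE statement, so that SAS does not look for LINEFEED characters (or the CARRIAGE RETURN and LINEFEED combination) to mark the end of a line.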
INFILE "&Path./UGLYDATA" recfm=n ;
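If you are not sure what the file contains, just run a simple data step that looks at the first few hundred bytes and figure it out from there. If any of the bytes in a "line" are not printable characters, the LIST statement will include the hex codes for the bytes under the line when it writes it to the SAS log.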
data _null_;
  INFILE "&Path./UGLYDATA" recfm=f lrecl=100 obs=10 ;
  input;
  list;
run;
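As a complement to the LIST approach, here is a minimal sketch that dumps only the 22-byte header of the first record as hex, which can make it easier to check whether the packed-decimal fields look plausible. It assumes the file starts with a 22-byte header as described in the question; the variable name header is made up for illustration.

```sas
* Sketch only: assumes the file begins with a 22-byte header;
data _null_;
  infile "&Path./UGLYDATA" recfm=n;
  * read the first 22 bytes as raw characters;
  input header $char22.;
  * write them to the log as 44 hex digits;
  put header $hex44.;
  stop;
run;
```

In a packed-decimal field each byte holds two decimal digits, with the sign in the last half byte, so the hex digits of MY_KEY and OCCURS should be readable directly.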
As @Tom says, RECFM=N is indeed what you need.
Example:
Create a binary file and read it back.
* fileref in stream mode, so no record boundaries are written or expected;
filename foo '%temp%/foo.bin' recfm=n;

* write 10 master records, each followed by 1 to 26 detail blocks;
data _null_;
  file foo;
  call streaminit(2021);
  filler = repeat('*', 10);
  do recnum = 1001 to 1010;
    * master is 9 + 11 + 2 = 22 bytes;
    put recnum s370fPD9. @;
    put filler $char11. @;
    occurs = rand('integer',1,26);
    put occurs s370fPD2. @;
    do z = 0 to occurs-1;
      * each detail block is 23 bytes;
      record = repeat(byte(rank('A')+z), 22);
      put record $ebcdic23.;
    end;
    putlog 'NOTE: ' recnum= occurs=;
  end;
  stop;
run;

data want;
  infile foo;
  * read master;
  input recnum s370fPD9. filler $char11. occurs s370fPD2.;
  * read details;
  do index = 1 to occurs;
    input content $ebcdic23.;
    output;
  end;
run;

dm 'vt want';
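The example above writes one output row per detail block. If you would rather keep one row per master record, as the data step in the question does with its arrays, a variation along the following lines might work. This is a sketch under assumptions, not a tested solution for the real file: it reads the foo file written above, the names want_wide and content1-content26 are made up for illustration, and the array size of 26 only matches the maximum OCCURS generated in the example (the file in the question would need 50).

```sas
* Sketch only: one row per master record, details stored in an array;
data want_wide (drop=i);
  infile foo;
  * 26 is the largest OCCURS the example writer can generate;
  array content{26} $23 content1-content26;
  * read master;
  input recnum s370fPD9. filler $char11. occurs s370fPD2.;
  * read every detail block so the pointer ends on the next master;
  do i = 1 to occurs;
    input content{i} $ebcdic23.;
  end;
run;
```

Every detail block has to be consumed even if you do not want to keep it, otherwise the next INPUT would start in the middle of a record. For that reason the array should be at least as large as the largest OCCURS value in the file.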