SAS - Importing variable-length binary records without delimiters

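I have a binary dataset with no delimiters and no fixed-length records. I know each record contains 22 bytes of data followed by an unknown number of 23-byte blocks, up to 50 blocks. The problem is that SAS reads only 1 line of 32767 bytes, giving 728 obs in total. I am expecting 2.7MM output obs. How can I make it read the input file to the end? I have already tried adding the "OBS=" and "lrecl=" options to the infile line. Adding the "end=" option had no effect on the result.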

DATA INFILE.MYDATA (drop= i);
INFILE "&Path./UGLYDATA" end=eof; 
INPUT
MY_KEY s370fPD9.
...
OCCURS s370fPD2.
@
;    
ARRAY   MyData{50}  MyData1-MyData50;
...
ARRAY   Filler{50} $ Filler1-Filler50;

DO I = 1 TO min(50,OCCURS);
INPUT
MyData{I}   s370fPD4.
...
Filler{I}   $ebcdic10.
@@
;
End;
RUN;

Relevant log:

NOTE: 1 record was read from the infile "UGLYDATA".
      The minimum record length was 32767.
      The maximum record length was 32767.
      One or more lines were truncated.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set INFILE.MYDATA has 728 observations and 356 variables.
NOTE: Compressing data set INFILE.MYDATA decreased size by 47.06 percent. 
      Compressed is 9 pages; un-compressed would require 17 pages.
NOTE: DATA statement used (Total process time):
      real time           2.69 seconds
      user cpu time       0.02 seconds
      system cpu time     0.11 seconds
      memory              1890.40k
      OS Memory           10408.00k
      Timestamp           12/07/2021 05:17:34 PM
      Step Count                        1  Switch Count  0
      Page Faults                       3
      Page Reclaims                     1028
      Page Swaps                        0
      Voluntary Context Switches        272
      Involuntary Context Switches      1226
      Block Input Operations            309648
      Block Output Operations           2312

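It sounds like the file does not consist of lines of text. So try using RECFM=N in your INFILE statement, so that SAS does not look for a LINEFEED character (or a CARRIAGE RETURN and LINEFEED combination) to mark the end of a line.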

INFILE "&Path./UGLYDATA" recfm=n ; 

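If you are not sure what the file contains, just run a simple data step to look at the first few hundred bytes and find out. If any byte in the "line" is not a printable character, the LIST statement will include the hex codes for the bytes under the line when it writes the line to the SAS log.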

data _null_;
  INFILE "&Path./UGLYDATA" recfm=f lrecl=100 obs=10 ;
  input;
  list;
run;
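
As a complement to the LIST step above, here is a minimal sketch that reads the file as a byte stream with RECFM=N and prints the first 22-byte header and a few 23-byte blocks in hex. The 22 and 23 byte sizes come from the question; the variable names `header`, `block` and `i`, and the choice to stop after three blocks, are assumptions for illustration only. If the first record holds fewer than three blocks, the later lines will run into the next record.

```sas
data _null_;
  /* RECFM=N: treat the file as a stream, no line terminators expected */
  infile "&Path./UGLYDATA" recfm=n;

  /* first 22 bytes: the fixed part of the first record */
  input header $char22.;
  put 'header: ' header $hex44.;

  /* next three 23-byte chunks (assumed to be repeating blocks) */
  do i = 1 to 3;
    input block $char23.;
    put i= block $hex46.;
  end;

  stop;  /* only peek at the start of the file */
run;
```

Comparing the hex for the last two bytes of the header with the expected OCCURS packed-decimal value is a quick way to check that the layout assumption holds before reading the whole file.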

As @Tom says, RECFM=N is indeed what you need.

Example:

Create a binary file and read it back.

filename foo '%temp%/foo.bin' recfm=n;

data _null_;
  file foo;

  call streaminit(2021);

  filler = repeat('*', 10);

  do recnum = 1001 to 1010;
    put recnum s370fPD9. @;
    put filler $char11. @;

    occurs = rand('integer',1,26);
    put occurs s370fPD2. @; 

    do z = 0 to occurs-1;

      record = repeat(byte(rank('A')+z), 22);
      put record $ebcdic23.;

    end;

    putlog 'NOTE: ' recnum= occurs=;
  end;
  stop;
run;

data want;
  infile foo;

  * read master;
  input recnum s370fPD9. filler $char11. occurs s370fPD2.;

  * read details;
  do index = 1 to occurs;
    input content $ebcdic23.;
    output;
  end;
run;

dm 'vt want';
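
The step above writes one observation per detail block. If you would rather keep one observation per master record, as the arrays in the question suggest, a sketch along these lines might work. It reuses the `foo` fileref and the layout of the example file; the data set name `want_wide` and the array name `content` are assumptions, and the limit of 50 is taken from the question.

```sas
data want_wide (drop=index);
  infile foo;  /* fileref assigned above with recfm=n */

  /* one slot per possible detail block, 23 bytes each */
  array content{50} $23 content1-content50;

  /* read master: 9 + 11 + 2 = 22 bytes */
  input recnum s370fPD9. filler $char11. occurs s370fPD2.;

  /* read details into the array; unused slots stay blank */
  do index = 1 to min(50, occurs);
    input content{index} $ebcdic23.;
  end;

  /* implicit OUTPUT here: one observation per master record */
run;
```

Note that if OCCURS could ever exceed 50, the extra blocks would be left in the stream and misread as the next master record, so that case would need separate handling.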