如何有效地读取ocaml中的一行整数

How to efficiently read a line of integers in ocaml

我想高效地阅读 stdin 中的一大行(~4000 个字符)。我也必须阅读 ~4000 行。

行格式如下:

INTEGERwhitespaceINTEGERwhitespace.... 例如,100 33 22 19 485 3601...

之后需要处理数据,所以我用的初始方案read_line() |> String.split_on_char ' ' |> ... 太慢了O(3n)。

我想使用像 Scanf 这样的东西:

bscanf ic "%d" int_of_string

但我不确定如何解释空格,或者它是否足够快。有解决办法吗?

我创建了一个包含 10000 行 4000 个随机整数的文件。

然后我写了这4个主函数(read_int是一个辅助函数),输出都是一样的:

let read_int ic =
  let rec aux acc =
    match input_char ic with
    | ' ' | '\n' -> acc
    | c -> aux ((10 * acc) + (Char.code c - 48))
  in
  aux 0

let read_test_int () =
  let ic = open_in "test" in
  let max = ref 0 in
  try
    while true do
      read_int ic |> fun e -> if e > !max then max := e
    done
  with End_of_file ->
    close_in ic;
    Format.eprintf "%d@." !max

let read_test_line () =
  let ic = open_in "test" in
  let max = ref 0 in
  try
    while true do
      input_line ic |> String.split_on_char ' '
      |> List.iter (fun e ->
             let e = int_of_string e in
             if e > !max then max := e)
    done
  with End_of_file ->
    close_in ic;
    Format.eprintf "%d@." !max

let read_test_line_map () =
  let ic = open_in "test" in
  let max = ref 0 in
  try
    while true do
      input_line ic |> String.split_on_char ' ' |> List.map int_of_string
      |> List.iter (fun e -> if e > !max then max := e)
    done
  with End_of_file ->
    close_in ic;
    Format.eprintf "%d@." !max

let read_test_scanf () =
  let ic = Scanf.Scanning.open_in "test" in
  let max = ref 0 in
  try
    while true do
      Scanf.bscanf ic "%d " (fun i -> i) |> fun e -> if e > !max then max := e
    done
  with End_of_file ->
    Scanf.Scanning.close_in ic;
    Format.eprintf "%d@." !max
  1. read_test_int通过逐个读取字符创建一个整数
  2. read_test_line 是你的初始解
  3. read_test_line_map 是您的初始解决方案,具有从字符串到 int
  4. 的映射
  5. read_test_scanf 是您要测试的解决方案

然后我用 hyperfine 测试了其中的四个,这里是输出:

hyperfine --warmup 3 -P arg 1 4 'dune exec program -- {arg}'

  1. read_int
Benchmark #1: dune exec program -- 1
  Time (mean ± σ):      1.509 s ±  0.072 s    [User: 1.460 s, System: 0.049 s]
  Range (min … max):    1.436 s …  1.618 s    10 runs
  1. read_line
Benchmark #2: dune exec program -- 2
  Time (mean ± σ):      1.818 s ±  0.016 s    [User: 1.717 s, System: 0.100 s]
  Range (min … max):    1.794 s …  1.853 s    10 runs
  1. read_line_map
Benchmark #4: dune exec program -- 4
  Time (mean ± σ):      2.158 s ±  0.127 s    [User: 2.108 s, System: 0.050 s]
  Range (min … max):    2.054 s …  2.482 s    10 runs
  1. read_scanf
Benchmark #3: dune exec program -- 3
  Time (mean ± σ):      5.017 s ±  0.103 s    [User: 4.957 s, System: 0.060 s]
  Range (min … max):    4.893 s …  5.199 s    10 runs

看起来我自己的 read_int 实现更好,而 input_line 稍微差一点,因为您首先创建一个字符串,然后遍历一次以拆分它,然后遍历列表读取整数。可悲的是,scanf 总是最糟糕的。使用这些类型的值(10000 行,4000 个整数),差异开始可见,对于 4000 行 4000 个字符,我找不到任何真正的差异。

Hyperline 总结如下:

Summary
  'dune exec program -- 1' ran
    1.20 ± 0.06 times faster than 'dune exec program -- 2'
    1.43 ± 0.11 times faster than 'dune exec program -- 4'
    3.33 ± 0.17 times faster than 'dune exec program -- 3'

[编辑]

我使用 OCamllex 创建了两个新长凳:

  1. lexer.mll
let digit = ['0'-'9']

rule integers = parse
  | ' ' | '\n'        { integers lexbuf }
  | digit+ as inum { int_of_string inum }
  | _           { failwith "not a digit or a space" }
  | eof         { raise End_of_file }

  1. lexer_list.mll
{ let l = ref [] }

let digit = ['0'-'9']

rule integers = parse
  | ' ' | '\n'        { integers lexbuf }
  | digit+ as inum { l := int_of_string inum :: !l; integers lexbuf }
  | _           { failwith "not a digit or a space" }
  | eof         { !l }

重新运行基准测试结果如下:

❯ hyperfine --warmup 3 -P arg 1 6 'dune exec program -- {arg}'
Benchmark #1: dune exec program -- 1
  Time (mean ± σ):      1.394 s ±  0.044 s    [User: 1.358 s, System: 0.036 s]
  Range (min … max):    1.360 s …  1.483 s    10 runs

Benchmark #2: dune exec program -- 2
  Time (mean ± σ):      1.674 s ±  0.011 s    [User: 1.590 s, System: 0.084 s]
  Range (min … max):    1.657 s …  1.692 s    10 runs

Benchmark #3: dune exec program -- 3
  Time (mean ± σ):      4.886 s ±  0.304 s    [User: 4.847 s, System: 0.037 s]
  Range (min … max):    4.627 s …  5.460 s    10 runs

Benchmark #4: dune exec program -- 4
  Time (mean ± σ):      1.949 s ±  0.023 s    [User: 1.908 s, System: 0.041 s]
  Range (min … max):    1.925 s …  1.984 s    10 runs

Benchmark #5: dune exec program -- 5
  Time (mean ± σ):      2.824 s ±  0.013 s    [User: 2.784 s, System: 0.039 s]
  Range (min … max):    2.798 s …  2.843 s    10 runs

Benchmark #6: dune exec program -- 6
  Time (mean ± σ):      5.832 s ±  0.074 s    [User: 5.493 s, System: 0.333 s]
  Range (min … max):    5.742 s …  5.981 s    10 runs

Summary
  'dune exec program -- 1' ran
    1.20 ± 0.04 times faster than 'dune exec program -- 2'
    1.40 ± 0.05 times faster than 'dune exec program -- 4'
    2.03 ± 0.07 times faster than 'dune exec program -- 5'
    3.51 ± 0.24 times faster than 'dune exec program -- 3'
    4.18 ± 0.14 times faster than 'dune exec program -- 6'

在遍历它之前创建一个列表是最糟糕的解决方案(想象一下,甚至比 scanf 更糟糕!)但是词法分析还不错(但也不是那么好)

所以,总而言之,从最好到最差的解决方案是:

  1. 自定义读取整数
  2. 读取行
  3. 按 int 对 int 进行词法分析
  4. 读取带映射的行
  5. 扫描
  6. 将整个文件词法化为一个 int 列表

[使用 memtrace 进行基准测试]

顺便说一句,这让我意识到了一些事情,以防你读到这篇文章:

  • 如果您尝试对您的解决方案进行基准测试,请不要在您的代码中使用 memtrace。我正在尝试一些东西,并且在我的入口点开始时有 Memtrace.trace_if_requested ();。好吧,它把一切都搞砸了,长凳完全错了:
❯ hyperfine --warmup 3 -P arg 1 6 'dune exec program -- {arg}'
Benchmark #1: dune exec program -- 1
  Time (mean ± σ):      7.003 s ±  0.201 s    [User: 6.959 s, System: 0.043 s]
  Range (min … max):    6.833 s …  7.420 s    10 runs

Benchmark #2: dune exec program -- 2
  Time (mean ± σ):      1.801 s ±  0.060 s    [User: 1.697 s, System: 0.104 s]
  Range (min … max):    1.729 s …  1.883 s    10 runs

Benchmark #3: dune exec program -- 3
  Time (mean ± σ):      4.817 s ±  0.120 s    [User: 4.757 s, System: 0.058 s]
  Range (min … max):    4.679 s …  5.068 s    10 runs

Benchmark #4: dune exec program -- 4
  Time (mean ± σ):      2.028 s ±  0.023 s    [User: 1.994 s, System: 0.032 s]
  Range (min … max):    1.993 s …  2.071 s    10 runs

Benchmark #5: dune exec program -- 5
  Time (mean ± σ):      2.997 s ±  0.108 s    [User: 2.948 s, System: 0.046 s]
  Range (min … max):    2.889 s …  3.191 s    10 runs

Benchmark #6: dune exec program -- 6
  Time (mean ± σ):      6.109 s ±  0.161 s    [User: 5.753 s, System: 0.349 s]
  Range (min … max):    5.859 s …  6.322 s    10 runs

Summary
  'dune exec program -- 2' ran
    1.13 ± 0.04 times faster than 'dune exec program -- 4'
    1.66 ± 0.08 times faster than 'dune exec program -- 5'
    2.67 ± 0.11 times faster than 'dune exec program -- 3'
    3.39 ± 0.14 times faster than 'dune exec program -- 6'
    3.89 ± 0.17 times faster than 'dune exec program -- 1'

我的理解是,memtrace 能够在我的自定义解决方案上做很多工作,因为整个代码都是直接可用的,而对于其余代码,它只能触及表面(我可能完全错了,但它花了我一些时间是时候弄清楚 memtrace 破坏了我的基准测试)


[关注@ivg 的评论]

  1. lexer_parser.mll
{
   open Parser
}

let digit = ['0'-'9']

rule integers = parse
  | ' ' | '\n'        { integers lexbuf }
  | digit+ as inum    { INT (int_of_string inum) }
  | _                 { failwith "not a digit or a space" }
  | eof               { raise End_of_file }

parser.mly

%token <int> INT
%start main             /* the entry point */
%type <int> main
%%
main:
| INT {  }
;

并在 main.ml

let read_test_lexer_parser () =
  let ic = open_in "test" in
  let lexbuf = Lexing.from_channel ic in
  let max = ref 0 in
  try
    while true do
      let result = Parser.main Lexer_parser.integers lexbuf in
      if result > !max then max := result
    done
  with End_of_file ->
    close_in ic;
    Format.eprintf "%d@." !max

(我剪了一些板凳)

❯ hyperfine --warmup 3 -P arg 1 7 'dune exec program -- {arg}'
Benchmark #1: dune exec program -- 1
  Time (mean ± σ):      1.357 s ±  0.030 s    [User: 1.316 s, System: 0.041 s]
  Range (min … max):    1.333 s …  1.431 s    10 runs

Benchmark #6: dune exec program -- 6
  Time (mean ± σ):      5.745 s ±  0.289 s    [User: 5.230 s, System: 0.513 s]
  Range (min … max):    5.549 s …  6.374 s    10 runs

Benchmark #7: dune exec program -- 7
  Time (mean ± σ):      7.195 s ±  0.049 s    [User: 7.161 s, System: 0.034 s]
  Range (min … max):    7.148 s …  7.300 s    10 runs

Summary
  'dune exec program -- 1' ran
    4.23 ± 0.23 times faster than 'dune exec program -- 6'
    5.30 ± 0.12 times faster than 'dune exec program -- 7'

我可能没有正确完成,因此结果很差,但这似乎不太乐观。我这样做的方式是,我想在读取值后立即获取值以处理它,否则我将不得不创建一个值列表,这会更糟(相信我,我试过了,花了 30找到最大值的秒数)。

我的沙丘文件,如果你想知道的话,看起来像这样(我有一个空的 program.opam 文件来取悦沙丘):

(executable
 (name main)
 (public_name program)
)

(ocamllex lexer)
(ocamllex lexer_list)
(ocamllex lexer_parser)
(ocamlyacc parser)