Why does using a class from a typed/racket module in an untyped one yield bad performance?
See EDIT 1, 2, and 3 for updates. I leave here the complete research process.
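I know that we can use typed/racket modules from untyped Racket (and vice versa). But when doing so, the typed/racket module behaves as if it were typed/racket/no-check, which disables optimizations and just uses it as a normal untyped module.
For example, if you have a typed/racket module like this: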
#lang typed/racket
(require math)
(provide hello)
(define (hello [str : String])
  (define result : (Matrix Flonum) (do-some-crazy-matrix-operations))
  (display (format "Hello ~a! Result is ~a" str result)))
And you want to use it in an untyped program like this:
#lang racket/base
(require "hello-matrix.rkt")
(hello "Alan Turing")
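You will get pretty bad performance results (in my case, I was doing about 600000 matrix multiplications, and the program didn't even finish), whereas using #lang typed/racket makes my program finish in 3 seconds.
The downside is that my untyped code gets infected with types, forcing me to write my whole program in TR, which drives me crazy pretty quickly.
But my savior was not that far away. I stumbled upon a funny April Fools'-like package that Jay McCarthy wrote on a cloudy dark night, called live-free-or-die, which pretty much does this: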
http://docs.racket-lang.org/live-free-or-die/index.html
#lang racket/base
(require (for-syntax racket/base
                     typed-racket/utils/tc-utils))
(define-syntax (live-free-or-die! stx)
  (syntax-case stx ()
    [(_)
     (syntax/loc stx
       (begin-for-syntax
         (set-box! typed-context? #t)))]))
(provide live-free-or-die!
         (rename-out [live-free-or-die!
                      Doctor-Tobin-Hochstadt:Tear-down-this-wall!]))
By using it in my typed/racket module, like this:
#lang racket
(require live-free-or-die)
(live-free-or-die!)
(require math)
(provide hello)
(define (hello str)
  (define result (do-some-crazy-matrix-operations))
  (display (format "Hello ~a! Result is ~a" str result)))
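Now my module isn't #lang typed/racket anymore, but the results of running it are spectacular! It runs in 3 seconds, as if it were a typed/racket module.
Of course, I'm disgusted by that hack, which is why I'm wondering whether there is a better solution, especially one that keeps the matrix operations from math usable.
The Google Groups discussion about that crazy module Jay wrote is the only piece of information I could find.
https://groups.google.com/forum/#!topic/racket-users/JZoHYxwwJqU
The people in this thread seem to say that the module is no longer useful: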
Matthias Felleisen
Well, now that our youngsters have easily debunked the package, we can let it die because it no longer wants to live.
Is there really a better alternative?
EDIT 1 - A testable example
If you want to test the difference in performance, try using this definition of do-some-crazy-matrix-operations:
#lang typed/racket
(require math)
(provide hello)
(: do-some-crazy-matrix-operations : (-> (Matrix Flonum)))
(define (do-some-crazy-matrix-operations)
  (define m1 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
  (define m2 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
  (for ([i 60000])
    (set! m1 (matrix-map * m1 m2))
    (set! m2 (matrix-map * m1 m2)))
  (matrix+ m1 m2))
(define (hello [str : String])
  (define result : (Matrix Flonum) (do-some-crazy-matrix-operations))
  (display (format "Hello ~a! Result is ~a" str result)))
(time (hello "Alan Turing"))
Using #lang typed/racket it runs in 288 ms:
cpu time: 288 real time: 286 gc time: 16
Using #lang typed/racket/no-check it runs in 52 seconds:
cpu time: 52496 real time: 52479 gc time: 396
Using #lang racket and live-free-or-die it runs in 280 ms:
cpu time: 280 real time: 279 gc time: 4
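EDIT 2 - This was not the problem!
Following John Clement's answer, I discovered that the examples are not sufficient to reproduce the real issue. Using typed/racket modules in untyped ones actually works fine.
My real problem is a boundary contract issue: the contract is created when a class instance passes from untyped to typed Racket.
Let's consider this implementation of hello-matrix.rkt: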
#lang typed/racket
(require math)
(provide hello crazy% Crazy)
(define-type CrazyClass (Class (field [m1 (Matrix Flonum)])
                               (field [m2 (Matrix Flonum)])
                               (do (-> (Matrix Flonum)))))
(define-type Crazy (Instance CrazyClass))
(: crazy% CrazyClass)
(define crazy%
  (class object%
    (field [m1 (build-matrix 5 5 (lambda (x y) (add1 (random))))]
           [m2 (build-matrix 5 5 (lambda (x y) (add1 (random))))])
    (super-new)
    (define/public (do)
      (set! m1 (matrix* (matrix-transpose m1) m2))
      (set! m2 (matrix* (matrix-transpose m1) m2))
      (matrix+ m1 m2))))
(: do-some-crazy-matrix-operations : Crazy -> (Matrix Flonum))
(define (do-some-crazy-matrix-operations crazy)
  (for ([i 60000])
    (send crazy do))
  (matrix+ (get-field m1 crazy) (get-field m2 crazy)))
(define (hello [str : String] [crazy : Crazy])
  (define result : (Matrix Flonum) (do-some-crazy-matrix-operations crazy))
  (display (format "Hello ~a! Result is ~a\n" str result)))
Then those two usages:
#lang typed/racket
(require "hello-matrix.rkt")
(define crazy : Crazy (new crazy%))
(time (hello "Alan Turing" crazy))
cpu time: 1160 real time: 1178 gc time: 68
#lang racket
(require "hello-matrix.rkt")
(define crazy (new crazy%))
(time (hello "Alan Turing" crazy))
cpu time: 7432 real time: 7433 gc time: 80
Using contract-profile:
Running time is 83.47% contracts
6320/7572 ms
BY CONTRACT
g66 @ #(struct:srcloc hello-matrix.rkt 3 15 50 6)
3258 ms
(-> String (object/c (do (-> any/c (struct/c Array (vectorof Index) Index (box/c (or/c #f #t)) (-> Void) (-> (vectorof Index) Float)))) (field (m1 (struct/c Array (vectorof Index) Index (box/c (or/c #f #t)) (-> Void) (-> (vectorof Index) Float))) (m2 (struct/c Array (vectorof Index) Index (box/c (or/c #f #t)) (-> Void) (-> (vectorof Index) Float))))) any) @ #(struct:srcloc hello-matrix.rkt 3 9 44 5)
3062 ms
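For anyone who wants to collect a similar report, here is a minimal sketch, assuming the contract-profile library that ships with the main Racket distribution. The submodule name typed-side and the function sum-list are hypothetical stand-ins for hello-matrix.rkt and hello; they are not part of my actual program.

```racket
#lang racket
;; Hypothetical sketch: measure how much time is spent checking the
;; contracts that guard a typed/untyped boundary.
(require contract-profile)

;; A small Typed Racket submodule standing in for hello-matrix.rkt.
(module typed-side typed/racket
  (provide sum-list)
  (: sum-list (-> (Listof Integer) Integer))
  (define (sum-list xs)
    (if (null? xs)
        0
        (+ (car xs) (sum-list (cdr xs))))))

;; Requiring the typed submodule from this untyped module wraps
;; sum-list in a contract generated from its type.
(require 'typed-side)

;; contract-profile runs its body and then prints a report of the
;; time attributed to contract checking.
(contract-profile
 (for ([i (in-range 1000)])
   (sum-list (range 100))))
```

The report groups the cost by contract, which is how the object/c contract on the class instance showed up as the dominant entry above.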
EDIT 3 - Passing a struct from typed to untyped is more performant than passing a class
Using a struct instead of a class fixes this:
hello-matrix.rkt:
#lang typed/racket
(require math)
(provide hello (struct-out crazy))
(struct crazy ([m1 : (Matrix Flonum)] [m2 : (Matrix Flonum)]) #:mutable)
(define-type Crazy crazy)
(define (crazy-do [my-crazy : Crazy])
  (set-crazy-m1! my-crazy (matrix* (matrix-transpose (crazy-m1 my-crazy))
                                   (crazy-m2 my-crazy)))
  (set-crazy-m2! my-crazy (matrix* (matrix-transpose (crazy-m1 my-crazy))
                                   (crazy-m2 my-crazy)))
  (matrix+ (crazy-m1 my-crazy) (crazy-m2 my-crazy)))
(: do-some-crazy-matrix-operations : Crazy -> (Matrix Flonum))
(define (do-some-crazy-matrix-operations my-crazy)
  (for ([i 60000])
    (crazy-do my-crazy))
  (matrix+ (crazy-m1 my-crazy) (crazy-m2 my-crazy)))
(define (hello [str : String] [my-crazy : Crazy])
  (define result : (Matrix Flonum) (do-some-crazy-matrix-operations my-crazy))
  (display (format "Hello ~a! Result is ~a\n" str result)))
Usage:
#lang typed/racket
(require "hello-matrix.rkt")
(require math)
(define my-crazy (crazy (build-matrix 5 5 (lambda (x y) (add1 (random))))
                        (build-matrix 5 5 (lambda (x y) (add1 (random))))))
(time (hello "Alan Turing" my-crazy))
cpu time: 1008 real time: 1008 gc time: 52
#lang racket
cpu time: 996 real time: 995 gc time: 52
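I'm writing this as an "answer" so that I can format my code... I think we're talking a bit past each other. Specifically, I can run your typed code from an untyped module in about half a second. I named your typed code file "hello-matrix.rkt" as you suggested, then ran the untyped module that you provided (the one that requires the TR module), and it took the same amount of time (about half a second). Let me be careful in saying this:
Contents of "hello-matrix.rkt":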
#lang typed/racket
(require math)
(provide hello)
(: do-some-crazy-matrix-operations : (-> (Matrix Flonum)))
(define (do-some-crazy-matrix-operations)
  (define m1 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
  (define m2 : (Matrix Flonum) (build-matrix 5 5 (lambda (x y) (add1 (random)))))
  (for ([i 60000])
    (set! m1 (matrix-map * m1 m2))
    (set! m2 (matrix-map * m1 m2)))
  (matrix+ m1 m2))
(define (hello [str : String])
  (define result : (Matrix Flonum) (do-some-crazy-matrix-operations))
  (display (format "Hello ~a! Result is ~a" str result)))
(time (hello "Alan Turing"))
Then, I call it from an untyped module, just like you said:
#lang racket/base
(require "hello-matrix.rkt")
(time (hello "Alan Turing"))
Here's the result:
Hello Alan Turing! Result is (array #[#[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0]])
cpu time: 719 real time: 710 gc time: 231
Hello Alan Turing! Result is (array #[#[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0] #[+inf.0 +inf.0 +inf.0 +inf.0 +inf.0]])
cpu time: 689 real time: 681 gc time: 184
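That is, it takes the same time to call it from untyped Racket as it does from typed Racket.
This result might depend a bit on what version of DrRacket you're using; I'm using 6.11.
All of this is to demonstrate that TR code is still TR code, even if you call it from untyped code. I do believe that you're having performance problems, and I do believe that they're related to matrix operations, but this particular example doesn't illustrate them.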