Garbage collector issues in Haskell runtime when (de)allocations are managed in C
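I would like to use Haskell's FFI functionality to share data (in the simplest case, arrays of integers) between C and Haskell. The C side creates the data (allocating memory accordingly) but never modifies it until it is freed, so I thought the following approach would be "safe":
- After the data is created, the C function passes the length of the array and a pointer to its start.
- On the Haskell side, we create a ForeignPtr and set up a finalizer that calls a C function which frees the pointer.
- We build a Vector from that foreign pointer, which can then be used (immutably) in Haskell code.
However, this approach causes rather non-deterministic crashes. Small examples tend to work, but "once the GC kicks in", I start to get errors ranging from segmentation faults to "barf"s at this or this line in the "evacuate" part of GHC's GC.
What am I doing wrong here? What would be the "right way" of doing something like this?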
An example
I have a C header with the following declarations:
typedef struct CVector {
  const int32_t *pointer;
  size_t length;
} Vector;

void create_c_vector(struct CVector *vector);
void free_buffer(void *buff);
The Haskell code is generated with c2hs from the following .chs file:
import Foreign.C.Types
import Foreign.Concurrent
import Foreign.Marshal.Alloc
import Foreign.Ptr
import Foreign.Storable
import qualified Data.Vector.Storable as V

#include <cvector.h>

data ForeignVector = ForeignVector
  { pointerFV :: Ptr CInt
  , lengthFV :: CULong
  }

instance Storable ForeignVector where
  sizeOf _ = {#sizeof CVector #}
  alignment _ = {#alignof CVector #}
  peek p =
    ForeignVector
      <$> {#get CVector->pointer #} p
      <*> {#get CVector->length #} p
  poke p (ForeignVector vecP l) =
    do {#set CVector.pointer #} p (castPtr vecP)
       {#set CVector.length #} p l

peekUnit :: Storable a => Ptr () -> IO a
peekUnit = peek . castPtr

{#fun create_c_vector as ^ { alloca- `ForeignVector' peekUnit*} -> `()' #}
{#fun free_buffer as ^ { `Ptr ()' } -> `()' #}

fromForeign :: ForeignVector -> IO (V.Vector CInt)
fromForeign (ForeignVector p l) =
  V.unsafeFromForeignPtr0
    <$> newForeignPtr p (freeBuffer . castPtr $ p)
    <*> pure (fromIntegral l)

createVector :: IO (V.Vector CInt)
createVector = fromForeign =<< createCVector
One particular test I ran produced the error "internal error: evacuate: strange closure type 177" after a few thousand calls to createVector.
PS: The reason I want to use Foreign.Concurrent.newForeignPtr rather than the more "standard" Foreign.ForeignPtr.newForeignPtr is this: in some more complicated cases I am anticipating, freeing the pointer should also clean up other things, and that cleanup can depend on parameters passed from Haskell. I would therefore like to have a "finalizer with multiple arguments" and pass a partial application as the actual finalizer. This means that I can't use a pointer to a C function as the finalizer. I've read that one can cook up the FinalizerPtr required for the finalizer from Haskell functions using a "wrapping" mechanism, but according to the documentation, function pointers obtained this way need to be explicitly deallocated with freeHaskellFunPtr, and I would rather not do that bookkeeping.
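To make the "finalizer with multiple arguments" idea concrete, here is a minimal sketch that uses only base. The names cleanup and wrap, the IORef log, and the use of malloc/free in place of the C allocator are assumptions made for illustration; they are not part of the code in question.

```haskell
import Data.Int (Int32)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (free, malloc)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Hypothetical cleanup routine that needs more than just the pointer.
-- Here it frees a buffer obtained from 'malloc' and records that it ran;
-- in the real setting it would call into C instead.
cleanup :: IORef [String] -> String -> Ptr Int32 -> IO ()
cleanup logRef label p = do
  free p
  modifyIORef' logRef (("freed " ++ label) :)

-- The finalizer handed to Foreign.Concurrent.newForeignPtr is an ordinary
-- IO action, so a partial application of 'cleanup' is enough.
wrap :: IORef [String] -> String -> Ptr Int32 -> IO (ForeignPtr Int32)
wrap logRef label p = newForeignPtr p (cleanup logRef label p)

main :: IO ()
main = do
  logRef <- newIORef []
  p <- malloc
  poke p 42
  fp <- wrap logRef "buffer A" p
  withForeignPtr fp peek >>= print
  -- Run the finalizer now rather than waiting for the garbage collector.
  finalizeForeignPtr fp
  readIORef logRef >>= print
```

Calling finalizeForeignPtr is only there to make the sketch deterministic; normally the finalizer runs some time after the ForeignPtr becomes unreachable.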
PPS: Here is a base64-encoded tarball containing the full source code of the example above (including the code for an executable that reproduces the error):
H4sIAAAAAAAAA+1Ze1PbOhbv3/oUZ0JnSQAb50VmeM1QKNvMwIUpLZ2dbjdRbDnx4liuZAO5vXz3
PUd+hAS4LC2XbmejYYitc3Se0u9Isu8HVqB1KqzxlbAcu27X13kcr796xuZg67Tb5hfb/K95rrec
dqPptFtO85VTb9abzVfQfk4jHmqpTrhCU35Uzrxzv0jz78m/GwaD55wAT8j/xkazTflvdFqL/L9E
uy//Wrk/Z/03Njr4Y9Z/e7H+X6Q9lP/jT2+fbQ781/lvNTbwj/K/4Szy/yLtz/J/KJUIhtG5cBOp
bHekv1MHxWOj1Xoo/41mcz7/HdwIvIIXCeL/ef7H0ktDAZhueybdcDUSSjAWjGOpEshp9r79YRIL
fadbRm6qlIgSZlkwR8x/TxM1P+yYKz3iob0XhtKdJ97Df4aG8UE4NetrysPAD4QHBzzhdj5TCzbg
Gs4ZWwoiN0w9AdvuZcYw2mWMeTgCZn3emX1nAN8glkGUCHV4DrC5CWgU7HfRTYA1CEU0TEZEIdL+
xyMZDZFwg+ZFOKsiV0Bpyn3BBdDB7+LEhx5q/rZEL9KH/Zxn6QYZ0L1hNMa45jzmfZ4pFuICYtjB
R7jjAbXt17s4diiSYpy1m7uFAiAuuFbucGUeFkyxvBCopzrrC8b0FMJart6T5MlUhn1bEVRdrhOK
IQ2q5XrnBtzSCSFj5NzHKEgoxNPEws6uyUW1BtYudE+ATxl3soDYkCtj7NuSn0bgKsET0XN72Syg
2fEvTDCnycct6M+4tQyFvJUbUtGv1pYp2pkoXwnRG6S+L0ox/cycZZhhZ34mtFgTrovqoKITD/fY
9gj+RpIqGAj3EB/Ix8M0MpLoH8+dq9ZqKEnJcW6i4ZtJQs53ni8BM0drM0PmshYXKTu300hzXxxO
eVG1w4p5E4mraTelkCx+k7lehhheQ4yZsECqzbkRmWPZHKMZFqdKkBA5RhPFUPEQLWEsS05uHLp3
jzczLDtw27md7e08vfksYj8bV3+Vdl/9p/P/MQ8i+7sr/mx7pP7X662N4vxXbzjIV281286i/r9E
K+o/pnuu5N/ZEpQUrPaJkqF9IER8Jr7Odx/LiHtFp6nLR4FOHtlJnE10IsZ29+RptZ3pIAwnZ+nY
YAaWyQwITYkuSZEBOl+GXrgM1X/CNUyIr3oNqzCpQR9j0Idmu1OvgUOYZ7BKiTgMXESUYxyPUFQM
X82Z4DYcIYCNKYI5cNWyN9KK9fAStq0Z7qzujU7T5CxRRxFgNRCKMLSyb7g8yCrUJlRgdRX0SF7B
5QODKr8h0QgPoiFg9ZOXVJMiL38Y0jq27Uo2XIuvqcB9Sa8ovZ/JQE0GTqNV0LIR0PcwzTiwj8oV
1vJCc8n3BwYIBXyuO84aNBwHbBvw2flSMHxZoPH/bLsP/2f6bJcPePhDOh7D/412vTz/tZuE/3j8
by7w/yWaya6FUKEDGW0Cpr/B6Az3YRRo8AME2hEi7UCICIYiEsrAE229IObuBR8Ke8LHIQwmMKIO
yCWBYzdbdstGUSRNC7EJoySJ9eb6+jBIRunAduV4Xctw3YxjLOJj5Jm2mUnISgPzZiYq05NIxjrQ
RfcejIMoGOOW8kqqCwJEcc3HMTrh034/gsPDLhihDLFdRHqq8TQdYNeBJOBmgzQIPSvB+pTRzwIS
wsR1orilZapcYVFs9KaBuPdv9w6O39pjz7yZ2/PypHm3y2WofKC4miBNXMdSC8/KynAuD+6pvQAy
wfI8z3jKk5HuYax6xq0exQqrhC6s9AJV8mrl4tNw5FoyTjCYGDbrpAHWJzqSWMOGMYc8zMwLplpy
0/Etj4yU4ZTYGOmS4olYRF5JG3AtzOMalCI84fM0TKyQR8MUJ9AmvOP6QoRhw6k7DIMs3DQxJX4W
mwYBVRzKj0UZv7VJfVpwHg4AWCrRSNJ34vL9bufFM3+bndRPCsxftP7vw/9yPj+Tjkf3/5329PtP
q2Puf7EkLPD/BdqSwcQuzQA4zsHzUw6ebzPwZMwUg1jJf+NUxWk6xqWTUCXQwHHvyfUIcLf793f7
yxqGXA1w7oIrwzA7qSMJuYRKaEuMyphG5EV4kTbrJtPqgtKotFxhcYCzJKstcPThDOodrEk2Y0tL
8IYWG1rG2EkkIBLC05DIbA0CIgDsQw6tKJNuXjIS+ULUfDlh5VLJGp52AncELke4F7gLjsSaqRJ5
xVijkqbSKCJ1/X6fDV0XLD3iCo20JOkpAF1LsPzT7j5MEZ4GoLektE/g3wcEYkOc2GS8K+bMxfji
Ht6brM0amoccjUSHjDMJpJr8qdAFo8eVxwwqVVDShcgCB3RjQmU9C9r73AnWjZCCW3cTMjxAEcTi
IzpplCK+kiX9O6jbXwN5O9xxmtAANu8ZFB6/PjroHXXfvN97/4/e6d6Hd30Q0WWg8HhobjIvMfmk
3F4cC35+e/D7f7mD+XEdj+C/06yX3/+x1NUR/9vIv8D/l2jTjyN4rMfTPX0bmekz99S7jNFuGHco
ePpXqVve1sO3bINLFQHXf9Js9BJYye/8twyNPmtgZ3atv8VuIBu5xdilDLz5W/nqnPyVrLu2lXHf
univmo4VekHqz47jr9oeXf/uj+t4bP1v4Jm/uP/tdOj83647ncX6f4lWrvVKCfiVp61MAwEzixx2
oNWgxV8CAi1S7K2WHTU8yNF3t2qrASuQffksqLWaGetD1aztHGJyDQGKcTJgod1adQt7tgtwgWB1
teCno5nvfw6+4IAgG3Bj/l/OfQHdMYxbM7TSjwK1cDCIUItc+F0Zv308OnpAhjH3ht2wP4UwI5lo
1RzRbhaYtmiLtmh/afsPAHfp2gAuAAA=
Copied and extended from an earlier comment.
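You may have a faulty cast or poke. One thing I make a point of doing, both as a defensive guideline and when debugging, is this:
Explicitly annotate the types of everything that can undermine types. That way, you always know what you're getting. Even if a poke, castPtr, or unsafeCoerce has my intended type now, that may not remain true when the code is moved around. And even if this doesn't identify the issue, it can at least help you think it through.
For example, I was once writing a null terminator into a byte buffer, and it corrupted adjacent memory by writing beyond the end. The cause was that I was using '\NUL', which is not a C char but a Haskell Char: 32 bits! This went unnoticed because pokeByteOff is polymorphic: it has type (Storable a) => Ptr b -> Int -> a -> IO (), not … => Ptr a -> …, so the pointer type places no constraint on the value being written.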
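A small sketch of that pitfall, using only base. The helper name pokeNul and the 8-byte scratch buffer are made up for illustration. The point is that the annotation on the written value, not the pointer type, decides how many bytes pokeByteOff stores.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff, sizeOf)

-- Write a NUL terminator at the given byte offset.
-- The explicit (0 :: Word8) pins the store to exactly one byte.
pokeNul :: Ptr Word8 -> Int -> IO ()
pokeNul buf off = pokeByteOff buf off (0 :: Word8)

main :: IO ()
main = do
  -- Compare the storage size of a Haskell Char with that of a C char.
  print (sizeOf (undefined :: Char))
  print (sizeOf (undefined :: CChar))
  allocaBytes 8 $ \buf -> do
    -- Fill the buffer with a marker byte so that any overwrite is visible.
    mapM_ (\i -> pokeByteOff buf i (0xFF :: Word8)) [0 .. 7]
    pokeNul buf 3
    bytes <- mapM (peekByteOff buf) [0 .. 7] :: IO [Word8]
    print bytes
```

With GHC, the two sizes should come out as 4 and 1, and only the byte at offset 3 is cleared. Writing '\NUL' in place of (0 :: Word8) would type-check just as well and clear four bytes.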
This turned out to be the case in your code:
The createCVector generated by c2hs was equivalent to something like alloca $ \ ptr -> createCVector'_ ptr >> peek ptr, where createCVector'_ :: Ptr () -> IO (), which meant that alloca allocated only enough space to hold a unit. Changing the in-marshaller to alloca' f = alloca $ f . (castPtr :: Ptr ForeignVector -> Ptr ()) seems to solve the issue.
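The sizing problem can be reproduced without c2hs. In the sketch below, Pair is a hypothetical stand-in for ForeignVector, fillPair stands in for the foreign function that takes a void pointer, and allocaAs plays the role of the alloca' marshaller. It assumes a base version that provides the Storable () instance (4.9 or later).

```haskell
import Data.Int (Int64)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical two-field struct standing in for ForeignVector.
data Pair = Pair Int64 Int64 deriving (Eq, Show)

instance Storable Pair where
  sizeOf _ = 16
  alignment _ = 8
  peek p = Pair <$> peekByteOff p 0 <*> peekByteOff p 8
  poke p (Pair a b) = pokeByteOff p 0 a >> pokeByteOff p 8 b

-- Stand-in for a foreign function that fills a struct through a void pointer.
fillPair :: Ptr () -> IO ()
fillPair p = poke (castPtr p) (Pair 7 9)

-- Allocate at the struct's type, then cast for the callee. Without the
-- annotation, 'alloca' would be used at type Ptr () and reserve sizeOf ()
-- bytes, which is too small for the struct.
allocaAs :: (Ptr () -> IO b) -> IO b
allocaAs f = alloca $ f . (castPtr :: Ptr Pair -> Ptr ())

main :: IO ()
main = do
  print (sizeOf ())                   -- what alloca reserves at type Ptr ()
  print (sizeOf (undefined :: Pair))  -- what the callee actually writes
  r <- allocaAs $ \p -> fillPair p >> peek (castPtr p :: Ptr Pair)
  print r
```

The gap between the two printed sizes is the number of bytes that the callee would write past the end of the allocation.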
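The following turned out not to be the cause, but could have been:
I've had similar crashes when closures were being corrupted by somebody (read: me) writing beyond the end of an array. If you're doing any writes without bounds checks, it may help to replace them with checked versions and see whether you get an exception instead of heap corruption. In a way, this is what was happening here, except that the write went to the region allocated by alloca rather than to an array.
Alternatively, consider lifetime issues: could the ForeignPtr be getting dropped, and the buffer freed, earlier than you expect, giving you a use-after-free? In one particularly frustrating case, I had to use touchForeignPtr to keep a ForeignPtr alive for that reason.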
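A minimal sketch of the lifetime point, using only base. The functions sumRaw and sumSafe and the four-element buffer are made up for illustration. When the raw pointer is extracted with unsafeForeignPtrToPtr, a touchForeignPtr after the last use is what keeps the finalizer from running too early; withForeignPtr provides the same guarantee for the duration of its callback.

```haskell
import Data.Int (Int32)
import Foreign.ForeignPtr
  (ForeignPtr, mallocForeignPtrArray, touchForeignPtr, withForeignPtr)
import Foreign.ForeignPtr.Unsafe (unsafeForeignPtrToPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)

-- Sum n elements through the raw pointer. Without the touchForeignPtr call,
-- nothing ties the lifetime of the ForeignPtr to the reads through p.
sumRaw :: ForeignPtr Int32 -> Int -> IO Int32
sumRaw fp n = do
  let p = unsafeForeignPtrToPtr fp
  xs <- peekArray n p
  touchForeignPtr fp  -- keep fp (and its buffer) alive up to this point
  pure (sum xs)

-- The same computation with withForeignPtr, which keeps fp alive while the
-- callback runs.
sumSafe :: ForeignPtr Int32 -> Int -> IO Int32
sumSafe fp n = withForeignPtr fp $ \p -> sum <$> peekArray n p

main :: IO ()
main = do
  fp <- mallocForeignPtrArray 4
  withForeignPtr fp $ \p -> pokeArray p [1, 2, 3, 4]
  sumRaw fp 4 >>= print
  sumSafe fp 4 >>= print
```

Preferring withForeignPtr where possible avoids having to remember the touch at every exit path.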