CPython 内存堆损坏问题

CPython Memory Heap Corruption Issue

我有一个 Windows fatal exception: code 0xc0000374 - 是的,有多重处理(等待但是...)。 Google 表示异常代码 0xc0000374 表示堆损坏。是的,多处理是必须的。它是我正在使用的框架的一部分,因为每个机器人都有可能在 运行 中拥有自己的核心。TL;DR 我无法改变存在多处理的事实。但是,我的 bot 在一个线程上只有 运行,所以应该没有问题,事实上,这个问题相对较新。

我想我已经找到了问题所在,但它并没有多大意义。我正在用 C 扩展 Python 以提高 运行 倍,我认为这就是错误所在。我已经将它缩小到一个名为 ground_shot_is_viable 的函数,因为当我在 Python 中注释掉它时,错误永远不会发生。但是,当我尝试打印垃圾邮件时(在这种情况下,我实际上写入了一个文件,因为它更适合数百次打印),我发现该功能已成功完成。我认为错误是该函数超出了它的内存边界,这破坏了一部分数据,导致崩溃回溯指向其他地方。 (在这种情况下,这是我正在使用的框架中的一条无辜行 - File "G:\RLBotGUIX\venv\lib\site-packages\rlbot\utils\rendering\rendering_manager.py", line 104 in end_rendering 将变量设置为 False

我还测试了其他功能,但由于某种原因它们不会导致此问题。有一个很小的潜力,因为他们不像 ground_shot_is_viable.

那样经常被调用

错误仅在几分钟后发生,概率总计至少几百次,也许一千次。 (机器人 运行 的速度超过 120tps,因此该函数有可能在一秒钟内被调用 120 次。)

我只是通过将环境变量 PYTHONFAULTHANLDER 设置为 1 来设法获得回溯 - 如果我不这样做,我的程序就会无声地崩溃。

当我使用 python.exe 启动程序时,我也没有得到故障转储,但是使用 pythonw.exe 我确实得到了故障转储。

回溯:

Windows fatal exception: code 0xc0000374

Thread 0x00008004 (most recent call first):
  File "G:\RLBotGUIX\Python37\lib\threading.py", line 296 in wait
  File "G:\RLBotGUIX\Python37\lib\multiprocessing\queues.py", line 224 in _feed
  File "G:\RLBotGUIX\Python37\lib\threading.py", line 870 in run
  File "G:\RLBotGUIX\Python37\lib\threading.py", line 926 in _bootstrap_inner
  File "G:\RLBotGUIX\Python37\lib\threading.py", line 890 in _bootstrap

Current thread 0x000031e0 (most recent call first):
  File "G:\RLBotGUIX\venv\lib\site-packages\rlbot\utils\rendering\rendering_manager.py", line 104 in end_rendering
  File "G:\RLBotGUIX\venv\lib\site-packages\rlbot\botmanager\bot_manager_struct.py", line 69 in call_agent
  File "G:\RLBotGUIX\venv\lib\site-packages\rlbot\botmanager\bot_manager.py", line 250 in perform_tick
  File "G:\RLBotGUIX\venv\lib\site-packages\rlbot\botmanager\bot_manager.py", line 206 in run
  File "G:\RLBotGUIX\venv\lib\site-packages\rlbot\setup_manager.py", line 617 in run_agent
  File "G:\RLBotGUIX\Python37\lib\multiprocessing\process.py", line 99 in run
  File "G:\RLBotGUIX\Python37\lib\multiprocessing\process.py", line 297 in _bootstrap
  File "G:\RLBotGUIX\Python37\lib\multiprocessing\spawn.py", line 118 in _main
  File "G:\RLBotGUIX\Python37\lib\multiprocessing\spawn.py", line 105 in spawn_main
  File "<string>", line 1 in <module>

故障转储:

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffb96b7dace (ucrtbase!abort+0x000000000000004e)
   ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 0000000000000007
Subcode: 0x7 FAST_FAIL_FATAL_APP_EXIT 

PROCESS_NAME:  pythonw.exe

ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.

EXCEPTION_CODE_STR:  c0000409

EXCEPTION_PARAMETER1:  0000000000000007

STACK_TEXT:  
000000d4`287ef660 00007ffb`2baf1bb7     : 00000295`00000003 00000000`00000003 00000000`ffffffff 00007ffb`2bc0a3d8 : ucrtbase!abort+0x4e
000000d4`287ef690 00007ffb`2baf17c3     : 000000d4`287ef9a0 000000d4`287ef800 00000000`00000000 00000000`00000000 : python37!Py_RestoreSignals+0x14b
000000d4`287ef6d0 00007ffb`2b9e94a9     : 000000d4`287ef9a0 00000000`00000000 00000295`edd52050 00000000`00000000 : python37!Py_FatalInitError+0x1f
000000d4`287ef700 00007ffb`2b9a09ce     : 000000d4`287ef9a0 00000295`edd52050 00000000`00000000 00000000`00000000 : python37!PyErr_NoMemory+0x2ad5d
000000d4`287ef930 00007ffb`2b9a09b6     : 0000ab32`10364489 00007ff7`28481e7e 00000000`00000000 00007ffb`96b29f66 : python37!Py_Main+0x6e
000000d4`287ef960 00007ff7`28481277     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : python37!Py_Main+0x56
000000d4`287efa10 00007ffb`97f07c24     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : pythonw+0x1277
000000d4`287efa50 00007ffb`98f8d4d1     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
000000d4`287efa80 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21


SYMBOL_NAME:  ucrtbase!abort+4e

MODULE_NAME: ucrtbase

IMAGE_NAME:  ucrtbase.dll

STACK_COMMAND:  ~0s ; .ecxr ; kb

FAILURE_BUCKET_ID:  FAIL_FAST_FATAL_APP_EXIT_c0000409_ucrtbase.dll!abort

最小可重现示例(实际上未经测试,但我的程序的完整版本已经过测试)

#include <math.h>
#include <Python.h>
#include <string.h>

// Constants

#define simulation_dt 0.05
#define physics_dt 0.008333333333333333333333

#define jump_max_duration 0.2
#define jump_speed 291.6666666666666666666666
#define jump_acc 1458.3333333333333333333333

#define aerial_throttle_accel 66.6666666666666666666666
#define boost_consumption 33.3333333333333333333333

#define brake_force 3500.
#define max_speed 2300.
#define max_speed_no_boost 1410.

#define start_throttle_accel_m -1.02857142857142857
#define start_throttle_accel_b 1600.
#define end_throttle_accel_m -16.
#define end_throttle_accel_b 160.

#define PI 3.14159265358979323846

// simple math stuff

static inline signed char sign(int value)
{
    return (value > 0) - (value < 0);
}

// Vector stuff

typedef struct vector
{
    double x;
    double y;
    double z;
} Vector;

static inline double dot(Vector vec1, Vector vec2)
{
    return vec1.x * vec2.x + vec1.y * vec2.y + vec1.z * vec2.z;
}

static inline double magnitude(Vector vec)
{
    return sqrt(dot(vec, vec));
}

static inline Vector flatten(Vector vec)
{
    return (Vector){vec.x, vec.y, 0};
}

Vector normalize(Vector vec)
{
    double mag = magnitude(vec);

    if (mag != 0)
        return (Vector){vec.x / mag, vec.y / mag, vec.z / mag};

    return (Vector){0, 0, 0};
}

static inline double angle(Vector vec1, Vector vec2)
{
    return acos(cap(dot(normalize(vec1), normalize(vec2)), -1, 1));
}

static inline double angle2D(Vector vec1, Vector vec2)
{
    return angle(flatten(vec1), flatten(vec2));
}

// orientation matrix stuff

typedef struct orientation
{
    Vector forward;
    Vector left;
    Vector up;
} Orientation;

static inline Vector localize(Orientation or_mat, Vector vec)
{
    return (Vector){dot(or_mat.forward, vec), dot(or_mat.left, vec), dot(or_mat.up, vec)};
}

// hitboxes

typedef struct hitbox
{
    double length;
    double width;
    double height;
    Vector offset;
} HitBox;

// other definitions

typedef struct car
{
    Vector location;
    Vector velocity;
    Orientation orientation;
    Vector angular_velocity;
    _Bool demolished;
    _Bool airborne;
    _Bool supersonic;
    _Bool jumped;
    _Bool doublejumped;
    unsigned char boost;
    unsigned char index;
    HitBox hitbox;
} Car;

// extra math functions

double throttle_acceleration(double car_velocity_x)
{
    double x = fabs(car_velocity_x);
    if (x >= 1410)
        return 0;

    // use y = mx + b to find the throttle acceleration
    if (x < 1400)
        return start_throttle_accel_m * x + start_throttle_accel_b;

    x -= 1400;
    return end_throttle_accel_m * x + end_throttle_accel_b;
}

// physics simulations

_Bool can_reach_target_forwards(double max_time, double jump_time, double boost_accel, double distance_remaining, double car_speed, int car_boost)
{
    double *v = &car_speed;
    double t = 0;
    double b = car_boost;
    double *d = &distance_remaining;
    double ba_dt = boost_accel * simulation_dt;
    double ms_ba_dt = max_speed - ba_dt;
    double bc_dt = boost_consumption * simulation_dt;
    double bk_dt = brake_force * simulation_dt;

    while (*d > 25 && t <= max_time && (*v <= 0 || *d / *v > jump_time))
    {
        *v += (*v < 0) ? bk_dt : throttle_acceleration(*v) * simulation_dt;

        if (b > bc_dt && *v < ms_ba_dt)
        {
            *v += ba_dt;
            if (b <= 100)
                b -= bc_dt;
        }

        *d -= *v * simulation_dt;
        t += simulation_dt;
    }

    double th_dt = aerial_throttle_accel * simulation_dt;

    while (*d > 25 && t <= max_time)
    {
        // yes, this IS max_speed, NOT max_speed_no_boost!
        if (*v <= max_speed - th_dt)
            *v += th_dt;

        if (b > bc_dt && *v < ms_ba_dt)
        {
            *v += ba_dt;
            if (b <= 100)
                b -= bc_dt;
        }

        *d -= *v * simulation_dt;
        t += simulation_dt;
    }

    return *d <= 25;
}

_Bool can_reach_target_backwards(double max_time, double jump_time, double distance_remaining, double car_speed)
{
    double *v = &car_speed;
    double t = 0;
    double *d = &distance_remaining;
    double bk_dt = brake_force * simulation_dt;

    while (*d > 25 && t <= max_time && (*v >= 0 || *d / (-*v) > jump_time))
    {
        *v -= (*v > 0) ? bk_dt : throttle_acceleration(*v) * simulation_dt;
        *d += *v * simulation_dt;
        t += simulation_dt;
    }

    double th_dt = aerial_throttle_accel * simulation_dt;
    double ms_th_dt = max_speed - th_dt;

    while (*d > 25 && t <= max_time)
    {
        // yes, this IS max_speed, NOT max_speed_no_boost!
        if (-*v <= ms_th_dt)
            *v -= th_dt;

        *d += *v * simulation_dt;
        t += simulation_dt;
    }

    return *d <= 25;
}

// Parsing shot slices

_Bool generic_is_viable(double *T, double jump_time, double *boost_accel, Car *me, Vector *direction, double *distance_remaining)
{
    if (*T <= 0 || *distance_remaining / *T > 2300)
        return 0;

    double forward_angle = angle2D(*direction, me->orientation.forward);
    double backward_angle = PI - forward_angle;

    double forward_time = *T - (forward_angle * 0.418);
    double backward_time = *T - (backward_angle * 0.318);

    double true_car_speed = dot(me->orientation.forward, me->velocity);
    double car_speed = magnitude(me->velocity) * sign((int)true_car_speed);

    jump_time *= 1.05;

    _Bool forward = forward_time > 0 && can_reach_target_forwards(forward_time, jump_time, *boost_accel, *distance_remaining, car_speed, me->boost);
    _Bool backward = backward_time > 0 && forward_angle >= 1.6 && true_car_speed < 1000 && can_reach_target_backwards(backward_time, jump_time, *distance_remaining, car_speed);

    return forward || backward;
}
_Bool ground_shot_is_viable(double *T, double *boost_accel, Car *me, double *offset_target_z, Vector *direction, double *distance_remaining)
{
    if (*offset_target_z >= (92.75 + (me->hitbox.height / 2)) || me->airborne)
        return 0;

    return generic_is_viable(T, 0, boost_accel, me, direction, distance_remaining);
}

// Linking the C functions to Python methods

static PyObject *method_ground_shot_is_viable(PyObject *self, PyObject *args)
{
    double T, boost_accel, offset_target_z, distance_remaining;
    Vector direction;
    Car me;
    _Bool shot_viable = 0;

    // args are for >= 1.11
    if (!PyArg_ParseTuple(args, "dd((ddd)(ddd)((ddd)(ddd)(ddd))(ddd)bbbbbbb(ddd)(ddd))d(ddd)d", &T, &boost_accel, &me.location.x, &me.location.y, &me.location.z, &me.velocity.x, &me.velocity.y, &me.velocity.z, &me.orientation.forward.x, &me.orientation.forward.y, &me.orientation.forward.z, &me.orientation.left.x, &me.orientation.left.y, &me.orientation.left.z, &me.orientation.up.x, &me.orientation.up.y, &me.orientation.up.z, &me.angular_velocity.x, &me.angular_velocity.y, &me.angular_velocity.z, &me.demolished, &me.airborne, &me.supersonic, &me.jumped, &me.doublejumped, &me.boost, &me.index, &me.hitbox.length, &me.hitbox.width, &me.hitbox.height, &me.hitbox.offset.x, &me.hitbox.offset.y, &me.hitbox.offset.z, &offset_target_z, &direction.x, &direction.y, &direction.z, &distance_remaining))
    {
        return NULL;
        // removed legacy code stuff
    }
    else
    {
        shot_viable = ground_shot_is_viable(&T, &boost_accel, &me, &offset_target_z, &direction, &distance_remaining);
    }

    return (shot_viable) ? Py_True : Py_False;
}

static PyMethodDef methods[] = {
    {"ground_shot_is_viable", method_ground_shot_is_viable, METH_VARARGS, "Check if a ground shot is viable"},
    {NULL, NULL, 0, NULL}};

static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "gstest",
    "Test thing",
    -1,
    methods};

PyMODINIT_FUNC PyInit_test(void)
{
    return PyModule_Create(&module);
};

此代码与 Python (ofc...) 和 RLBot python 包结合使用。机器人 运行 在 Rocket League 中,它玩游戏。 RLBot 框架本身不会崩溃,只是机器人崩溃。我已经通过 运行 同时测试了 2 个机器人,只有我的机器人崩溃了。其他机器人(实际上是机器人)未受影响。

我的系统有 20GB 的 ram,问题发生在我的容量为 50%-75% 时,所以我系统的 ram 数量不是问题。我不是最擅长测试内存泄漏的,但它似乎每 30 秒到一分钟增加 0.1MB 左右。这并不多,因为机器人一开始占用大约 30MB 的内存。

这个问题困扰了我将近一个月,我想把它带到行刑队面前,但该死的事情很烦人。

我在 SO 上发布这个问题时非常犹豫,因为我不太确定该放什么。我希望我已经提供了所需的一切!如果您想要完整的 C 程序,或者如果您想要 setup.py 文件和其他东西,请询问,我会提供一个 hastebin 或其他东西。或者,如果您想要完整的机器人,我可以将一个 zip 文件上传到保管箱,因为该机器人相当大,有数千行代码和多个文件。

错误最有可能出现在 method_ground_shot_is_viable:

这一行
    return (shot_viable) ? Py_True : Py_False;

使用 PyMethodDef 注册的函数必须 return 一个 new reference,这意味着它们必须在 return 像 [=14= 这样的全局对象时显式增加引用计数] 或 Py_False。如果不这样做,调用者会减少 returned 对象的引用计数,而之前它不会增加。在 method_ground_shot_is_viable 被调用足够多次后,TrueFalse 的引用计数降为零并被释放,导致释放后使用。

增加引用计数可以通过在 return 之前将 Py_INCREF 宏应用到 Py_TruePy_False 来完成。您还可以使用 boolean-returning 方便的宏来处理引用计数:

   if (shot_viable)
       Py_RETURN_TRUE;
   else
       Py_RETURN_FALSE;