Safely Exiting to a Particular State in Case of Error

Question

When writing code, I often check whether an error has occurred. For example:

char *x = malloc( some_bytes ); 
if( x == NULL ){
    fprintf( stderr, "Malloc failed.\n" ); 
    exit(EXIT_FAILURE); 
}

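I have also used strerror( errno ) in the past.

I have only ever written small desktop applications, where it does not matter if the program exit()s on an error.

Now, however, I am writing C code for an embedded system (Arduino), and I do not want the system to simply exit when an error occurs. I want it to go to a particular state/function where it can power down systems, send an error report, and idle safely.

I could simply call an error_handler() function, but I may be deep in the stack and so low on memory that error_handler() cannot run.

Instead, I would like execution to effectively collapse the stack, free up a good deal of memory, and then start sorting out the power-down and error reporting. If the system does not power down safely, there is a serious fire risk.

Is there a standard way that safe error handling is implemented in low-memory embedded systems?

Edit 1: I will limit my use of malloc() in embedded systems. In this particular case, the errors would occur while reading a file, if the file is not in the correct format.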

Answer 1

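Use setjmp(3) to set a recovery point and longjmp(3) to jump to it, restoring the stack to the state it had at the setjmp point. This does not free malloc'ed memory.

In general, it is not a good idea to use malloc/free in an embedded program if it can be avoided. For example, a static array may be adequate, and even alloca() is marginally better.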
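To illustrate the static-array suggestion, here is a minimal sketch that replaces a per-read malloc() with a fixed buffer. The names LINE_BUF_SIZE, line_buf and copy_line() are hypothetical, chosen only for this example, and the size is an assumption that would have to match the real file format.

```c
#include <stddef.h>
#include <string.h>

#define LINE_BUF_SIZE 64u   /* hypothetical limit; size it for your file format */

/* One statically allocated buffer: its cost is visible at link time,
   and it can never fail to allocate at run time. */
static char line_buf[LINE_BUF_SIZE];

/* Copies src into the static buffer.
   Returns a pointer to the buffer, or NULL if src does not fit. */
const char *copy_line(const char *src)
{
    size_t len = strlen(src);

    if (len >= LINE_BUF_SIZE) {
        return NULL;                    /* too long: report it, do not overflow */
    }
    memcpy(line_buf, src, len + 1u);    /* len + 1 copies the terminating '\0' */
    return line_buf;
}
```

The trade-off is that the buffer is shared, so each call overwrites the previous contents; in exchange, the "out of memory" failure path disappears entirely.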

Answer 2

Maybe you are waiting for the holy and sacred setjmp/longjmp, the one that came to save all memory-hungry stacks from their sins?

#include <setjmp.h>

jmp_buf jumpToMeOnAnError;
void someUpperFunctionOnTheStack() {
    if(setjmp(jumpToMeOnAnError) != 0) {
        // Error handling code goes here

        // Return, abort(), while(1) {}, or whatever here...
    }

    // Do the routine work here. longjmp() is only valid while this
    // function is still active (it has not returned yet).
}

void someLowerFunctionOnTheStack() {
    if(theWorldIsOver)
       longjmp(jumpToMeOnAnError, -1);
}

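Edit: Prefer not to use malloc()/free() on embedded systems, for the same reasons you gave. It simply becomes unmanageable, unless you use a lot of return codes/setjmp()s to free the memory all the way up the stack...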
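As a fuller sketch of the same idea applied to the malformed-file case from the question: a parser several calls deep jumps straight back to a shallow frame, where the shutdown work has the stack to itself again. Everything named here (safe_state, parse_record(), run_parser(), the '#' format rule, the shutdown_count counter) is an assumption made for illustration, not part of any real API.

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf safe_state;            /* recovery point, set near the top of the stack */
static volatile int shutdown_count;   /* hypothetical stand-in for real power-down work */

/* Deep in the call chain: bail out on a malformed record. */
static void parse_record(const char *rec)
{
    if (rec == NULL || rec[0] != '#') {   /* hypothetical format rule */
        longjmp(safe_state, 1);           /* unwinds straight back to setjmp() */
    }
}

static void parse_file(const char *const *records, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        parse_record(records[i]);
    }
}

/* Returns 0 if every record parsed, 1 if the safe-state path ran. */
int run_parser(const char *const *records, size_t n)
{
    if (setjmp(safe_state) != 0) {
        /* We are back in this shallow frame; the deeper frames are gone,
           so the stack they used is available to the shutdown code. */
        shutdown_count++;   /* power down, report and idle would go here */
        return 1;
    }
    parse_file(records, n);
    return 0;
}
```

Note that no local variable is modified between setjmp() and longjmp(); locals changed in that window would need to be volatile to have a defined value after the jump.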

Answer 3

If your system has a watchdog, you could use:

char *x = malloc( some_bytes ); 
assert(x != NULL);

An implementation of assert() could look something like this:

#define assert(condition) \
    do { if (!(condition)) { while (1) { } } } while (0)

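In case of a failure, the watchdog triggers and the system resets. On restart, the system checks the reset reason; if the reason is "watchdog reset", the system goes into a safe state.

Update

Before entering the while loop, assert could also output an error message, print the stack trace, or save some data in non-volatile memory.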
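The restart-time check could be sketched roughly as follows. How the reset cause is actually read is hardware-specific (typically a reset-status register of the MCU), so this example only models the decision; reset_reason_t, boot_mode_t and select_boot_mode() are hypothetical names.

```c
/* Hypothetical reset causes; on real hardware these would be decoded
   from the MCU's reset-status register early in startup. */
typedef enum {
    RESET_POWER_ON,
    RESET_EXTERNAL,
    RESET_WATCHDOG
} reset_reason_t;

typedef enum {
    MODE_NORMAL,   /* run the application as usual */
    MODE_SAFE      /* power down outputs, report the error, idle */
} boot_mode_t;

/* Decide at startup which mode to enter. A watchdog reset means a failed
   assertion (or a hang) happened, so the system must not resume normally. */
boot_mode_t select_boot_mode(reset_reason_t reason)
{
    return (reason == RESET_WATCHDOG) ? MODE_SAFE : MODE_NORMAL;
}
```

Because the safe state is entered from a fresh reset, it starts with an empty stack, which addresses the concern about being too deep in the stack to run a handler.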

Answer 4

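Minimize stack use:

Write the program so that calls are made side by side, rather than a function calling a sub function that calls a sub function that calls a sub function, and so on. That is, the top-level function calls a sub function, which promptly returns with status information. The top-level function then calls the next sub function, and so on.

The nested method of program architecture (bad for stack-limited systems):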

top level function
    second level function
        third level function
            fourth level function

should be avoided in embedded systems.

The preferred method of program architecture for embedded systems is:

top level function (the reset event handler)
    (variations in the following depending on if 'warm' or 'cold' start)
    initialize hardware
    initialize peripherals
    initialize communication I/O
    initialize interrupts
    initialize status info
    enable interrupts
    enter background processing

interrupt handler
    re-enable the interrupt
    using 'scheduler' 
        select a foreground function 
        trigger dispatch for selected foreground function        
    return from interrupt

background processing 

(this can be, and often is implemented as a 'state' machine rather than a loop)
    loop:
        if status info indicates need to call second level function 1 
            second level function 1, which updates status info
        if status info indicates need to call second level function 2
            second level function 2, which updates status info
        etc
    end loop:

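Note: as far as possible, there is no 'third level function x'.

Note: a foreground function must complete before it is scheduled again.

Note: there are many other details that I have omitted above, such as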

kicking the watchdog, 
the other interrupt events,
'critical' code sections and use of mutex(),
considerations between 'soft real-time' and 'hard real-time',
context switching
continuous BIT, commanded BIT, and error handling 
etc
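The background-processing outline above might be sketched in C as follows. This is only an illustration under stated assumptions: the status bits, the two second-level functions and the counters are hypothetical, and on real hardware any update of status_info that is shared with an interrupt handler would need a critical section.

```c
#include <stdint.h>

/* Hypothetical status bits, set by interrupt handlers or by earlier tasks. */
#define NEED_READ   (1u << 0)
#define NEED_REPORT (1u << 1)

static volatile uint8_t status_info;
static unsigned reads_done;     /* counters exist only to make the sketch observable */
static unsigned reports_done;

/* Second-level function 1: does its work, then updates the status info. */
static void second_level_read(void)
{
    reads_done++;
    status_info = (uint8_t)((status_info & ~NEED_READ) | NEED_REPORT);
}

/* Second-level function 2: clears its own request when finished. */
static void second_level_report(void)
{
    reports_done++;
    status_info = (uint8_t)(status_info & ~NEED_REPORT);
}

/* One pass of the background loop. Each second-level function returns
   before the next one starts, so the call depth never exceeds two. */
void background_pass(void)
{
    if (status_info & NEED_READ) {
        second_level_read();
    }
    if (status_info & NEED_REPORT) {
        second_level_report();
    }
}
```

The top-level function would call background_pass() forever; the functions communicate only through status_info, never by calling each other.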

Answer 5

Is there a standard way that safe error handling is implemented in low memory embedded systems?

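Yes, there is an industry de facto way of handling it. It is all rather simple:

For every module in your program, you need a result type, such as a custom enum, that describes every possible thing that could go wrong with the functions inside that module.
You document every function properly, stating which codes it returns on error and which code it returns on success.
You leave all error handling to the caller.
If the caller is another module, it too passes the error on to its own caller, possibly renaming the error to something more suitable where applicable.
The error handling mechanism is located in main(), at the bottom of the call stack.

This works well together with classic state machines. A typical main would be: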

void main (void)
{
  for(;;)
  {
    serve_watchdog();

    result = state_machine();

    if(result != good)
    {
      error_handler(result);
    }
  }
}
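A minimal sketch of the per-module result type and the propagation to the top level, assuming a hypothetical file-reader module underneath a hypothetical application module. All names, the 0xA5 magic byte and the safe_state_entered flag are invented for illustration.

```c
#include <stddef.h>

/* Result type of the (hypothetical) file-reader module. */
typedef enum { FILE_OK, FILE_ERR_BAD_HEADER, FILE_ERR_TRUNCATED } file_result_t;

/* Result type of the (hypothetical) application module. */
typedef enum { APP_OK, APP_ERR_CONFIG } app_result_t;

/* Returns FILE_OK on success, FILE_ERR_TRUNCATED if fewer than 2 bytes,
   FILE_ERR_BAD_HEADER if the first byte is not the expected magic value. */
file_result_t file_check(const unsigned char *data, size_t len)
{
    if (len < 2u)         { return FILE_ERR_TRUNCATED; }
    if (data[0] != 0xA5u) { return FILE_ERR_BAD_HEADER; }  /* hypothetical magic byte */
    return FILE_OK;
}

/* The caller does not handle the error; it renames it and passes it up. */
app_result_t app_load_config(const unsigned char *data, size_t len)
{
    if (file_check(data, len) != FILE_OK) {
        return APP_ERR_CONFIG;
    }
    return APP_OK;
}

static int safe_state_entered;   /* stand-in for power-down and reporting */

void error_handler(app_result_t result)
{
    (void)result;                /* a real handler would log the code */
    safe_state_entered = 1;
}

/* One iteration of the top-level loop: the only place errors are handled. */
void main_loop_step(const unsigned char *data, size_t len)
{
    app_result_t result = app_load_config(data, len);

    if (result != APP_OK) {
        error_handler(result);
    }
}
```

Because every function returns before the handler runs, error_handler() executes at the shallowest point of the stack without any explicit unwinding.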

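You should not use malloc in bare-metal or RTOS microcontroller applications, not so much for safety reasons as because it doesn't make any sense whatsoever to use it. Apply common sense when programming.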
