RestrictedPython: Call other functions within user-specified code?

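Using RestrictedPython with a custom _import definition to specify which modules are allowed serves as a good basis, but everything has to be whitelisted, so calling functions inside the user_code below fails. Is there any way to call other user-defined functions? I am open to other sandboxing solutions, although Jupyter did not seem straightforward to embed in a web interface.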

from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem

def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
        globals[name] = __import__(name, globals, locals, fromlist, level)
    else:
        raise Exception("Don't you even think about it {0}".format(name))

safe_builtins['__import__'] = _import # Must be a part of builtins

def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Executed user code in restricted env
        Args:
            user_code(str) - String containing the unsafe code
            user_func(str) - Function inside user_code to execute and return value
            *args, **kwargs - arguments passed to the user function
        Return:
            Return value of the user_func
    """

    def _apply(f, *a, **kw):
        return f(*a, **kw)

    try:
        # These are the variables we allow user code to see. @result will contain the return value.
        restricted_locals = { 
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }   

        # If you want the user to be able to use some of your functions inside his code,
        # you should add this function to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to access
        # args and kwargs and _getitem_ to be able to use arrays. Just think before you add
        # something else. I am not saying you shouldn't do it. You should understand what you
        # are doing, that's all.
        restricted_globals = { 
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }   

        # Add another line to user code that executes @user_func
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)

        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

        # Run it
        exec(byte_code, restricted_globals, restricted_locals)
        # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]

    except SyntaxError as e:
        # Do whatever you want if the user has code that does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment to the user here.
        raise

i_example = """
import math

def foo():
    return 7

def myceil(x):
    return math.ceil(x)+foo()
"""
print(execute_user_code(i_example, "myceil", 1.5))

Running this fails with: 'foo' is not defined

In exec(byte_code, restricted_globals, restricted_locals):

  1. def foo(): modifies restricted_locals, because the new name foo is stored there.
  2. myceil can only see its globals, i.e. myceil.__globals__, which is restricted_globals.
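
The two points above can be reproduced without RestrictedPython at all. The following is only a minimal sketch using the plain exec built-in, which is not a sandbox; the names source, globs and locs are made up for the illustration.

```python
# Top-level definitions in exec'd code are stored in the locals mapping,
# while function bodies resolve free names through their __globals__,
# which is the globals mapping passed to exec.
source = """
def foo():
    return 7

def bar():
    return foo() + 1
"""

globs, locs = {}, {}
exec(source, globs, locs)

# Both functions ended up in the locals dict, not in the globals dict.
defined_in_locals = sorted(locs)
foo_in_globals = "foo" in globs

# bar looks for foo in globs, where it is missing, so the call fails.
try:
    locs["bar"]()
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

# With a single dict, definitions and look-ups share one namespace.
shared = {}
exec(source, shared)
shared_result = shared["bar"]()
```

Here the call through the separate dicts raises NameError, while the single-dict version returns 8.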

You can update f.__globals__ (that is, restricted_globals) in _apply:

def _apply(f, *a, **kw):
    for k, v in restricted_locals.items():
        if k not in _restricted_locals_keys and k not in f.__globals__:
            f.__globals__[k] = v
    return f(*a, **kw)

where _restricted_locals_keys is:

restricted_locals = {
    "result": None,
    "args": args,
    "kwargs": kwargs,
}
_restricted_locals_keys = set(restricted_locals.keys())

If you don't want to modify restricted_globals, copy f with new globals in _apply:

import copy
import types

def _copy_func(f, func_globals=None):
    func_globals = func_globals or f.__globals__
    return types.FunctionType(f.__code__, func_globals, name=f.__name__, argdefs=f.__defaults__, closure=f.__closure__)

def _apply(f, *a, **kw):
    func_globals = copy.copy(f.__globals__)
    for k, v in restricted_locals.items():
        if k not in _restricted_locals_keys and k not in func_globals:
            func_globals[k] = v
    f = _copy_func(f, func_globals=func_globals)
    return f(*a, **kw)
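
To see the function-copying trick in isolation, here is a small stand-alone sketch in plain Python. The helper with_extra_globals and the name offset_for_demo are hypothetical and exist only for this illustration; the sketch assumes the extra names should never override globals that already exist, as in the _apply above.

```python
import types


def with_extra_globals(f, extra):
    """Return a copy of f whose globals are f.__globals__ plus `extra`."""
    new_globals = dict(f.__globals__)  # shallow copy; the original dict is untouched
    for key, value in extra.items():
        new_globals.setdefault(key, value)  # never override an existing global
    return types.FunctionType(
        f.__code__, new_globals, name=f.__name__,
        argdefs=f.__defaults__, closure=f.__closure__,
    )


def total(x):
    # offset_for_demo is deliberately not defined at module level.
    return x + offset_for_demo


patched = with_extra_globals(total, {"offset_for_demo": 10})
patched_result = patched(5)

# The original function is unchanged and still cannot see the extra name.
try:
    total(5)
    original_still_fails = False
except NameError:
    original_still_fails = True
```

The copy returns 15 while the original still raises NameError. Note that this way of copying does not carry over keyword-only defaults (f.__kwdefaults__), so functions that rely on them would need extra handling.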

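First, the replacement implementation of the __import__ built-in is incorrect. The built-in is supposed to return the imported module, not mutate the globals to include it: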

Python 3.9.12 (main, Mar 24 2022, 13:02:21)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> __import__('math')
<module 'math' (built-in)>
>>> math
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'math' is not defined

A better way to reimplement __import__ would be:

_SAFE_MODULES = frozenset(("math",))

def _safe_import(name, *args, **kwargs):
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    return __import__(name, *args, **kwargs)
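
As a rough check that a hook of this shape behaves like the real built-in, the sketch below installs it as __import__ in a custom builtins dict and runs import statements through the plain exec built-in. Plain exec is not a sandbox; it is used here only so that the example runs without RestrictedPython. The names sandbox_globals, allowed_value, blocked and message are made up for the illustration.

```python
_SAFE_MODULES = frozenset(("math",))


def _safe_import(name, *args, **kwargs):
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    # Return the module; the import statement itself binds the name.
    return __import__(name, *args, **kwargs)


# The import statement looks up __import__ in the builtins of the executing
# code, so a whitelisted module is imported and bound as usual.
sandbox_globals = {"__builtins__": {"__import__": _safe_import}}
exec("import math\nvalue = math.ceil(1.5)", sandbox_globals)
allowed_value = sandbox_globals["value"]

# A module outside the whitelist is rejected by the hook.
try:
    exec("import os", {"__builtins__": {"__import__": _safe_import}})
    blocked = False
    message = ""
except Exception as exc:
    blocked = True
    message = str(exc)
```

The first exec leaves value set to 2, and the second raises the exception from the hook with 'os' in its message.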

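The fact that you mutated the globals in the original implementation partially masked the main bug. Namely: name assignments at the top level of the restricted code (function definitions, variable assignments and imports) mutate the locals dict, but name look-ups inside function bodies are compiled as global look-ups (LOAD_GLOBAL), bypassing the locals entirely. You can see this by disassembling the restricted bytecode using __import__('dis').dis(byte_code):
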
  2           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (math)
              6 STORE_NAME               0 (math)

  4           8 LOAD_CONST               2 (<code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>)
             10 LOAD_CONST               3 ('foo')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               1 (foo)

  7          16 LOAD_CONST               4 (<code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>)
             18 LOAD_CONST               5 ('myceil')
             20 MAKE_FUNCTION            0
             22 STORE_NAME               2 (myceil)
             24 LOAD_CONST               1 (None)
             26 RETURN_VALUE

Disassembly of <code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>:
  5           0 LOAD_CONST               1 (7)
              2 RETURN_VALUE

Disassembly of <code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>:
  8           0 LOAD_GLOBAL              0 (_getattr_)
              2 LOAD_GLOBAL              1 (math)
              4 LOAD_CONST               1 ('ceil')
              6 CALL_FUNCTION            2
              8 LOAD_FAST                0 (x)
             10 CALL_FUNCTION            1
             12 LOAD_GLOBAL              2 (foo)
             14 CALL_FUNCTION            0
             16 BINARY_ADD
             18 RETURN_VALUE

The documentation for exec explains (emphasis mine):

If only globals is provided, it must be a dictionary (and not a subclass of dictionary), which will be used for both the global and the local variables. If globals and locals are given, they are used for the global and local variables, respectively. If provided, locals can be any mapping object. Remember that at the module level, globals and locals are the same dictionary. If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.

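This makes separate mappings for locals and globals completely spurious. We can therefore simply get rid of the locals dict and put everything into the globals. The entire code should look something like this: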

from RestrictedPython import safe_builtins, compile_restricted


_SAFE_MODULES = frozenset(("math",))


def _safe_import(name, *args, **kwargs):
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    return __import__(name, *args, **kwargs)


def execute_user_code(user_code, user_func, *args, **kwargs):
    my_globals = {
        "__builtins__": {
            **safe_builtins,
            "__import__": _safe_import,
        },
    }

    try:
        byte_code = compile_restricted(
            user_code, filename="<user_code>", mode="exec")
    except SyntaxError:
        # syntax error in the sandboxed code
        raise

    try:
        exec(byte_code, my_globals)
        return my_globals[user_func](*args, **kwargs)
    except BaseException:
        # runtime error (probably) in the sandboxed code
        raise

In the above, I also managed to fix a few tangential issues:

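  • Instead of injecting the function invocation into the compiled snippet, I look up the function in the globals dictionary directly. This avoids a potential code injection vector if user_func happens to come from an untrusted source, and it avoids having to inject args, kwargs and result into the sandbox, which would enable the sandboxed code to tamper with them.
  • I avoid mutating the safe_builtins object provided by the RestrictedPython module. Otherwise, any other code in your program that happens to use RestrictedPython might be affected.
  • I split exception handling into two steps: compilation and execution. This minimises the probability that bugs in the sandbox itself will be misattributed to the sandboxed code.
  • I changed the type of caught runtime exceptions to BaseException, to also catch the case where the sandboxed code attempts to raise KeyboardInterrupt or SystemExit (which derive not from Exception, but only from BaseException).
  • I also removed the references to _getitem_ and _apply_, which don't seem to serve any purpose. If they turn out to be necessary after all, you can restore them.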
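
The point about BaseException can be checked with a short stand-alone sketch. The helper classify and the dict caught_by are hypothetical names used only for this illustration.

```python
def classify(exc_type):
    """Report which handler catches an exception of the given type."""
    try:
        raise exc_type("raised by hypothetical sandboxed code")
    except Exception:
        return "except Exception"
    except BaseException:
        return "except BaseException"


caught_by = {
    t.__name__: classify(t)
    for t in (ValueError, KeyboardInterrupt, SystemExit)
}
```

ValueError is caught by the first handler, whereas KeyboardInterrupt and SystemExit only reach the BaseException handler, which is why a plain except Exception would let them escape.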

(Note, though, that this still doesn't protect against DoS via infinite loops within the sandbox.)