Safe method parsing text files with a user provided script

Question

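I'm looking for a way to parse text from a user-supplied URL using a parser script that the user also supplies. eval would be easy, but it is obviously super scary. The end goal is simply to let users point my server at a data source and tell my server how to read the data.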

What is the safest way to do this? Python or Node is preferred, but I'm not tied to any particular language.

For example, this one is a CSV file, but sometimes all I have is a plain text file. URL: http://www.ams.usda.gov/mnreports/lm_xb803.txt

This Python script reads the file from the URL and collects the rows to be stored in a database:

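# Python 2 (urllib2). ormMap is not defined in this snippet; it is assumed to
# map a field name such as 'low' to its column index in the row.
import csv
import re
import urllib2
from datetime import datetime

url = 'http://www.ams.usda.gov/mnreports/lm_xb803.txt'  # the example URL above
expected_length = 6
requiredFeilds = ['low', 'high']
requiredNonZero = ['low', 'high']

# heading rows in the report mapped to short grade codes
grade_labels = {
    'Select Cuts': 'sl',
    'Choice Cuts': 'ch',
    'Choice and Select Cuts': 'slch',
    'Ground Beef': 'grnd',
    'Beef Trimmings': 'trim'
}

response = urllib2.urlopen(url)
reader = csv.reader(response)
grade = None
date = None
first_row = True
keep_list = []

for row in reader:
    # skip lines that do not have the expected number of columns
    if len(row) != expected_length:
        continue

    # take the report date from the first row that has the expected length
    if first_row:
        date_text = row[2]
        date_object = datetime.strptime(date_text, '%m/%d/%Y')
        date = date_object.strftime("%Y-%m-%d")
        first_row = False

    # collapse runs of whitespace in the row label
    row_label = re.sub(r'\s\s+', ' ', row[0].strip())

    # a heading row selects the grade for the rows that follow it
    if row_label in grade_labels:
        grade = grade_labels[row_label]
        continue

    row.insert(0, grade)
    row.append(date)

    # ignore until grade is selected
    if row[0] is None:
        continue

    # check rqs: required fields must not be empty
    try:
        for field in requiredFeilds:
            if len(row[ormMap[field]]) == 0:
                raise Exception('required field missing')
    except Exception:
        continue

    # required fields must also be numeric and at least 1
    # (csv yields strings, so convert before comparing)
    try:
        for field in requiredNonZero:
            if float(row[ormMap[field]]) < 1:
                raise Exception('required field is zero')
    except Exception:
        continue

    keep_list.append(row)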

Answer 1

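I don't know of any sufficiently powerful language that can be "safely sandboxed" well enough to ensure that a skilled, malicious user won't wreak havoc with the script (program) he or she supplies -- not without OS support, that is.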

With OS support, fortunately, it is feasible, and at that point it becomes irrelevant what language the script/program is written in.

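If you fire up a virtual machine and run the user-supplied program inside it with limited resources and tight supervision, you can make things very safe that way.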

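If you're willing to sacrifice some assurance of safety for reduced overhead, you could run the user's program in a BSD jail -- BSD jails have been around for a long time, are very mature, and have been proven solid by experience.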

Linux containers offer a very similar approach and are promising, but they have not been around quite as long, so some may consider them a bit riskier.
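To make the container option a little more concrete, here is a minimal sketch that only builds a command line for running a user's parser script in a locked-down container. Docker is an assumption on my part (the paragraph above only says "Linux containers"), as are the function name build_container_argv, the image name passed in, and the mount path /sandbox/parser.py. The flags are ordinary docker run options; which limits are appropriate depends on your setup.

```python
import os


def build_container_argv(image, script_path, memory="256m", cpus="0.5"):
    """Build a `docker run` argv that executes a user script with tight limits.

    Hypothetical helper: nothing is executed here, the list is only returned.
    """
    host_script = os.path.abspath(script_path)  # bind mounts need absolute paths
    return [
        "docker", "run",
        "--rm",                                  # remove the container on exit
        "-i",                                    # keep stdin open to pipe data in
        "--network", "none",                     # no network inside the container
        "--memory", memory,                      # cap RAM
        "--cpus", cpus,                          # cap CPU share
        "--pids-limit", "64",                    # cap the number of processes
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--user", "65534:65534",                 # unprivileged uid:gid (assumed)
        "-v", host_script + ":/sandbox/parser.py:ro",  # script mounted read-only
        image,
        "python", "/sandbox/parser.py",
    ]
```

With the network disabled, the server would fetch the URL itself and pipe the text to the container's stdin, so the user's script never gets to open connections on its own.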

Further along the spectrum is Chrome's Portable Native Client, https://developer.chrome.com/native-client, which runs the user's program (suitably compiled to machine code) in a presumably safe sandbox inside a (Chrome) browser.

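I'm sure there are other solutions with a similar overall approach, somewhere on the VM -> jails/containers -> NaCl spectrum or barely outside it. Depending on the overhead you can afford, I'd go as close as feasible to the "left" (VM) end of this range, rather than relying on any "supposedly sandboxed" runtime for one specific language... but maybe I'm being a pessimist about this!-)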
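Whichever isolation layer is chosen, the "limited resources and tight supervision" part might look roughly like the sketch below on the server side. It is not a sandbox by itself: it only applies POSIX resource limits and a wall-clock timeout to a child process and validates what comes back. The names run_untrusted and _limit, the JSON list-of-rows output format, and the default limits are all assumptions for illustration. It is Unix-only (resource module) and written for Python 3.

```python
import json
import resource
import subprocess
import sys


def _limit(kind, value):
    """Lower one resource limit, never asking for more than the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_untrusted(argv, input_text, timeout=10, cpu_seconds=5,
                  memory_bytes=1024 ** 3, max_output=1000000):
    """Run argv with input_text on stdin; return a list of rows, or None on failure."""
    def apply_limits():
        # runs in the child process just before the program is executed
        _limit(resource.RLIMIT_CPU, cpu_seconds)
        _limit(resource.RLIMIT_AS, memory_bytes)

    try:
        proc = subprocess.run(
            argv,
            input=input_text.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,          # wall-clock limit; the child is killed on expiry
            preexec_fn=apply_limits,
        )
    except (subprocess.TimeoutExpired, subprocess.SubprocessError, OSError):
        return None

    # Size is checked only after the output was collected, so this does not
    # protect the parent's memory; it just rejects oversized results.
    if proc.returncode != 0 or len(proc.stdout) > max_output:
        return None

    try:
        rows = json.loads(proc.stdout.decode("utf-8"))
    except ValueError:  # covers invalid UTF-8 and invalid JSON
        return None

    # Treat the output as data only: accept a list of lists of strings.
    if not isinstance(rows, list):
        return None
    for row in rows:
        if not isinstance(row, list) or not all(isinstance(c, str) for c in row):
            return None
    return rows


if __name__ == "__main__":
    # Stand-in for a user-supplied parser: splits each stdin line on commas.
    demo = [sys.executable, "-c",
            "import sys, json; "
            "print(json.dumps([l.split(',') for l in sys.stdin.read().splitlines()]))"]
    print(run_untrusted(demo, "a,b\nc,d\n"))
```

The point of the design is that the server never evaluates anything the user wrote in its own process; it passes text in, gets text out, and checks the shape of the result before touching the database. The argv could just as well be a command that starts the script inside a VM, jail, or container.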
