Different nltk results in django and at command line

I have a django 1.8 view that looks like this:

def sourcedoc_parse(request, sourcedoc_id):
    sourcedoc = Sourcedoc.objects.get(pk=sourcedoc_id)
    nltk.data.path.append('/root/nltk_data')
    new_words = []
    english_vocab = set(w.lower() for w in nltk.corpus.gutenberg.words())    #<---the line where the error occurs
    results = {}

    template = 'sourcedoc_parse.html'
    params = {'sourcedoc': sourcedoc,'results': results, 'new_words': new_words, 'BASE_URL': BASE_URL}

    return render_to_response(template, params, context_instance=RequestContext(request))

It gives me the following error:

Django Version: 1.8
Python Version: 2.7.6
...
Traceback:
File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in get_response
132.                     response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/home/rosshartshorn/htdocs/worldmaker/sourcedocs/views.py" in sourcedoc_parse
107.     english_vocab = set(w.lower() for w in nltk.corpus.gutenberg.words())
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py" in __getattr__
68.         self.__load()
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py" in __load
56.             except LookupError: raise e

Exception Type: LookupError at /sourcedoc/parse/13/
Exception Value: 
**********************************************************************
Resource 'corpora/gutenberg' not found.  Please use the NLTK
Downloader to obtain the resource:  >>> nltk.download()
Searched in:
- '/var/www/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/root/nltk_data'
**********************************************************************

What's especially odd is that it works fine when I run it in a python shell from the same directory:

Python 2.7.6 (default, Mar 22 2014, 22:59:38) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> english_vocab = set(w.lower() for w in nltk.corpus.gutenberg.words())
>>> 'jabberwocky' in english_vocab
False
>>> 'monster' in english_vocab
True
>>> nltk.data.path
['/root/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']

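Does anyone know what causes the difference between running this in a django view and doing the same thing at the python command line? I did the same thing with 'python manage.py shell', and it works there too.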

Any debugging suggestions for tracking down the difference are welcome too.

The problem here is that the user running django doesn't have permission to read /root.

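It doesn't happen when you run the django shell because you are running the shell as root, but the server is running as the www user (note the first directory nltk searched, /var/www/nltk_data, which is under the www user's home directory).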
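
One way to confirm this from inside the server process is to check which search directories the current user can actually open. The following is only a rough sketch: `usable_data_dirs` is a hypothetical helper name, not part of nltk or django, and it relies on `os.geteuid`, so it assumes a Unix-like host. A directory that sits under a parent the user cannot traverse (such as /root) is reported as not usable, the same as a directory that does not exist.

```python
import os


def usable_data_dirs(paths):
    """Return (path, usable) pairs for each candidate data directory.

    "usable" means the current process can both list and enter the
    directory. A missing directory and a directory hidden behind an
    unreadable parent both come back as False.
    """
    return [(path, os.access(path, os.R_OK | os.X_OK)) for path in paths]


def format_report(paths):
    """Build a short text report suitable for a log line or debug page."""
    lines = ["effective uid: %d" % os.geteuid()]
    for path, usable in usable_data_dirs(paths):
        lines.append("%-8s %s" % ("ok" if usable else "blocked", path))
    return "\n".join(lines)


if __name__ == "__main__":
    # In the view you would pass nltk.data.path instead of this literal
    # list, which is copied from the error message in the question.
    print(format_report([
        "/var/www/nltk_data",
        "/usr/share/nltk_data",
        "/root/nltk_data",
    ]))
```

If /root/nltk_data shows up as blocked when called from the view but ok from the shell, that points to the permission difference. One possible fix, assuming you can write there, would be to place the corpus in one of the shared directories from the search list, such as /usr/share/nltk_data, so that the server user can read it.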