Python child process silently crashes when issuing an HTTP request

Question

I'm running into a problem when combining multiprocessing, requests (or urllib2) and nltk. Here is a very simple piece of code:

>>> from multiprocessing import Process
>>> import requests
>>> from pprint import pprint
>>> Process(target=lambda: pprint(
        requests.get('https://api.github.com'))).start()
>>> <Response [200]>  # this is the response displayed by the call to `pprint`.

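What this code does, in more detail:

Import a few required modules
Start a child process
Issue an HTTP GET request to 'api.github.com' from the child process
Display the result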

This works fine. The problem appears once nltk is imported:

>>> import nltk
>>> Process(target=lambda: pprint(
        requests.get('https://api.github.com'))).start()
>>> # nothing happens!

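After importing NLTK, the call to requests silently crashes the child process. If you use a named function instead of the lambda and add a few print statements before and after the call, you will see that execution stops right at the call to requests.get. Does anyone know what in NLTK could explain this behavior, and how to work around it?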
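
The debugging step described above (a named function with print statements around the call) can be sketched roughly as follows. This is a minimal Python 3 sketch, not the code from the question: urllib.request stands in for requests so that only the standard library is needed, and fetch, run_in_child and describe_exit are hypothetical names. It also reads Process.exitcode after join(), because multiprocessing reports a child killed by signal N as exit code -N, which makes an otherwise silent crash visible from the parent.

```python
import multiprocessing
import urllib.request


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a readable message."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited normally"
    if exitcode < 0:
        # multiprocessing reports death by signal N as exit code -N.
        return "killed by signal %d" % -exitcode
    return "exited with status %d" % exitcode


def fetch():
    # Named function (not a lambda) with prints around the HTTP call.
    # flush=True so the message is not lost if the process dies right after.
    print("before the HTTP request", flush=True)
    with urllib.request.urlopen("https://api.github.com", timeout=5) as response:
        print("after the HTTP request:", response.status, flush=True)


def run_in_child(target):
    """Run target in a child process and return its exit code."""
    child = multiprocessing.Process(target=target)
    child.start()
    child.join()
    return child.exitcode


if __name__ == "__main__":
    print(describe_exit(run_in_child(fetch)))
```

If "before the HTTP request" is printed but "after the HTTP request" never is, and the reported exit code is negative, the child was terminated by a signal during the call rather than by a Python exception.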

Here are the versions I'm using:

$> python --version
Python 2.7.5
$> pip freeze | grep nltk
nltk==2.0.5
$> pip freeze | grep requests
requests==2.2.1

I'm running Mac OS X v. 10.9.5.

Thanks!

Answer 1

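It seems that using NLTK and Python Requests in a child process is rare. Try using Thread instead of Process. I had exactly the same issue with another library and Requests, and replacing Process with Thread worked for me.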
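
The swap suggested here could look roughly like the sketch below. It is a Python 3 illustration under assumptions, not the answerer's actual code: urllib.request stands in for requests, and fetch_status and run_in_thread are hypothetical names. Because a thread shares memory with its parent, the result is passed back through an ordinary list.

```python
import threading
import urllib.request


def fetch_status(url, results):
    # Runs in the worker thread; appends the HTTP status code to results.
    with urllib.request.urlopen(url, timeout=5) as response:
        results.append(response.status)


def run_in_thread(target, *args):
    """Run target(*args) in a thread and wait for it to finish."""
    worker = threading.Thread(target=target, args=args)
    worker.start()
    worker.join()


if __name__ == "__main__":
    results = []
    run_in_thread(fetch_status, "https://api.github.com", results)
    print(results)
```

Note the trade-off: a thread avoids creating a new process, but threads in CPython share one interpreter lock, so this fits I/O-bound work such as HTTP requests better than CPU-bound NLP processing.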

Answer 2

Updating your Python libraries and Python itself should resolve the problem:

alvas@ubi:~$ pip freeze | grep nltk
nltk==3.0.3
alvas@ubi:~$ pip freeze | grep requests
requests==2.7.0
alvas@ubi:~$ python --version
Python 2.7.6
alvas@ubi:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:    14.04
Codename:   trusty
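
The same version check can also be done from inside Python, which helps confirm which versions the interpreter actually running the code sees. This is a small sketch that assumes Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name in ("nltk", "requests"):
        print(name, installed_version(name) or "not installed")
```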

With this code:

from multiprocessing import Process
import nltk
import time


def child_fn():
    print "Fetch URL"
    import urllib2
    print urllib2.urlopen("https://www.google.com").read()[:100]
    print "Done"


while True:
    child_process = Process(target=child_fn)
    child_process.start()
    child_process.join()
    print "Child process returned"
    time.sleep(1)

[out]:

Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned
Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned
Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned

With the code from the question:

alvas@ubi:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Process
>>> import requests
>>> from pprint import pprint
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> <Response [200]>

>>> import nltk
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> <Response [200]>

It should work with python3 too:

alvas@ubi:~$ python3
Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Process
>>> import requests
>>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
>>> 
>>> <Response [200]>

>>> import nltk
>>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
>>> <Response [200]>

Answer 3

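This issue still doesn't seem to be resolved: https://github.com/nltk/nltk/issues/947. I think it is a serious issue (unless you are only playing with NLTK, doing POCs and trying out models, rather than building actual applications). I'm running an NLP pipeline in RQ workers (http://python-rq.org/) with: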

nltk==3.2.1
requests==2.9.1
