Python's SimpleHTTPServer started in a thread won't close the port
I have the following code:
import os
from ghost import Ghost
import urlparse, urllib
import SimpleHTTPServer
import SocketServer
import sys, traceback
from threading import Thread, Event
from time import sleep

please_die = Event() # this is my enemy
httpd = None
PORT = 8001
address = 'http://localhost:'+str(PORT)+'/'
search_dir = './category'

def main():
    """
    basic run script routine,
    FIXME: is supposed to exit gracefully
    """
    thread = Thread(target = simpleServe)
    try:
        thread.start()
        run()
    except KeyboardInterrupt:
        print "Shutdown requested"
    except Exception:
        traceback.print_exc(file=sys.stdout)
    shutdown()
    sys.exit(0)

def shutdown():
    global httpd
    global please_die
    print "Shutting down"
    # A try - except for the shutdown routine
    try:
        please_die.wait() # how do you do?
        httpd.shutdown() # Please! I want to run you multiple times.
        print "Have you died?"
    except Exception:
        traceback.print_exc(file=sys.stdout)

def path2url(path):
    """
    constructs a URL from a relative path / concatenates the global address
    variable with the path given
    """
    global address
    return urlparse.urljoin(address, urllib.pathname2url(path))

def simpleServe():
    global httpd, PORT
    please_die.set() # Attaching the event to this thread
    # Start the service
    Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
    httpd = SocketServer.TCPServer(("", PORT), Handler)
    print "serving at port", PORT
    # And loop infinitely in the hope that I can stop you later
    httpd.serve_forever()

def run():
    global search_dir
    ghost = Ghost() # the webkit facade
    with ghost.start() as session:
        session.set_viewport_size(2560, 1600) # "retina" size
        for directory, subdirectories, files in os.walk(search_dir):
            for file in files:
                path = os.path.join(directory, file)
                urlPath = path2url(path)
                process(session, urlPath)

def process(session, urlPath):
    page, resources = session.open(urlPath)
    assert page.http_status == 200
    # ... other asserts here

if __name__ == '__main__':
    main()
The idea is to write a script that starts a "simple http server", performs some requests against it, and then exits.
The first run works without any problem:
...
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /static/img/glyphicons-halflings.png HTTP/1.1" 200 -
Shutting down
Have you died?
The second run crashes with:
Address already in use
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "download-images.py", line 51, in simpleServe
    httpd = SocketServer.TCPServer(("", PORT), Handler)
  File "/usr/lib/python2.7/SocketServer.py", line 420, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/SocketServer.py", line 434, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use
If I kill all python processes, the script runs again, so I assume I am misusing threads somewhere, but I can't find where.
Update
I forgot to mention,
my OS is:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.04
Release: 15.04
Codename: vivid
The Python I am using is:
$ python --version
Python 2.7.9
netstat -putelan | grep 8001 prints:
$ netstat -putelan | grep 8001
(Not all processes could be identified, non-owned process info
tcp 0 0 127.0.0.1:34691 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:8001 127.0.0.1:34866 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:34798 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:8001 127.0.0.1:34588 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:34647 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:34915 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:34674 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:34451 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:8001 127.0.0.1:34930 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:8001 127.0.0.1:34606 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:34505 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:34717 127.0.0.1:8001 TIME_WAIT 0 0 -
tcp 0 0 127.0.0.1:8001 127.0.0.1:34670 0 0 127.0.0.1:8001 127.0.0.1:34626
...
I can't post the whole output (because of Stack Overflow's post size limit). The rest is the same: 34*** ports and port 8001 mixed in the same pattern.
I looked at the TCPServer source code:
def server_bind(self):
    """Called by constructor to bind the socket.
    May be overridden.
    """
    if self.allow_reuse_address:
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    self.socket.bind(self.server_address)
    self.server_address = self.socket.getsockname()
allow_reuse_address has to be set before the socket is bound. So try this:
SocketServer.TCPServer.allow_reuse_address=True
httpd = SocketServer.TCPServer(("", PORT), Handler)
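Another way to get the flag set before binding, without changing the shared TCPServer class, is a small subclass. The following is only a sketch: it is written for Python 3, where the module is named socketserver, and the class name ReusableTCPServer is a hypothetical helper that does not appear in the question. It binds to port 0 so the OS picks a free port, then reads the option back from the socket to confirm it was applied.

```python
import socket
import socketserver  # Python 3 name of the SocketServer module
from http.server import SimpleHTTPRequestHandler


class ReusableTCPServer(socketserver.TCPServer):
    # Hypothetical helper name. server_bind() reads this attribute before
    # calling bind(), so it has to be in place before the constructor runs.
    allow_reuse_address = True


# Port 0 asks the OS for any free port, so this sketch does not collide
# with a real server listening on 8001.
httpd = ReusableTCPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
try:
    # A non-zero value means SO_REUSEADDR is enabled on the listening socket.
    reuse_flag = httpd.socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    bound_port = httpd.server_address[1]
finally:
    httpd.server_close()
```

In the question's script this would mean constructing ReusableTCPServer(("", PORT), Handler) instead of SocketServer.TCPServer(("", PORT), Handler).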
As @LFJ said, this is probably due to the allow_reuse_address attribute of TCPServer.
httpd = SocketServer.TCPServer(("", PORT), Handler, bind_and_activate=False)
httpd.allow_reuse_address = True
try:
    httpd.server_bind()
    httpd.server_activate()
except:
    httpd.server_close()
    raise
Equivalent code:
SocketServer.TCPServer.allow_reuse_address = True
httpd = SocketServer.TCPServer(("", PORT), Handler)
Let's explain why.
When you enable TCPServer.allow_reuse_address, it sets an option on the socket:
class TCPServer:
    [...]
    def server_bind(self):
        if self.allow_reuse_address:
            self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        [...]
What is socket.SO_REUSEADDR?
This socket option tells the kernel that even if this port is busy (in
the TIME_WAIT state), go ahead and reuse it anyway. If it is busy,
but with another state, you will still get an address already in use
error. It is useful if your server has been shut down, and then
restarted right away while sockets are still active on its port. You
should be aware that if any unexpected data comes in, it may confuse
your server, but while this is possible, it is not likely.
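In effect, it allows the address your socket is bound to to be reused. If another process tries to bind while the socket is not listening, that process is allowed to use this bind address.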
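The effect can be sketched with plain sockets, independent of TCPServer. This is an illustration under assumptions, not taken from the answer: the helper name make_listener is hypothetical, and port 0 is used so the OS picks a free port. The server side closes an accepted connection first, which is the situation that tends to leave the port in TIME_WAIT, and a new listener with SO_REUSEADDR then binds the same port again.

```python
import socket


def make_listener(port):
    """Hypothetical helper: a listening socket with SO_REUSEADDR enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind(), exactly as server_bind() does.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(1)
    return sock


first = make_listener(0)  # port 0: let the OS choose a free port
port = first.getsockname()[1]

# Open one connection and let the server side close it first, so the
# closed connection lingers on the server's port.
client = socket.create_connection(("127.0.0.1", port))
conn, _ = first.accept()
conn.close()
client.close()
first.close()

# Binding the same port again right away works because both the old and
# the new socket had SO_REUSEADDR set.
second = make_listener(port)
rebound_port = second.getsockname()[1]
second.close()
```

Without the setsockopt call, the second bind may fail with "Address already in use" for as long as the old connection stays in TIME_WAIT, which matches the netstat output in the question.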
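The reason you need to enable it is that you are not closing the TCPServer properly. To close it properly, you have to call the shutdown method, which stops the serve_forever loop, and then close the socket properly by calling the server_close method.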
def shutdown():
    global httpd
    global please_die
    print "Shutting down"
    try:
        please_die.wait() # how do you do?
        httpd.shutdown() # Stop the serve_forever
        httpd.server_close() # Close also the socket.
    except Exception:
        traceback.print_exc(file=sys.stdout)
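Putting both pieces together, the whole start, request, stop cycle might look like the sketch below. It is written for Python 3 (socketserver, http.server, urllib.request) rather than the Python 2.7 used in the question, and the names OkHandler, ReusableTCPServer and serve_once are hypothetical. A tiny handler that always answers 200 stands in for SimpleHTTPRequestHandler so the sketch does not depend on the files in the current directory.

```python
import socketserver
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in handler: answers every GET with 200 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


class ReusableTCPServer(socketserver.TCPServer):
    allow_reuse_address = True  # read by server_bind() before bind()


def serve_once(port):
    """Start a server, send one request, then stop it and free the port."""
    httpd = ReusableTCPServer(("127.0.0.1", port), OkHandler)
    thread = threading.Thread(target=httpd.serve_forever)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % httpd.server_address[1]
        with urllib.request.urlopen(url) as response:
            status = response.status
    finally:
        httpd.shutdown()      # stop the serve_forever loop
        thread.join()         # wait for the serving thread to finish
        httpd.server_close()  # close the listening socket
    return httpd.server_address[1], status


# Run twice on the same port, as the question wants to do with 8001.
first_port, first_status = serve_once(0)
second_port, second_status = serve_once(first_port)
```

The order in the finally block matters: shutdown() only ends the loop, and the listening socket stays open until server_close() is called.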
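You are not cleaning up the server after shutting it down. This means you leave dead socket resources behind, and the OS does not clean them up immediately after the process ends.
You need to call httpd.server_close() in a finally block after the call to httpd.serve_forever(). This call tells the OS to release any resources that may be associated with the given server instance.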
try:
    httpd.serve_forever()
finally:
    httpd.server_close()