Signal handler hangs in Popen.wait(timeout) if an infinite wait() was started already
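I've run into a Python subprocess issue, which I've reproduced on Python 3.6 and 3.7, that I don't understand. I have a program, call it Main, that uses subprocess.Popen() to launch an external process, call it "Slave". Main registers a SIGTERM signal handler. Main waits for the Slave process to complete using either proc.wait(None) or proc.wait(timeout). The Slave process can be interrupted by sending a SIGTERM signal to Main. The sigterm handler sends a SIGINT signal to the Slave and then calls wait(30) for it to terminate. If Main is using wait(None), the sigterm handler's wait(30) waits the full 30 seconds even though the Slave process has already terminated. If Main is using the wait(timeout) version, the sigterm handler's wait(30) returns as soon as the Slave terminates.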
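Here is a small test app that demonstrates the issue. Run it via python wait_test.py
to use a non-timeout wait(None). Run it via python wait_test.py <timeout value>
to give the Main wait a specific timeout.
Once the program is running, execute kill -15 <pid>
and see how the app reacts.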
#
# Save this to a file called wait_test.py
#
import signal
import subprocess
import sys
from datetime import datetime

slave_proc = None


def sigterm_handler(signum, stack):
    print("Process received SIGTERM signal {} while processing job!".format(signum))
    print("slave_proc is {}".format(slave_proc))

    if slave_proc is not None:
        try:
            print("{}: Sending SIGINT to slave.".format(datetime.now()))
            slave_proc.send_signal(signal.SIGINT)
            slave_proc.wait(30)
            print("{}: Handler wait completed.".format(datetime.now()))
        except subprocess.TimeoutExpired:
            slave_proc.terminate()
        except Exception as exception:
            print('Sigterm Exception: {}'.format(exception))
            slave_proc.terminate()
            slave_proc.send_signal(signal.SIGKILL)


def main(wait_val=None):
    with open("stdout.txt", 'w+') as stdout:
        with open("stderr.txt", 'w+') as stderr:
            proc = subprocess.Popen(["python", "wait_test.py", "slave"],
                                    stdout=stdout,
                                    stderr=stderr,
                                    universal_newlines=True)

    print('Slave Started')

    global slave_proc
    slave_proc = proc

    try:
        proc.wait(wait_val)    # If this is a no-timeout wait, ie: wait(None), then will hang in sigterm_handler.
        print('Slave Finished by itself.')
    except subprocess.TimeoutExpired as te:
        print(te)
        print('Slave finished by timeout')
        proc.send_signal(signal.SIGINT)
        proc.wait()

    print("Job completed")


if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'slave':
        while True:
            pass

    signal.signal(signal.SIGTERM, sigterm_handler)
    main(int(sys.argv[1]) if len(sys.argv) > 1 else None)
    print("{}: Exiting main.".format(datetime.now()))
Here are examples of the two runs:
Note here the 30 second delay
--------------------------------
[mkurtz@localhost testing]$ python wait_test.py
Slave Started
Process received SIGTERM signal 15 while processing job!
slave_proc is <subprocess.Popen object at 0x7f79b50e8d90>
2022-03-30 11:08:15.526319: Sending SIGINT to slave. <--- 11:08:15
Slave Finished by itself.
Job completed
2022-03-30 11:08:45.526942: Exiting main. <--- 11:08:45
Note here the instantaneous shutdown
-------------------------------------
[mkurtz@localhost testing]$ python wait_test.py 100
Slave Started
Process received SIGTERM signal 15 while processing job!
slave_proc is <subprocess.Popen object at 0x7fa2412a2dd0>
2022-03-30 11:10:03.649931: Sending SIGINT to slave. <--- 11:10:03.649
2022-03-30 11:10:03.653170: Handler wait completed. <--- 11:10:03.653
Slave Finished by itself.
Job completed
2022-03-30 11:10:03.673234: Exiting main. <--- 11:10:03.673
These particular tests were run on CentOS 7 using Python 3.7.9.
Can someone explain this behavior?
The Popen class has an internal lock for wait operations:
# Held while anything is calling waitpid before returncode has been
# updated to prevent clobbering returncode if wait() or poll() are
# called from multiple threads at once. After acquiring the lock,
# code must re-check self.returncode to see if another thread just
# finished a waitpid() call.
self._waitpid_lock = threading.Lock()
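The major difference between wait() and wait(timeout=...) is that the former waits indefinitely while holding the lock, whereas the latter is a busy loop that releases the lock on each iteration.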
if timeout is not None:
    ...
    while True:
        if self._waitpid_lock.acquire(False):
            try:
                ...
                # wait without any delay
                (pid, sts) = self._try_wait(os.WNOHANG)
                ...
            finally:
                self._waitpid_lock.release()
        ...
        time.sleep(delay)
else:
    while self.returncode is None:
        with self._waitpid_lock:  # acquire lock unconditionally
            ...
            # wait indefinitely
            (pid, sts) = self._try_wait(0)
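This is not a problem for regular concurrent code, i.e. threading, because the thread running wait() and holding the lock is woken up as soon as the subprocess finishes. This in turn allows all other threads waiting on the lock/subprocess to proceed promptly.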
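As a rough illustration of the threaded case, the sketch below starts an infinite wait() in one thread and a wait(timeout=30) in another. The child command (a Python one-liner that sleeps for about a second) and the names `timed_wait` and `timings` are assumptions made up for this example, not part of the question's code.

```python
import subprocess
import sys
import threading
import time

# Hypothetical child: a Python one-liner that exits after about one second.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
timings = {}


def timed_wait():
    # Waits with a timeout, i.e. polls while the other thread holds the lock.
    start = time.monotonic()
    proc.wait(timeout=30)
    timings["timed_wait"] = time.monotonic() - start


# This thread plays the role of Main's wait(None): it holds the lock
# until the child exits.
blocking = threading.Thread(target=proc.wait)
blocking.start()
time.sleep(0.1)  # give the blocking wait a head start

timed = threading.Thread(target=timed_wait)
timed.start()

blocking.join()
timed.join()
print("wait(timeout=30) returned after {:.2f}s".format(timings["timed_wait"]))
```

The timed wait should return shortly after the child exits rather than after 30 seconds, because the thread holding the lock is free to wake up, record the return code and release the lock.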
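However, things look different when a) the main thread holds the lock in wait() and b) a signal handler attempts to wait. A subtlety of signal handlers is that they interrupt the main thread:
signal: Signals and Threads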
Python signal handlers are always executed in the main Python thread of the main interpreter, even if the signal was received in another thread. […]
Since the signal handler runs in the main thread, the main thread's regular code execution is paused until the signal handler finishes!
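A small sketch can make this visible. It assumes a Unix-like system (it uses SIGUSR1), and the names `handler` and `ran_in_main` are made up for the example. The signal is sent from a worker thread, yet the handler reports that it runs in the main thread.

```python
import os
import signal
import threading
import time

ran_in_main = []


def handler(signum, frame):
    # Record whether the handler is executing in the main thread.
    ran_in_main.append(threading.current_thread() is threading.main_thread())


signal.signal(signal.SIGUSR1, handler)

# Send the signal to our own process from a worker thread.
worker = threading.Thread(target=os.kill, args=(os.getpid(), signal.SIGUSR1))
worker.start()
worker.join()

# Give the interpreter a moment to run the pending handler.
deadline = time.monotonic() + 2
while not ran_in_main and time.monotonic() < deadline:
    time.sleep(0.01)

print(ran_in_main)
```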
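By running wait in the signal handler, a) the signal handler blocks waiting for the lock and b) the lock blocks waiting for the signal handler. Only once the signal handler's wait times out does the "main thread" resume, receive confirmation that the subprocess finished, set the return code and release the lock.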
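The same pattern can be modelled without any subprocess. In the sketch below, a plain threading.Lock stands in for Popen's _waitpid_lock, and a one-second acquire(timeout=...) stands in for the handler's wait(30). This is a simplification, since the real wait(timeout) polls with non-blocking acquires. It assumes a Unix-like system (SIGUSR1), and the names `handler` and `outcome` are hypothetical.

```python
import os
import signal
import threading
import time

lock = threading.Lock()  # stands in for Popen._waitpid_lock
outcome = {}


def handler(signum, frame):
    # Stands in for the handler's wait(30): try to get the lock, give up after 1s.
    start = time.monotonic()
    acquired = lock.acquire(timeout=1.0)
    outcome["waited"] = time.monotonic() - start
    if acquired:
        lock.release()
    outcome["acquired"] = acquired


signal.signal(signal.SIGUSR1, handler)

with lock:  # the main thread holds the lock, like wait(None)
    os.kill(os.getpid(), signal.SIGUSR1)
    # The handler now runs on top of the main thread, which therefore
    # cannot reach the end of this block to release the lock.
    while "acquired" not in outcome:
        time.sleep(0.01)

print(outcome)
```

The handler never gets the lock and only returns once its own timeout expires, which mirrors the 30 second delay in the first run above. One way to sidestep this, under the same assumptions, is to keep the handler from waiting at all, for example by only forwarding the signal and leaving the waiting to the main code.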