ThreadPoolExecutor is leaking memory
I am using ThreadPoolExecutor, but I don't understand why it leaks memory:
import concurrent.futures
import time

# Assumption: @profile is memory_profiler's decorator (the question does not show its import).
from memory_profiler import profile


def callback(message):
    # Memory-intensive operation: build a throwaway list of one million ints
    x = [n for n in range(int(1e6))]
    return message


@profile
def main():
    # Five identical rounds; each creates a fresh pool that is shut down when its with block exits
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(callback, [x for x in range(10)])
    time.sleep(2)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(callback, [x for x in range(10)])
    time.sleep(2)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(callback, [x for x in range(10)])
    time.sleep(2)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(callback, [x for x in range(10)])
    time.sleep(2)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(callback, [x for x in range(10)])
    time.sleep(2)


if __name__ == '__main__':
    main()
This produces the following graph:
What exactly is going on? If I run this code synchronously, this does not happen.
To answer your comment asking whether I really tried what I describe: here is what happens in that case.
I tested with this program:
import concurrent.futures
import gc
import os
import sys
import time


def callback(message):
    # Memory-intensive operation
    x = [n for n in range(int(1e6))]
    return message


def main():
    # Usage: python3 p.py <iterations> <delay in seconds>
    delay = int(sys.argv[2])

    for _ in range(0, int(sys.argv[1])):
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            executor.map(callback, [x for x in range(10)])
        time.sleep(delay)

    # Collect garbage, then let ps report this process's VSZ and RSS
    gc.collect()
    os.system("ps l -C python3")


if __name__ == '__main__':
    main()
Some runs with a sleep of 0 seconds per iteration:
pi@raspberrypi:/tmp $ python3 p.py 1 0
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 16784 2657 20 0 57616 11840 do_wai S+ pts/0 0:01 python3 p.py 1 0
pi@raspberrypi:/tmp $ python3 p.py 2 0
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 16811 2657 20 0 57872 12100 do_wai S+ pts/0 0:03 python3 p.py 2 0
pi@raspberrypi:/tmp $ python3 p.py 10 0
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 16835 2657 20 0 58412 12760 do_wai S+ pts/0 0:18 python3 p.py 10 0
pi@raspberrypi:/tmp $ python3 p.py 20 0
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 16950 2657 20 0 58924 13240 do_wai S+ pts/0 0:35 python3 p.py 20 0
pi@raspberrypi:/tmp $ python3 p.py 100 0
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 17179 2657 20 0 57900 12320 do_wai S+ pts/0 2:58 python3 p.py 100 0
pi@raspberrypi:/tmp $
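As you can see, the size after 100 iterations is smaller than after 10 or 20 iterations.
Now, if I wait 1 second on each iteration, the results differ from the previous ones and are smaller or roughly equal: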
pi@raspberrypi:/tmp $ python3 p.py 1 1
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 18251 2657 20 0 57360 11708 do_wai S+ pts/0 0:01 python3 p.py 1 1
pi@raspberrypi:/tmp $ python3 p.py 2 1
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 18276 2657 20 0 57616 11840 do_wai S+ pts/0 0:03 python3 p.py 2 1
pi@raspberrypi:/tmp $ python3 p.py 10 1
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 18299 2657 20 0 58412 12748 do_wai S+ pts/0 0:17 python3 p.py 10 1
pi@raspberrypi:/tmp $ python3 p.py 20 1
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 18417 2657 20 0 58412 12824 do_wai S+ pts/0 0:34 python3 p.py 20 1
pi@raspberrypi:/tmp $ python3 p.py 100 1
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 20469 2657 20 0 57900 12328 do_wai S+ pts/0 3:13 python3 p.py 100 1
pi@raspberrypi:/tmp $
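To make the comparison between the two listings easier to read, here is a small sketch that subtracts the RSS column of the 0-second runs from that of the 1-second runs. The numbers are copied from the `ps l` output above; the names `rss_delay_0`, `rss_delay_1` and `rss_deltas` are only illustrative.

```python
# RSS values (in kilobytes, as reported by ps) copied from the two listings,
# keyed by the number of iterations passed to p.py.
rss_delay_0 = {1: 11840, 2: 12100, 10: 12760, 20: 13240, 100: 12320}
rss_delay_1 = {1: 11708, 2: 11840, 10: 12748, 20: 12824, 100: 12328}


def rss_deltas(without_sleep, with_sleep):
    """Return {iterations: rss_with_sleep - rss_without_sleep}."""
    return {n: with_sleep[n] - without_sleep[n] for n in without_sleep}


deltas = rss_deltas(rss_delay_0, rss_delay_1)
for n, delta in deltas.items():
    print(f"{n:>3} iterations: {delta:+d} kB")
```

The differences are -132, -260, -12, -416 and +8 kB, so the runs with a sleep are smaller or within a few kilobytes. In neither series does RSS grow in proportion to the number of iterations, which is what a leak per iteration would produce.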
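From Python's point of view this makes no sense, so the variability probably does not come (only) from Python but (also) from the time Linux needs to clean up the threads that have exited. Of course, if you do everything synchronously, you do not have this problem.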
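If you want to watch that variability from inside a single process rather than through `ps`, a minimal sketch could look like the following. It is Linux-only and assumes `/proc/self/statm` is available; the helpers `current_rss_kib` and `sample_rss` and the `size` parameter are my own additions for illustration, not part of the test program.

```python
import concurrent.futures
import gc
import os
import time


def callback(message, size=int(1e6)):
    # Same kind of memory-intensive operation as in the test program.
    x = [n for n in range(size)]
    return message


def current_rss_kib():
    """Resident set size of this process in KiB, or None where /proc is unavailable."""
    try:
        with open("/proc/self/statm") as f:
            # Second field of statm is the resident set size in pages.
            resident_pages = int(f.read().split()[1])
    except (OSError, ValueError, IndexError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024


def sample_rss(rounds, delay=0.0, size=int(1e6)):
    """Run `rounds` fresh thread pools and record RSS after each one."""
    samples = []
    for _ in range(rounds):
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            # list() consumes the results; leaving the with block joins the workers.
            list(executor.map(lambda m: callback(m, size), range(10)))
        time.sleep(delay)
        gc.collect()
        samples.append(current_rss_kib())
    return samples


if __name__ == "__main__":
    # Smaller lists than in the question, to keep this demo quick.
    for i, rss in enumerate(sample_rss(rounds=3, delay=0, size=100_000), start=1):
        print(f"round {i}: RSS = {rss} KiB")
```

Calling `sample_rss` with a larger `rounds` value and different `delay` values should show whether RSS keeps climbing or settles, which is the distinction that matters here.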
I ran these tests on a Raspberry Pi 4:
pi@raspberrypi:/tmp $ uname -a
Linux raspberrypi 5.10.17-v7l+ #1403 SMP Mon Feb 22 11:33:35 GMT 2021 armv7l GNU/Linux
pi@raspberrypi:/tmp $ python3 --version
Python 3.7.3
pi@raspberrypi:/tmp $
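You can also see in your graph that the last 3 lower levels are equal (at a value of around 100, for instance), which is not consistent with a memory leak, or at least not with a visible one.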
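One way to check the "no visible leak" reading at the Python level is `tracemalloc` from the standard library, which tracks only memory allocated through Python's allocators and ignores whatever the C allocator or the kernel still holds. The sketch below is an illustration under that assumption; `python_heap_after_rounds` and the reduced `size` are hypothetical choices of mine.

```python
import concurrent.futures
import gc
import tracemalloc


def callback(message, size=int(1e5)):
    # Smaller list than in the question, because tracing slows allocation down.
    x = [n for n in range(size)]
    return message


def python_heap_after_rounds(rounds, size=int(1e5)):
    """Return (current, peak) bytes traced by tracemalloc after `rounds` pool runs."""
    tracemalloc.start()
    try:
        for _ in range(rounds):
            with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
                list(executor.map(lambda m: callback(m, size), range(10)))
        gc.collect()
        return tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    for rounds in (1, 5, 10):
        current, peak = python_heap_after_rounds(rounds)
        print(f"{rounds:>2} rounds: current={current} B, peak={peak} B")
```

If `current` stays roughly flat as `rounds` increases, Python itself is not retaining the lists, and any remaining variation in RSS has to come from below the interpreter.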