How can I improve my multithreading speed and efficiency in Python?
How can I improve the multithreading speed in my code?
My code takes 130 seconds to perform 700 requests with 100 threads, which is really slow and frustrating given that I'm using 100 threads.
My code edits the values of the parameters in a url and makes a request to each variant, including the original (unedited) url. The urls are read from a file (urls.txt).
As an example:
Consider the following url:
https://www.test.com/index.php?parameter=value1&parameter2=value2
The url contains 2 parameters, so my code will make 3 requests.
1 request to the original url:
https://www.test.com/index.php?parameter=value1&parameter2=value2
1 request with the first value modified:
https://www.test.com/index.php?parameter=replaced_value&parameter2=value2
1 request with the second value modified:
https://www.test.com/index.php?parameter=value1&parameter2=replaced_value
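For reference, the per-parameter replacement described above can be sketched with the standard library's urllib.parse (this is an illustrative helper, not code from the original post; the function name modified_urls is made up):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def modified_urls(url, replacement='replaced_value'):
    """Yield one URL per query parameter, with that parameter's value swapped out."""
    parts = urlparse(url)
    params = parse_qsl(parts.query)
    for i in range(len(params)):
        modified = list(params)
        # replace only the i-th parameter's value, keep the rest untouched
        modified[i] = (modified[i][0], replacement)
        yield urlunparse(parts._replace(query=urlencode(modified)))
```

Unlike naive string splitting, this handles repeated values correctly (replacing `=value` with `str.replace` would touch every parameter that happens to share the same value).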
I have tried using asyncio to accomplish this, but I had more success with concurrent.futures.
I even tried increasing the number of threads, which at first I thought was the problem, but if I increase the thread count significantly the script freezes for 30-50 seconds at startup and doesn't actually get faster as I expected.
I assume it's a problem with how I structured the multithreading in my code, because I have seen others achieve incredible speeds with concurrent.futures.
import requests
import uuid  # unused in this snippet
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

start = time.time()

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

def make_request(url2):
    try:
        if '?' in url2 and '=' in url2:  # was `if '?' and '=':`, which is always True
            request_1 = requests.get(url2, headers=headers, timeout=10)
            url2_modified = url2.split("?")[1]
            times = url2_modified.count("&") + 1
            for x in range(0, times):
                split1 = url2_modified.split("&")[x]
                value = split1.split("=")[1]
                parameter = split1.split("=")[0]
                url = url2.replace('=' + value, '=1')
                request_2 = requests.get(url, stream=True, headers=headers, timeout=10)
                html_1 = request_1.text
                html_2 = request_2.text
                # status_code is an int, so print with commas instead of `+`
                print(request_1.status_code, '-', url2)
                print(request_2.status_code, '-', url)
    except requests.exceptions.RequestException as e:
        return e

def runner():
    threads = []
    with ThreadPoolExecutor(max_workers=100) as executor:
        file1 = open('urls.txt', 'r', errors='ignore')
        Lines = file1.readlines()
        count = 0
        for line in Lines:
            count += 1
            threads.append(executor.submit(make_request, line.strip()))

runner()
end = time.time()
print(end - start)
In the loop inside make_request you run a plain requests.get, and it doesn't use a thread (or any other method) to make it faster - so it has to wait for the previous request to finish before it can run the next one.
In my make_request I use another ThreadPoolExecutor to run every requests.get (created in the loop) in a separate thread:
executor.submit(make_modified_request, modified_url)
and it gives me a time of ~1.2s.
If I use the plain
make_modified_request(modified_url)
then it gives me a time of ~3.2s.
Minimal working example:
I use the real url https://httpbin.org/get so everyone can simply copy and run it.
from concurrent.futures import ThreadPoolExecutor
import requests
import time
#import urllib.parse

# --- constants --- (PEP8: UPPER_CASE_NAMES)

HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

# --- functions ---

def make_modified_request(url):
    """Send modified url."""
    print('send:', url)
    response = requests.get(url, stream=True, headers=HEADERS)
    print(response.status_code, '-', url)
    html = response.text  # ???
    # ... code to process HTML ...

def make_request(url):
    """Send normal url and create threads with modified urls."""
    threads = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        print('send:', url)
        # send base url
        response = requests.get(url, headers=HEADERS)
        print(response.status_code, '-', url)
        html = response.text  # ???

        #parts = urllib.parse.urlparse(url)
        #print('query:', parts.query)
        #arguments = urllib.parse.parse_qs(parts.query)
        #print('arguments:', arguments)  # dict {'a': ['A'], 'b': ['B'], 'c': ['C'], 'd': ['D'], 'e': ['E']}

        arguments = url.split("?")[1]
        arguments = arguments.split("&")
        arguments = [arg.split("=") for arg in arguments]
        print('arguments:', arguments)  # list [['a', 'A'], ['b', 'B'], ['c', 'C'], ['d', 'D'], ['e', 'E']]

        for name, value in arguments:
            modified_url = url.replace('='+value, '=1')
            print('modified_url:', modified_url)
            # run thread with modified url
            threads.append(executor.submit(make_modified_request, modified_url))
            # run normal function with modified url
            #make_modified_request(modified_url)

    print('[make_request] len(threads):', len(threads))

def runner():
    threads = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        #fh = open('urls.txt', errors='ignore')
        fh = [
            'https://httpbin.org/get?a=A&b=B&c=C&d=D&e=E',
            'https://httpbin.org/get?f=F&g=G&h=H&i=I&j=J',
            'https://httpbin.org/get?k=K&l=L&m=M&n=N&o=O',
            'https://httpbin.org/get?a=A&b=B&c=C&d=D&e=E',
            'https://httpbin.org/get?f=F&g=G&h=H&i=I&j=J',
            'https://httpbin.org/get?k=K&l=L&m=M&n=N&o=O',
        ]
        for line in fh:
            url = line.strip()
            # create thread with url
            threads.append(executor.submit(make_request, url))

    print('[runner] len(threads):', len(threads))

# --- main ---

start = time.time()
runner()
end = time.time()
print('time:', end - start)
BTW:
I would like to use a single
executor = ThreadPoolExecutor(max_workers=10)
and later use the same executor in all functions - maybe it would run a little faster - but at the moment I don't have working code for that.
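One way that shared-executor idea could be structured is sketched below. This is a hypothetical outline, not the missing code from the answer: make_modified_request just returns the url instead of calling requests.get, so the sketch runs without network access. The key point is that the outer tasks submit the inner tasks to the same pool and never block waiting on them from inside a worker (blocking there can deadlock the pool once all workers are busy):

```python
from concurrent.futures import ThreadPoolExecutor, wait

# one pool shared by every function
executor = ThreadPoolExecutor(max_workers=10)

def make_modified_request(url):
    # the real script would call requests.get(url, headers=HEADERS) here;
    # a placeholder keeps the sketch runnable offline
    return url

def make_request(url, inner_futures):
    # submit per-parameter requests to the SAME shared executor;
    # append the futures instead of waiting on them inside the worker
    for i in range(3):  # pretend the url had 3 parameters
        inner_futures.append(executor.submit(make_modified_request, f'{url}&p{i}=1'))

urls = ['https://httpbin.org/get?a=A', 'https://httpbin.org/get?b=B']
inner = []
outer = [executor.submit(make_request, url, inner) for url in urls]
wait(outer)                            # all outer tasks done -> `inner` fully populated
results = [f.result() for f in inner]  # now collect the per-parameter results
executor.shutdown()
print(len(results))  # 6
```

Waiting in two phases (first the outer futures, then the inner ones) is what keeps the single pool safe: the main thread does all the blocking, so no worker ever waits on a task that cannot be scheduled.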