Loop not working properly and key output wrong
OK, here is my code:
import requests
import json
import csv
import pandas as pd
with open('AcoesURLJsonCompleta.csv', newline='') as csvfile:
    urlreader = csv.reader(csvfile, delimiter=',')
    for obj_id in urlreader:
        headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
        jsonData = requests.get(row, headers=headers).json()
        mapper = (
            ('Ticker', 'ric'),
            ('Beta', 'beta'),
            ('DY', 'current_dividend_yield_ttm'),
            ('VOL', 'share_volume_3m'),
            ('P/L', 'pe_normalized_annual'),
            ('Cresc5A', 'eps_growth_5y'),
            ('LPA', 'eps_normalized_annual'),
            ('VPA', 'book_value_share_quarterly'),
            ('LAST', 'last')
        )
        data = {}
        for dataKey, jsonDataKey in mapper:
            d = jsonData.get(jsonDataKey, '')
            try:
                flt_d = float(d)
            except ValueError:
                d = ''
            finally:
                data[dataKey] = [d]
table = pd.DataFrame(data, columns=['Ticker', 'Beta', 'DY', 'VOL', 'P/L', 'Cresc5A', 'LPA', 'VPA', 'Last'])
table.index = table.index + 1
table.to_csv('CompleteData.csv', sep=',', encoding='utf-8', index=False)
print(table)
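OK, here we go:
- Is my first loop, for rows in Urls, correct? I want to iterate over the URLs stored in my CSV file, but I don't know whether I am using split and strip correctly.
- Is my JSON request OK?
- If any of the jsonData requests returns NaN or null, or finds nothing, how should I write my code so that it moves on to the next URL and appends "" (nothing) when that happens?
The output of the whole code is:
line 25, in <module>
Beta = jsonData['beta']
KeyError: 'beta'
Thanks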
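One thing you can do is wrap the idea of requirement 3 in its own function:
def transfer(src, dest, src_name, dest_name):
    """Copy src[src_name] into dest[dest_name] as a float; skip it if missing or not numeric."""
    try:
        value = src[src_name]
    except KeyError:
        return  # the key is missing from the JSON
    try:
        value = float(value)
    except (TypeError, ValueError):
        return  # JSON null (None) or a non-numeric string
    dest[dest_name].append(value)  # dest must map names to lists, e.g. defaultdict(list)
# Sample call (a numeric field; a ticker string such as 'ric' cannot be converted to float):
transfer(jsonData, data, 'beta', 'Beta')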
This eliminates the errors caused by missing values and JSON null. Since JSON has no concept of NaN, that case is not handled here.
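To make the missing-key and null cases concrete, here is a minimal sketch. The helper name to_number_or_blank and the sample payload are hypothetical, not taken from the question's data; it assumes that a missing key, a JSON null and a non-numeric string should all end up as an empty string.

```python
import json

def to_number_or_blank(record, key):
    """Return record[key] as a float, or '' when it is missing, null, or not numeric."""
    value = record.get(key)  # None if the key is absent or the JSON value was null
    try:
        return float(value)
    except (TypeError, ValueError):  # TypeError: None; ValueError: non-numeric string
        return ''

# Hypothetical payload: 'beta' is present, 'last' is null, 'eps_growth_5y' is absent.
sample = json.loads('{"ric": "ABCD3.sa", "beta": "1.25", "last": null}')

row = {key: to_number_or_blank(sample, key) for key in ('beta', 'last', 'eps_growth_5y')}
print(row)
```

Using dict.get instead of indexing is what avoids the KeyError shown in the question.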
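Updated code
I took the few lines of URLs you provided, ran the code below against them, and printed the results. This version uses multiple threads to fetch the URLs, together with a requests Session. This greatly speeds up processing.
Near the top of the code there is a constant, NUMBER_OF_CONCURRENT_URL_REQUESTS, which determines how many concurrent URL GET requests will be issued. I tried various values from 8 to 30. Here is what I learned (or what appears to be true):
- Regardless of the setting of NUMBER_OF_CONCURRENT_URL_REQUESTS, if you run the program twice in quick succession, you get the same results. It looks like the server caches request results for a while.
- However, if you wait long enough that the cache no longer comes into play, you get different results, that is, different errors in terms of which data is missing. Why this is, I cannot say.
- The larger the value of NUMBER_OF_CONCURRENT_URL_REQUESTS, the faster the program runs. There may be some value so large that the server gets unhappy and decides you are attempting a denial-of-service attack. I see no reason to make this value larger than 30.
- Is there a correlation between larger values of NUMBER_OF_CONCURRENT_URL_REQUESTS and the likelihood of missing data? I can't say for sure, but it seems to be the case, which makes no sense to me. You can try different values and see for yourself one way or the other.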
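One way to explore the effect of the pool size without hitting the real server is to time a stand-in fetch function under different worker counts. The sketch below is only an illustration: fake_fetch, its 0.05-second delay and the example.invalid URLs are assumptions that simulate network latency, not the real endpoint.

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep

def fake_fetch(url):
    """Stand-in for a network request (assumption: each call takes about 0.05 s)."""
    sleep(0.05)
    return url.rsplit('/', 1)[-1]

def timed_run(urls, workers):
    """Fetch every URL with the given number of threads; return (results, seconds)."""
    start = perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(fake_fetch, urls))  # map preserves input order
    return results, perf_counter() - start

urls = [f'https://example.invalid/quote/T{i}.sa' for i in range(16)]
for workers in (1, 4, 8):
    results, elapsed = timed_run(urls, workers)
    print(f'{workers} workers: {elapsed:.2f} s for {len(results)} URLs')
```

Because the simulated work is pure waiting, more workers should shorten the run; a real server may throttle or drop requests, which this sketch does not model.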
The code:
import csv, requests, pandas as pd
from decimal import Decimal, DecimalException
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from time import sleep
NUMBER_OF_CONCURRENT_URL_REQUESTS = 8
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
def request_getter(session, url):
    ric = url.split('/')[-1] # in case results does not contain 'ric' key
    for t in (0, 1000, 2000, 4000, 4000): # delay before each attempt, assumed to be in milliseconds
        if t:
            sleep(t / 1000) # sleep() takes seconds
            print(f"Retrying request '{ric}' ...", flush=True)
        data = session.get(url, headers=headers).json()
        if 'retry' not in data:
            break
    return ric, data
mapper = (
    ('Ticker', 'ric'),
    ('Beta', 'beta'),
    ('DY', 'current_dividend_yield_ttm'),
    ('VOL', 'share_volume_3m'),
    ('P/L', 'pe_normalized_annual'),
    ('Cresc5A', 'eps_growth_5y'),
    ('LPA', 'eps_normalized_annual'),
    ('VPA', 'book_value_share_quarterly'),
    ('LAST', 'last')
)
data = defaultdict(list)
with open('AcoesURLJsonCompleta.csv', newline='') as csvfile:
    urlreader = csv.reader(csvfile, delimiter=',')
    # max_workers sets the number of concurrent requests; a requests Session is used for even more performance
    with ThreadPoolExecutor(max_workers=NUMBER_OF_CONCURRENT_URL_REQUESTS) as executor, requests.Session() as session:
        request_getter_with_session = partial(request_getter, session)
        for ric, results in executor.map(request_getter_with_session, (row[0] for row in urlreader)):
            if 'market_data' not in results:
                print(f"Missing 'market_data' key for request '{ric}'", flush=True)
                for k, v in results.items():
                    print(f' {repr(k)} -> {repr(v)}', flush=True)
                print(flush=True)
                continue
            market_data = results['market_data']
            if 'ric' not in market_data:
                # see if any of the mapper keys are present:
                found = False
                for _, jsonDataKey in mapper:
                    if jsonDataKey in market_data:
                        found = True
                        break
                if not found:
                    print(f"Request '{ric}' has nothing recognizable in market_data:", flush=True)
                    for k, v in market_data.items():
                        print(f' {repr(k)} -> {repr(v)}', flush=True)
                    print(flush=True)
                    continue
                # We have at least one data value present
                print(f"Results missing 'ric' key; inferring 'ric' value '{ric}' from request URL.", flush=True)
                market_data['ric'] = ric
            for dataKey, jsonDataKey in mapper: # for example, 'Ticker', 'ric'
                d = market_data.get(jsonDataKey)
                if d is None:
                    print(f"Data missing for request = '{ric}', key = '{jsonDataKey}'", flush=True)
                    d = '' if jsonDataKey == 'ric' else Decimal('NaN')
                else:
                    try:
                        if jsonDataKey != 'ric': d = Decimal(d)
                    except DecimalException:
                        print(f"Bad value for '{jsonDataKey}': {repr(d)}", flush=True)
                        d = Decimal('NaN') # the Decimal class has its own version of NaN
                data[dataKey].append(d) # add to data
table = pd.DataFrame(data)
table.index = table.index + 1
table.to_csv('CompleteData.csv', sep=',', encoding='utf-8', index=False)
print(table)
"""
# to read back table:
table2 = pd.read_csv('CompleteData.csv', sep=',', encoding='utf-8', converters={
    'Ticker': str,
    'Beta': Decimal,
    'DY': Decimal,
    'VOL': Decimal,
    'P/L': Decimal,
    'Cresc5A': Decimal,
    'LPA': Decimal,
    'VPA': Decimal,
    'LAST': Decimal
})
print(table2)
"""
Prints:
Missing 'market_data' key for request CPLE6.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request EQMA3B.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Data missing for ric GNDI3.sa, key beta
Data missing for ric GNDI3.sa, key current_dividend_yield_ttm
Data missing for ric GNDI3.sa, key share_volume_3m
Data missing for ric GNDI3.sa, key pe_normalized_annual
Data missing for ric GNDI3.sa, key eps_growth_5y
Data missing for ric GNDI3.sa, key eps_normalized_annual
Data missing for ric GNDI3.sa, key book_value_share_quarterly
Missing 'market_data' key for request MDNE3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request MMXM11.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request PCAR3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Results missing ric key; inferring ric value from request URL.
Data missing for ric RAIL3.sa, key last
Results missing ric key; inferring ric value from request URL.
Data missing for ric SANB4.sa, key last
Missing 'market_data' key for request TIMP3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request VIVT3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Ticker Beta DY VOL P/L Cresc5A LPA VPA LAST
1 AALR3.sa 1.04339 0.80591 11.00223 26.44449 -99999.99000 0.39668 10.83966 10.490000
2 ABCB4.sa 1.20526 7.34780 18.61900 5.78866 5.42894 2.46862 18.87782 14.290000
3 ABEV3.sa 0.46311 4.32628 688.21043 15.04597 -0.71223 0.75369 3.89563 11.340000
4 ADHM3.sa 1.69780 0.00000 2.36460 -99999.99000 -99999.99000 -0.65331 -2.61497 2.480000
5 AGRO3.sa 0.35568 4.53332 2.54323 41.17127 -99999.99000 0.49792 17.47838 20.500000
.. ... ... ... ... ... ... ... ... ...
255 WEGE3.sa 0.50580 1.02429 165.72543 50.11481 17.06485 0.79697 4.59658 39.940000
256 WHRL3.sa 0.59263 8.86991 1.24990 12.72584 0.65648 0.50920 2.00868 6.700000
257 WHRL4.sa 0.59263 8.86991 1.24990 12.72584 0.65648 0.50920 2.00868 6.480000
258 WIZS3.sa 0.76719 12.18673 19.00407 6.67135 21.23109 1.36704 1.16978 9.120000
259 YDUQ3.sa 1.42218 1.68099 94.00410 13.83419 9.13751 2.19384 10.31845 30.350000
[259 rows x 9 columns]
Next run:
Missing 'market_data' key for request CPLE6.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request EQMA3B.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request MDNE3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request MMXM11.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request PCAR3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request TIMP3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Missing 'market_data' key for request VIVT3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None
Ticker Beta DY VOL P/L Cresc5A LPA VPA LAST
1 AALR3.sa 1.04339 0.80591 11.00223 26.44449 -99999.99000 0.39668 10.83966 10.490000
2 ABCB4.sa 1.20526 7.34780 18.61900 5.78866 5.42894 2.46862 18.87782 14.290000
3 ABEV3.sa 0.46311 4.32628 688.21043 15.04597 -0.71223 0.75369 3.89563 11.340000
4 ADHM3.sa 1.69780 0.00000 2.36460 -99999.99000 -99999.99000 -0.65331 -2.61497 2.480000
5 AGRO3.sa 0.35568 4.53332 2.54323 41.17127 -99999.99000 0.49792 17.47838 20.500000
.. ... ... ... ... ... ... ... ... ...
255 WEGE3.sa 0.50580 1.02429 165.72543 50.11481 17.06485 0.79697 4.59658 39.940000
256 WHRL3.sa 0.59263 8.86991 1.24990 12.72584 0.65648 0.50920 2.00868 6.700000
257 WHRL4.sa 0.59263 8.86991 1.24990 12.72584 0.65648 0.50920 2.00868 6.480000
258 WIZS3.sa 0.76719 12.18673 19.00407 6.67135 21.23109 1.36704 1.16978 9.120000
259 YDUQ3.sa 1.42218 1.68099 94.00410 13.83419 9.13751 2.19384 10.31845 30.350000
[259 rows x 9 columns]
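Discussion
Using threads and a requests Session object makes the code more complicated, but the complexity is necessary to greatly reduce the program's running time.
To understand the code, you need to understand ThreadPoolExecutor, the map function (the ThreadPoolExecutor.map method is a variation of this that assigns a thread to execute each function call), and functools.partial. The latter is required because map expects its function argument to be a function that takes a single argument, but we need to call request_getter with two arguments: a requests Session object, which never varies, and a URL. partial lets us transform a function that takes two arguments into one that takes a single argument and supplies the other automatically. For example: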
from functools import partial
def foo(x, y):
    return x + y
foo7 = partial(foo, 7) # the first argument to foo will now always be 7
foo7(9) # equivalent to foo(7, 9)
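The same pattern carries over to ThreadPoolExecutor.map. The following is a minimal sketch with a hypothetical fetch function and a plain dict standing in for the requests Session, so that it runs without network access; the example.invalid URLs are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def fetch(session, url):
    """Hypothetical two-argument worker: the 'session' is shared, the URL varies."""
    session_name = session['name']
    ticker = url.split('/')[-1]
    return ticker, f'fetched with {session_name}'

fake_session = {'name': 'shared-session'}          # stand-in for requests.Session()
fetch_with_session = partial(fetch, fake_session)  # now a one-argument callable

urls = ['https://example.invalid/a/AAA3.sa', 'https://example.invalid/a/BBB4.sa']
with ThreadPoolExecutor(max_workers=2) as executor:
    # map calls fetch_with_session(url) for each URL and yields results in input order
    results = list(executor.map(fetch_with_session, urls))

print(results)
```

A lambda such as lambda url: fetch(fake_session, url) would work as well; partial simply states the intent more directly.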
Reading the CSV file back:
from decimal import Decimal
import pandas as pd
table = pd.read_csv('CompleteData.csv', sep=',', encoding='utf-8', converters={
    'Ticker': str,
    'Beta': Decimal,
    'DY': Decimal,
    'VOL': Decimal,
    'P/L': Decimal,
    'Cresc5A': Decimal,
    'LPA': Decimal,
    'VPA': Decimal,
    'LAST': Decimal
})
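If the file can contain blank cells, Decimal('') raises decimal.InvalidOperation, so a more forgiving converter may be useful. The sketch below is one possible way to handle that, not part of the original answer: to_decimal is a hypothetical helper, and the CSV text is made-up sample data read from memory rather than from CompleteData.csv.

```python
from decimal import Decimal, InvalidOperation
from io import StringIO
import pandas as pd

def to_decimal(text):
    """Convert a CSV cell to Decimal, mapping blank or unparsable cells to Decimal('NaN')."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return Decimal('NaN')

# Made-up sample with one blank 'Beta' cell.
csv_text = 'Ticker,Beta,LAST\nAAA3.sa,1.5,10.49\nBBB4.sa,,14.29\n'
table = pd.read_csv(StringIO(csv_text), converters={
    'Ticker': str,
    'Beta': to_decimal,
    'LAST': to_decimal,
})
print(table)
```

Passing to_decimal instead of Decimal keeps the read from failing on a single bad cell, at the cost of silently turning such cells into NaN.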