Error querying a database from an imported module
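I need to pull 5000 results from an imported module, but I get an error even when I try to return 1000 results. The most I can return is 500 results (num_players=500). Ideally I would pull a random sample of 5000, but I guess the first 5000 will have to do. I just need sample data to run an analysis in Excel. The code below is adapted from the example in the documentation here: https://pyett.readthedocs.io/en/latest/cohort.html
Does anyone have any suggestions on how I can get this to run correctly? Why is it losing its connection to the database?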
from pyETT import ett
import pandas as pd
lb_cohort = ett.Cohort(ett.ETT().get_leaderboard(num_players=1000))
lb_cohort.size
df = lb_cohort.players_dataframe()
print(df)
file_name = 'export_file.xlsx'
df.to_excel(file_name)
print('DataFrame is written to Excel File successfully.')
This is the exception I get:
Message=Cannot connect to host www.elevenvr.club:443 ssl:default [The semaphore timeout period has expired]
Source=C:\Users\Apache Paint\source\repos\Patrick_Kimble \Patrick_Kimble.py
StackTrace:
File "C:\Users\Apache Paint\source\repos\Patrick_Kimble\Patrick_Kimble.py", line 7, in (Current frame)
lb_cohort = ett.Cohort(ett.ETT().get_leaderboard(num_players=1000))
HTTPSConnectionPool(host='www.elevenvr.club', port=443): Max retries exceeded with url: /accounts/search/bensnow/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000025564B9AE20>: Failed to establish a new connection: [WinError 10060]
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
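Both messages above are connection timeouts, so one thing that may be worth trying is retrying the failing call with a growing pause between attempts. The following is only a minimal sketch of that idea: `call_with_retry` and `flaky_fetch` are hypothetical names that are not part of pyETT, and the simulated failure stands in for the real network call.

```python
import time


def call_with_retry(func, *args, retries=3, base_delay=1.0, **kwargs):
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Give up after the last allowed attempt and surface the error.
            if attempt == retries - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before trying again.
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical stand-in for a network call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated timeout")
    return "ok"


result = call_with_retry(flaky_fetch, retries=5, base_delay=0.01)
print(result, calls["count"])
```

With pyETT the wrapped call could be something like `ett.ETT().get_leaderboard`, passed with `num_players=500`. Whether retrying helps depends on why the server stops responding, which the traceback alone does not show.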
Edit:
I created a loop to try to pull all of the data from the database.
for i in range(1, 500000):
    try:
        if int(i) % 100 == 0:
            print('Loop is at:', i)
        user_id = i
        line = ett.ett_parser.get_user(user_id)
        temp_df = pd.DataFrame(line, index=[i])
        self.df_master = self.df_master.append(temp_df, ignore_index = True)
    except Exception:
        print("Error:",i )
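The loop above calls `DataFrame.append` once per row. That method was removed in pandas 2.0, and it copied the whole frame on every call. A common alternative is to collect the rows in a plain list and build the frame once at the end. The sketch below assumes each lookup returns a flat dict. `fetch_user` is a hypothetical stand-in for `ett.ett_parser.get_user`, and its fields and simulated failures are invented for illustration.

```python
import pandas as pd


def fetch_user(user_id):
    """Hypothetical stand-in for ett.ett_parser.get_user(user_id)."""
    if user_id % 4 == 0:
        raise ConnectionError("simulated failure")
    return {"id": user_id, "name": f"player{user_id}", "elo": 1500.0}


rows = []
failed_ids = []
for i in range(1, 11):
    try:
        rows.append(fetch_user(i))
    except Exception:
        # Remember the id so it can be retried later instead of being lost.
        failed_ids.append(i)

# Build the frame once at the end instead of appending row by row.
df_master = pd.DataFrame(rows)
print(df_master.shape, failed_ids)
```

Keeping the failed ids in a list makes it possible to run a second pass over just those ids once the server responds again.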
MDR's answer works for returning random data. However, I need to use the describe() function to pull further details, and it only accepts the 'cohort' type.
Example:
from pyETT import ett
import pandas as pd
lb_cohort = ett.Cohort(ett.ETT().get_leaderboard(num_players=10))
lb_cohort.size
lb_cohort.describe()
It should return something like the image below.
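Since .user_search_dataframe() takes a string and returns a frame based on partial matches, you can put together a long list of usernames, loop over it, and then concatenate the resulting frames.
Example: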
from pyETT import ett
import pandas as pd
from time import sleep
from datetime import datetime
eleven = ett.ETT()
# test short list
l = ['happy', 'honey', 'mad']
# what makes a good username? 'Neo' sure, but what else?
# l = ['League', 'Knight', 'happy', 'honey', 'mad', 'crazy', 'Super', 'one', 'neo', 'duke', 'wizard', 'two', 'jon', 'bob', 'Dog']
dfs = []
for name in l:
    df = eleven.user_search_dataframe(name)
    #print(df.shape)
    # hangs a bit so slow it down/avoid timeouts
    sleep(10)
    dfs.append(df)
df = pd.concat(dfs, ignore_index=True)
print('Size before dropping duplicate names: ', df.shape)
df = df.drop_duplicates(subset=['name']).reset_index(drop=True)
# should users who have never won or lost a game be removed?
# maybe they never played a game and are just in the system?
# if so uncomment this line...
# df = df.loc[(df[['wins', 'losses']] != 0).any(axis=1)]
df['last_online'] = pd.to_datetime(df['last_online'], format='%Y-%m-%dT%H:%M:%S.%fZ')
df = df.sort_values('rank')
print('Final size of frame: ', df.shape)
print('Random sample of results:', '\n')
# random sample from throughout the frame
print(df.sample(n=20))
# timestamp in file name helps if you have the file open when the script is running and it cannot overwrite
df.to_excel('export_file_' + datetime.now().strftime("%H_%M_%S") + '.xlsx', index=False)
Output (based on the long list in the code):
Size before dropping duplicate names: (11755, 7)
Final size of frame: (11541, 7)
Random sample of results:
id name elo rank wins losses \
2767 548341 shani_ahmad 1499.0 482261 1 1
2795 575725 MadisonLu 1500.0 466685 0 0
2087 343031 Jomadi97 1797.5 5151 126 76
4640 159384 TwoHungLow 1500.0 48816 0 0
530 193084 Happybloke 1500.0 165971 1 1
3952 538362 Neo2442 1471.0 546859 0 2
783 555710 HappySanguineGaming 1485.0 477280 2 4
9435 73557 NateDoggLi 1500.0 104922 22 66
1 268668 IvyLeague412 1489.0 349980 2 2
2202 387282 Madlog31 1500.0 370736 0 0
1319 20604 Madssr1 1516.0 33429 1 1
739 407953 Happy0321 1500.0 343960 0 0
2165 379302 SamAdam 1500.0 324270 0 0
1693 222456 Hamada 1485.0 504640 0 2
778 451963 happylyu 1500.0 380432 0 0
6740 192120 JonRose32 1500.0 315481 0 0
796 562459 UiJun_Happy 1526.0 39078 19 10
7292 319677 Dapbob 1500.0 220062 0 0
4991 590248 natwon.brooks.3 1500.0 477078 0 0
9859 248163 postdog4 1500.0 107996 0 0
last_online
2767 2021-08-03 18:05:10.423
2795 2021-07-19 23:21:06.618
2087 2021-05-25 14:10:35.903
4640 2020-11-28 14:38:30.703
530 2020-12-31 17:25:02.802
3952 2021-08-18 19:29:39.149
783 2021-07-06 01:35:36.241
9435 2020-10-20 13:16:21.542
1 2021-01-31 01:23:54.627
2202 2021-06-02 19:45:27.265
1319 2020-03-27 13:17:35.754
739 2021-03-25 23:49:28.654
2165 2021-03-07 23:41:26.949
1693 2021-03-11 03:17:06.368
778 2021-04-29 16:51:31.216
6740 2021-07-08 04:08:08.927
796 2021-08-08 11:57:48.181
7292 2021-02-14 14:08:20.299
4991 2021-07-30 13:02:12.894
9859 2021-01-03 08:35:27.054
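Since the question asks for a random sample of 5000 rows, here is a minimal sketch of drawing one from a frame of this size with `DataFrame.sample` and summarising it with pandas' own `describe()`. The frame is synthetic placeholder data that borrows only the row count and some column names from the output above, not real player records. pandas' `describe()` is also a different function from `Cohort.describe()` in pyETT, so its output will not match the library's.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder frame; values are random, not real player data.
rng = np.random.default_rng(0)
n_rows = 11541  # same row count as the final frame reported above
df = pd.DataFrame({
    "id": np.arange(n_rows),
    "elo": rng.normal(1500, 100, n_rows).round(1),
    "wins": rng.integers(0, 200, n_rows),
    "losses": rng.integers(0, 200, n_rows),
})

# Draw 5000 distinct rows; random_state makes the draw repeatable.
sample = df.sample(n=5000, random_state=42)

# Summary statistics for the numeric columns of the sample.
summary = sample[["elo", "wins", "losses"]].describe()
print(summary)
```

Passing `random_state` is optional. Leaving it out gives a different sample on every run.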