How can I actually free up CPU resources for this for loop in Jupyter Notebook?
Every day I try to run an automated process in a Jupyter Notebook (on deepnote.com), but after running the first iteration of a while loop and starting the next one (via the for loop inside the while loop), the virtual machine crashes and throws the message below:
KernelInterrupted: Execution interrupted by the Jupyter kernel
Here is the code:
.
.
.
while y < 5:
    print(f'\u001b[45m Try No. {y} out of 5 \u001b[0m')
    #make the driver wait up to 10 seconds before doing anything.
    driver.implicitly_wait(10)
    #values for the example.
    #Declaring several variables for looping.
    #Let's start at the newest page.
    link = 'https...'
    driver.get(link)
    #Here we use an XPath element to get the initial page.
    initial_page = int(driver.find_element_by_xpath('Xpath').text)
    print(f'The initial page is the No. {initial_page}')
    final_page = initial_page + 120
    pages = np.arange(initial_page, final_page + 1, 1)
    minimum_value = 0.95
    maximum_value = 1.2
    #the variable to_place is set as a string value that must exist in a row for it to be scraped;
    #if it doesn't exist, the row is ignored.
    to_place = 'A particular place'
    #the same comment stated above applies to the variable POINTS.
    POINTS = 'POINTS'
    #set a final dataframe which will contain all the data scraped from the arange that
    #matches the parameters set (minimum_value, maximum_value, to_place, POINTS).
    df_final = pd.DataFrame()
    dataframe_final = pd.DataFrame()
    #set another final dataframe for the 2ND PART OF THE PROCESS.
    initial_df = pd.DataFrame()
    #set a for loop for each page from the arange.
    for page in pages:
        #INITIAL SEARCH.
        #look for general data about the link:
        #the amount of results and pages for the execution of the for loop; the "page" variable is used within the {}.
        url = 'https...page={}&p=1'.format(page)
        print(f'\u001b[42m Current page: {page} \u001b[0m \u001b[42m Final page: {final_page} \u001b[0m \u001b[42m Pages left: {final_page - page} \u001b[0m \u001b[45m Try No. {y} out of 5 \u001b[0m\n')
        driver.get(url)
        #Here we tell the scraper to try to find the total number of subpages a particular page has, if that page IS NOT empty;
        #if so, the scraper proceeds to execute the rest of the procedure.
        try:
            subpages = driver.find_element_by_xpath('Xpath').text
            print(f'Reading the information about the number of subpages of this page ... {subpages}')
            subpages = int(re.search(r'\d{0,3}$', subpages).group())
            print(f'This page has {subpages} subpages in total')
            df = pd.DataFrame()
            df2 = pd.DataFrame()
            print(df)
            print(df2)
            #FOR LOOP.
            #search each subpage for all the rows that match the parameters set above:
            #minimum_value, maximum_value, to_place, POINTS.
            #set a sub-loop over each subpage of each page.
            for subpage in range(1, subpages + 1):
                url = 'https...page={}&p={}'.format(page, subpage)
                driver.get(url)
                identities_found = int(driver.find_element_by_xpath('Xpath').text.replace('A total of ', '').replace(' identities found', '').replace(',', ''))
                identities_found_last = identities_found % 50
                print(f'Página: {page} de {pages}') #AT THIS LINE IT CRASHED THE LAST TIME
                .
                .
                .
        #If the particular page is empty.
        except Exception: #a bare "except:" would also swallow KeyboardInterrupt, so Exception is safer
            print(f"This page No. {page} IT'S EMPTY ¯\_₍⸍⸌̣ʷ̣̫⸍̣⸌₎_/¯, ¡NEXT! ")
    .
    .
    .
    y += 1
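For what it's worth, the only CPU-side mitigation I could think of (it is not in the code above) is pacing the requests so the loop yields the CPU between page loads. A trimmed-down sketch of the idea, with a placeholder URL and an assumed delay:

import time

from selenium import webdriver

REQUEST_DELAY = 2.0  # assumed delay in seconds between requests

driver = webdriver.Chrome()
driver.implicitly_wait(10)
for page in range(1, 6):  # placeholder range standing in for the real arange
    driver.get('https://example.com/?page={}'.format(page))  # placeholder URL
    time.sleep(REQUEST_DELAY)  # give the CPU (and the target site) a breather
driver.quit()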
At first I thought the KernelInterrupted error was thrown because my VM ran out of virtual memory at the second iteration...
But after many tests I realized that my program barely consumes RAM at all: the VM's virtual RAM hardly changed during the whole process before the kernel crashed. I can guarantee that.
So now I think maybe the virtual CPU of my VM is what is causing the kernel to crash, but if that is the case I simply don't understand why. This is the first time I have had to deal with this situation, and the program runs perfectly on my own computer.
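In case it helps someone reproduce this, a small way to log CPU and RAM usage from inside the loop (assuming the psutil package can be installed on the VM; it is not part of my script) would be:

import psutil

def log_resources(tag):
    cpu = psutil.cpu_percent(interval=1)   # % CPU averaged over a 1-second sample
    ram = psutil.virtual_memory().percent  # % of total RAM in use
    print(f'[{tag}] CPU: {cpu:.1f}%  RAM: {ram:.1f}%')

log_resources('before page load')  # e.g. call at the top of each iteration

Logging like this would show directly whether either resource actually spikes before the kernel dies.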
Is there any data scientist or machine learning engineer here who can help me? Thanks in advance.
I found the answer in the Deepnote community forum itself: this platform's "free tier" machines are simply not guaranteed to run permanently (24/7), no matter what program is executing in their VM.
That's it. Problem solved.
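Since these shutdowns cannot be prevented on the free tier, the practical workaround is to make the job restartable: persist the scraped rows and the last completed page after every iteration, so a fresh run can resume where the previous one died. A rough sketch of the idea (the file names and checkpoint layout are my own assumptions, not part of the original script):

import json
import os

import pandas as pd

CHECKPOINT = 'checkpoint.json'  # hypothetical path
RESULTS = 'scraped_rows.csv'    # hypothetical path

def load_last_page(default):
    # Resume from the page after the last completed one, or start from `default`.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)['last_page'] + 1
    return default

def save_progress(page, rows):
    # Append this page's rows to disk and record the page as done.
    rows.to_csv(RESULTS, mode='a', header=not os.path.exists(RESULTS), index=False)
    with open(CHECKPOINT, 'w') as f:
        json.dump({'last_page': page}, f)

With something like this in place, the for loop would start from load_last_page(initial_page) instead of always starting at the newest page, and a VM shutdown would only cost the page that was in flight.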