Stuck in a loop: code doesn't want to pull anything except row 1

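I'm stuck in a loop and I don't know what to change to make my code work. The problem is with the CSV file: it contains a list of domains (freedommortgage.com, google.com, amd.com, etc.). When I run the code, everything is fine at first, but then it keeps sending me the same result:

the monthly total visits to freedommortgage.com is 1.10M

Here is my code: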

import csv
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import urllib
from captcha2upload import CaptchaUpload
import time


# setting the firefox driver
def init_driver():
    driver = webdriver.Firefox(executable_path=r'C:\Users\muki\Desktop\similarweb_scrapper-master\geckodriver.exe')
    driver.implicitly_wait(10)
    return driver


# solving the captcha (with 2captcha.com)
def captcha_solver(driver):
    captcha_src = driver.find_element_by_id('recaptcha_challenge_image').get_attribute("src")
    urllib.urlretrieve(captcha_src, "captcha.jpg")
    captcha = CaptchaUpload("4cfd308fd703d40291a7e250d743ca84")  # 2captcha API KEY
    captcha_answer = captcha.solve("captcha.jpg")
    wait = WebDriverWait(driver, 10)
    captcha_input_box = wait.until(
        EC.presence_of_element_located((By.ID, "recaptcha_response_field")))
    captcha_input_box.send_keys(captcha_answer)
    driver.implicitly_wait(10)
    captcha_input_box.submit()


# inputting the domain in similar web search box and finding necessary values
def lookup(driver, domain, short_method):
    # short method - inputting the domain in the url 
    if short_method:
        driver.get("https://www.similarweb.com/website/" + domain)
    else:
        driver.get("https://www.similarweb.com")
    attempt = 0
    # trying 3 times before quitting (due to a second refresh by the website that clears the search box)
    while attempt < 1:
        try:
            captcha_body_page = driver.find_elements_by_class_name("block-page")
            driver.implicitly_wait(10)
            if captcha_body_page:
                print("Captcha ahead, solving the captcha, it may take a few seconds")
                captcha_solver(driver)
                print("Captcha solved! the program will continue shortly")
                time.sleep(20)  # to prevent second refresh affecting the upcoming elements finding after captcha solved
        # for normal method, inputting the domain in the searchbox instead of url
            if not short_method:
                input_element = driver.find_element_by_id("js-swSearch-input")
                input_element.click()
                input_element.send_keys(domain)
                input_element.submit()
            wait = WebDriverWait(driver, 10)
            time.sleep(10)
            total_visits = wait.until(
                EC.presence_of_element_located((By.XPATH, "//span[@class='engagementInfo-valueNumber js-countValue']")))


            total_visits_line = "the monthly total visits to %s is %s" % (domain, total_visits.text)
            time.sleep(10)
            print('\n' + total_visits_line)


        except TimeoutException:
            print("Box or Button or Element not found in similarweb while checking %s" % domain)
            attempt += 1
            print("attempt number %d... trying again" % attempt)


# main
if __name__ == "__main__":
    with open('bigdomains.csv', 'rt') as f:
        reader = csv.reader(f)
        driver = init_driver()
        for row in reader:
            domain = row[0]
            lookup(driver, domain, True) # user need to give as a parameter True or False, True will activate the
            # short method, False will take the normal method

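(Sorry for the long code, but I had to show everything, even though the focus is on the last part.)

My question is simple:

Why does it keep taking the domain from row 1 and ignore rows 2, 3, 4, and so on?

The time.sleep delays have to be 10 seconds or more to avoid captcha problems with this website.

If anyone wants to run this, you have to edit the name of the CSV file and, of course, list the domains in it in the form google.com (not www.google.com).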

It looks like you are accessing the same index every time:

domain = row[0]

Index 0 is the first item, so you keep getting the same value.
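One thing worth checking before changing that line: with csv.reader, each row is a list of the columns on one line of the file, so row[0] is the first column of the current row, not the first row of the file. The sketch below assumes the file holds one domain per row, as the question describes; the in-memory sample and the helper name read_domains are illustrative stand-ins for bigdomains.csv, not part of the original code. It also uses enumerate to show the row number next to each value.

```python
import csv
import io

# In-memory stand-in for bigdomains.csv (assumed layout: one domain per row).
sample_csv = io.StringIO("freedommortgage.com\ngoogle.com\namd.com\n")


def read_domains(f):
    """Return the first column of every non-empty row (hypothetical helper)."""
    return [row[0] for row in csv.reader(f) if row]


domains = read_domains(sample_csv)

# enumerate yields (row number, value) pairs, starting at 1 here.
for line_number, domain in enumerate(domains, start=1):
    print(line_number, domain)
```

If the real file is laid out this way, row[0] already changes on every iteration, and the repeated output would have to come from somewhere else in the script.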

This post explains another way to use for loops in Python:

Accessing the index in 'for' loops?
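Another possible cause is visible in the posted lookup function: the `while attempt < 1` loop only increments attempt inside the `except TimeoutException` branch, so after a successful lookup the condition stays true and the same domain is fetched and printed again. A minimal sketch of a bounded retry loop follows. The names lookup_with_retries, fetch and max_attempts are hypothetical, and the built-in TimeoutError stands in for Selenium's TimeoutException so the example runs without a browser.

```python
def lookup_with_retries(domain, fetch, max_attempts=3):
    """Call fetch(domain) until it succeeds or max_attempts is used up.

    fetch is a hypothetical callable standing in for the Selenium steps;
    TimeoutError stands in for selenium's TimeoutException.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = fetch(domain)
        except TimeoutError:
            print("attempt number %d failed for %s" % (attempt, domain))
        else:
            # Success: return right away so the caller moves to the next row.
            return result
    return None  # every attempt timed out


calls = []


def fake_fetch(domain):
    """Stand-in fetcher that records each call and always succeeds."""
    calls.append(domain)
    return "result for " + domain


def always_times_out(domain):
    """Stand-in fetcher that never succeeds."""
    raise TimeoutError(domain)


results = [lookup_with_retries(d, fake_fetch)
           for d in ["freedommortgage.com", "google.com", "amd.com"]]
failed = lookup_with_retries("amd.com", always_times_out)
```

In the original script, the equivalent minimal change would be to leave the loop (for example with `break` or `return`) right after the total visits line is printed.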