Webscraping customer review - Invalid selector error using XPath

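I am trying to extract the user ID, rating, and review text from the following site using Selenium, but it raises an "invalid selector" error. I think the XPath I defined to get the review text is the cause, but I can't work out how to fix it. The link to the site is below: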

teslamotor review

The code I am using is as follows:

import pandas as pd
from selenium import webdriver

#Class for Review webscraping from consumeraffairs.com site
class CarForumCrawler(): 
    def __init__(self, start_link):
        self.link_to_explore = start_link 
        self.comments = pd.DataFrame(columns = ['rating','user_id','comments'])
        self.driver = webdriver.Chrome(executable_path=r'C:/Users/mumid/Downloads/chromedriver/chromedriver.exe')            
        self.driver.get(self.link_to_explore)
        self.driver.implicitly_wait(5)
        self.extract_data()
        self.save_data_to_file()
   
    def extract_data(self):
        ids = self.driver.find_elements_by_xpath("//*[contains(@id,'review-')]")
        comment_ids = []
        for i in ids:
            comment_ids.append(i.get_attribute('id'))

        for x in comment_ids:
            #Extract the rating for each user on a page
            user_rating = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[1]/div/img')[0]
            rating = user_rating.get_attribute('data-rating')

            #Extract user ids from each user on a page
            userid_element = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[2]/div[2]/strong')[0]
            userid = userid_element.get_attribute('itemprop')

            #Extract Message for each user on a page
            user_message = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]]/div[3]/p[2]/text()')[0]
            comment = user_message.text

            #Adding rating, userid and comment for each user in a dataframe
            self.comments.loc[len(self.comments)] = [rating,userid,comment]

    def save_data_to_file(self):
    #we save the dataframe content to a CSV file
        self.comments.to_csv ('Tesla_rating-6.csv', index = None, header=True)
    def close_spider(self):
    #end the session
        self.driver.quit()

try:
    url = 'https://www.consumeraffairs.com/automotive/tesla_motors.html'
    mycrawler = CarForumCrawler(url)
    mycrawler.close_spider()
except:
    raise

The error I get is as follows:

Also, the XPath I am trying to follow comes from the following HTML:

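This is a classic error...

find_elements_by_xpath('//*[@id="' + x +'"]]/div[3]/p[2]/text()')[0] fails for two reasons. The expression ends in text(), which selects a text node, whereas Selenium can only return elements, so you need to pass an expression that selects an element. It also has a stray closing bracket after the id predicate ("]] instead of "]), which is not valid XPath.

You need to change it to:

user_message = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[3]/p[2]')[0]

The next line of your code, comment = user_message.text, then reads the text from the selected element.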
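As a rough sketch of the same fix in isolation, the lookup could be split into two small helpers. The names review_text_xpath and get_review_text are hypothetical (they are not in your code), and the div[3]/p[2] path is taken from your question as-is, so it assumes the page layout has not changed. The driver argument is assumed to be a Selenium WebDriver; the generic find_elements(by, value) call is used here with the "xpath" strategy string.

```python
def review_text_xpath(review_id):
    """Build an XPath that selects the <p> element holding the review text."""
    # Exactly one closing bracket after the id predicate, and no trailing
    # text() step: Selenium can only return elements, not text nodes.
    return '//*[@id="' + review_id + '"]/div[3]/p[2]'


def get_review_text(driver, review_id):
    """Return the review text for one review, or None if nothing matches."""
    # "xpath" is the string value of Selenium's By.XPATH locator strategy.
    matches = driver.find_elements("xpath", review_text_xpath(review_id))
    # Read .text from the element instead of selecting text() in the XPath.
    return matches[0].text if matches else None
```

Inside extract_data this would be called as comment = get_review_text(self.driver, x). Returning None when nothing matches avoids an IndexError on reviews that lack a second paragraph.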

Reference

You can find a relevant detailed discussion in:

  • invalid selector: The result of the xpath expression "//a[contains(@href, 'mailto')]/@href" is: [object Attr] getting the href attribute with Selenium