Beautiful Soup - hyphenated keyword, Error :: keyword can't be an expression

Question

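I am using Selenium and then Beautiful Soup to scrape a web page that loads some of its content with JavaScript. Selenium gives me the plain HTML; I printed it and confirmed that it contains the part I am trying to scrape. My problem is with Beautiful Soup.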

I want to find the div tags with class="comment-detail".

I tried

comments = soup.find_all("div", class_="comment-detail")

but this returns an empty list, possibly because the actual div tag also has data-selenium="reviews-comments".

The exact tag in the HTML is

<div data-selenium="reviews-comments" class="comment-detail">

So I tried the following:

comments = soup.find_all("div", data-selenium="reviews-comments", class_="comment-detail")

But this gives the error

SyntaxError: keyword can't be an expression

because data-selenium is parsed as a subtraction expression when it is really just a hyphenated word. I tried wrapping it in quotes, but that did not help.

I also tried

dct = {
    'div': '',
    'data-selenium': 'reviews-comments',
    'class': 'comment-detail'
}
comments = soup.find_all(**dct)

but len(comments) returns zero, i.e. comments is empty.

For clarity, this is the code I use to get my soup:

from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  
from bs4 import BeautifulSoup

browser = webdriver.Firefox()  
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html/')  
html_source = browser.page_source  
browser.quit()

soup = BeautifulSoup(html_source,'html.parser')

Any ideas on how to proceed from here?

Answer 1

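The problem comes from the URL: it has an extra forward slash at the end, which returns a 404 page instead of the page you actually want. Remove the trailing slash and your code works as written.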
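To illustrate why the extra data-selenium attribute was not the culprit, here is a minimal sketch. It assumes two made-up HTML strings (real_page and error_page are hypothetical stand-ins, not the actual Agoda markup) and a small helper, count_comments, introduced only for this example. Matching on class_ ignores any other attributes on the tag, so an empty result suggests the divs are simply not in the HTML that was parsed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: one snippet shaped like the tag quoted in the
# question, and one shaped like a generic error page.
real_page = '<div data-selenium="reviews-comments" class="comment-detail">Nice</div>'
error_page = "<html><body><h1>404 Not Found</h1></body></html>"


def count_comments(html):
    """Return how many div.comment-detail tags the given HTML contains."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches on the class alone; other attributes do not matter.
    return len(soup.find_all("div", class_="comment-detail"))


print("real page:", count_comments(real_page))
print("error page:", count_comments(error_page))
```

When find_all unexpectedly returns nothing, printing soup.title or the first few hundred characters of the page source is a quick way to check which page was actually loaded.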

Here is the full code I used, just in case:

from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  
from bs4 import BeautifulSoup

browser = webdriver.Firefox()  
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html')  
html_source = browser.page_source  
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')

comments = soup.find_all("div", class_="comment-detail")

print(comments)
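As for the SyntaxError in the question: a hyphenated name such as data-selenium is not a valid Python identifier, so it cannot be passed as a keyword argument. Beautiful Soup's find_all accepts an attrs dictionary for such cases. The sketch below runs on made-up markup modelled on the tag quoted in the question (the text inside the divs is invented), with the tag name passed positionally rather than as a dictionary key.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the tag quoted in the question.
html = """
<div data-selenium="reviews-comments" class="comment-detail">Great stay</div>
<div class="comment-detail">No data attribute here</div>
<div data-selenium="reviews-comments" class="other">Different class</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Hyphenated attribute names go in the attrs dict; the tag name stays a
# positional argument instead of a key in the dictionary.
comments = soup.find_all(
    "div",
    attrs={"data-selenium": "reviews-comments", "class": "comment-detail"},
)

# A CSS selector is another way to express the same filter.
same = soup.select('div.comment-detail[data-selenium="reviews-comments"]')

print(len(comments), len(same))
print(comments[0].get_text())
```

This also suggests why the find_all(**dct) attempt would not match even on the right page: the 'div': '' entry is treated as a filter on an attribute named div, not as the tag name.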

python

beautifulsoup

keyword