Can't access specific HTML with Floki

I'm trying to extract the date and the full review text of every review shown at this URL: https://www.dealerrater.com/dealer/McKaig-Chevrolet-Buick-A-Dealer-For-The-People-dealer-reviews-23685/#link So far I can get:

June 17, 2021SALES VISIT - NEW"Joe was great and took extra time to help make sure I got..."- MelisaswartJoe was great and took extra time to help make sure I got the car that I wanted not pushing me into a car I didn’t want. He even made sure my car  was made ready in his day off. Great Job Thank You. 
Melisa SRead MoreCustomer ServiceQuality of WorkFriendlinessPricingOverall ExperienceRecommend Dealer
                Yes
            Employees Worked With 
                                             Taylor Prickett
                                         5.0
                                             Joe Wynne
                                         5.0
                                             Brandon McCloskey
                                         4.0Report |
        Print Helpful 0
    
    .review-response {
        overflow: hidden;
    }

    .open .review-response {
        max-height: none;
    }

     @media (max-width: 767px) {
         .public-messages {
             border-top: none !important;
             margin-left: 0 !important;
             margin-top: 5px !important;
             padding-top: 0 !important;
         }

         .review-hide {
             display: none !important;
         }

         .open .review-hide{
             display: block !important;
         }
     }

with this code:

def get_reviews_url() do
     case HTTPoison.get("https://www.dealerrater.com/dealer/McKaig-Chevrolet-Buick-A-Dealer-For-The-People-dealer-reviews-23685/#link") do
      {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
        IO.puts body
        |> Floki.find("#reviews")
        |> Enum.map(&Floki.text/1)

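But I want to iterate over each review and put its date and full review text into a map with separate key-value pairs. When I try to grab only the date or only the review text, I get nothing back and can't figure out why. This is my best attempt: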

def get_reviews_url() do
    case HTTPoison.get("https://www.dealerrater.com/dealer/McKaig-Chevrolet-Buick-A-Dealer-For-The-People-dealer-reviews-23685/#link") do
     {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
       IO.puts body
       |> Floki.find("div.italic col-xs-6 col-sm-12 pad-none margin-none font-20")#html for dates
       |> Floki.find("h3.no-format inline italic-bolder font-20 dark-grey") #html for review text
       |> Enum.map(&Floki.text/1)

This just returns :ok. After reading the docs I tried everything I could think of and couldn't get a different result. Any direction would help. Thanks.

I'm not sure how to answer this without just doing it for you, so here goes. You can adapt it as needed.

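"https://www.dealerrater.com/dealer/McKaig-Chevrolet-Buick-A-Dealer-For-The-People-dealer-reviews-23685/#link"
|> HTTPoison.get!()
|> Map.get(:body)
# Parse the raw HTML string into a tree before querying it
|> Floki.parse()
# One node per review
|> Floki.find(".review-entry")
|> Map.new(fn entry ->
  # Each match is expected to be a single element with a single text child
  [{"div", _, [date]}] = Floki.find(entry, "div.italic")
  [{"p", _, [content]}]  = Floki.find(entry, "p.review-content")
  # Map.new turns each {key, value} tuple into a map entry
  {date, content}
end)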

Output:

%{
  "June 17, 2021" => "Joe was great and took extra time to help make sure I got the car that I wanted not pushing me into a car I didn’t want. He even made sure my car  was made ready in his day off. Great Job Thank You. \r\nMelisa S",
  "June 20, 2021" => "Awesome service, Adrian was great to work with I told him what I wanted and he showed me the best car Thank you so much!",
  ...
}

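Key points:

  1. Don't pipe :ok, the return value of IO.puts, into Floki. If you are debugging, use IO.inspect instead; it returns the value it was given, so it can sit in the middle of a pipeline.
  2. Call Floki.parse() first to parse the HTML.
  3. Find the reviews first with the .review-entry selector, then map over the results to extract the parts you want.
  4. The div.italic selector was just the first thing I wrote to find the date. It looks fragile, so you may want to come up with a better one.
  5. You may want to change Map.new to Enum.map, because Map.new keeps only the last review when several reviews share the same date. Enum.map gives you a list of {date, review} tuples instead.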
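
Points 1, 3 and 5 can be sketched roughly as follows. This is only an illustration under a few assumptions: the module name ReviewSketch and the inline sample HTML are made up so the snippet runs without hitting the site, the class names are the ones used in the code above, and Floki.parse_document!/1 is used on the assumption that a reasonably recent Floki version is installed. It returns a list of maps with separate date and text keys, so two reviews from the same day are both kept.

```elixir
# Assumption: run as a standalone script; Mix.install/1 fetches Floki.
Mix.install([{:floki, "~> 0.30"}])

defmodule ReviewSketch do
  # Hypothetical helper: turns review-page HTML into a list of maps.
  def extract(html) do
    html
    |> Floki.parse_document!()
    |> Floki.find(".review-entry")
    # IO.inspect returns its argument, so it is safe inside a pipeline
    |> IO.inspect(label: "entries")
    |> Enum.map(fn entry ->
      %{
        date: entry |> Floki.find("div.italic") |> Floki.text() |> String.trim(),
        text: entry |> Floki.find("p.review-content") |> Floki.text() |> String.trim()
      }
    end)
  end
end

# Made-up sample markup standing in for the real page body
html = """
<div id="reviews">
  <div class="review-entry">
    <div class="italic col-xs-6">June 17, 2021</div>
    <p class="review-content">Joe was great.</p>
  </div>
  <div class="review-entry">
    <div class="italic col-xs-6">June 17, 2021</div>
    <p class="review-content">Awesome service.</p>
  </div>
</div>
"""

reviews = ReviewSketch.extract(html)
IO.inspect(reviews, label: "reviews")
```

To use it against the live page, the body returned by HTTPoison.get!/1 could be passed to ReviewSketch.extract/1 in place of the sample string. Note also that a selector matching several classes on one element is written by chaining them with dots (for example div.italic.col-xs-6); separating class names with spaces, as in the question, makes Floki look for nested elements instead.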