Scrape HTML page with strange result

Question

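The scraping works, but strangely the result is ["-3°"]

I have tried many different approaches to get just -3°

But if [" and "] are not in the code, how do they end up in the output?

Can someone tell me how to achieve this?

The code I am using is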

<?php
function scrape($url){
$output = file_get_contents($url); 
return $output;
}

function fetchdata($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start));  // Stripping $start
$stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
return $data;   // Returning the scraped data from the function
}

$page = scrape("https://weather.gc.ca/city/pages/bc-37_metric_e.html");   
$result = fetchdata($page, "<p class=\"text-center mrgn-tp-md mrgn-bttm-sm lead\"><span class=\"wxo-metric-hide\">", "<abbr title=\"Celsius\">C</abbr>");
echo json_encode(array($result));    
?>

Thanks in advance for your help!

Answer 1

You can use DOMDocument to parse the HTML file.

$page = file_get_contents("https://weather.gc.ca/city/pages/bc-37_metric_e.html");
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_use_internal_errors(false);
$paragraphs = $doc->getElementsByTagName('p');
foreach($paragraphs as $p){
    if($p->getAttribute('class') == 'text-center mrgn-tp-md mrgn-bttm-sm lead') {
        foreach($p->getElementsByTagName('span') as $attr) {
            if($attr->getAttribute('class') == 'wxo-metric-hide') {
                foreach($attr->getElementsByTagName('abbr') as $abbr) {
                    if($abbr->getAttribute('title') == 'Celsius') {
                        echo trim($attr->nodeValue);
                    }
                }
            }
        }
    }
}

Output:

-3°C
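
The same lookup can also be written as a single XPath query rather than nested loops. The following is only a minimal sketch: the `$html` string is a hypothetical stand-in for the downloaded page, and it assumes the temperature is a text node sitting directly inside the `wxo-metric-hide` span, before the `<abbr>` element. Selecting just that text node leaves out the trailing `C`, which gives the bare value the question asks for.

```php
<?php
// Hypothetical sample markup mirroring the structure assumed above.
$html = '<html><body><p class="text-center mrgn-tp-md mrgn-bttm-sm lead">'
      . '<span class="wxo-metric-hide">-3&deg;<abbr title="Celsius">C</abbr></span>'
      . '</p></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings from imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select the text node directly inside the span, i.e. the part before <abbr>.
$nodes = $xpath->query('//p[contains(@class, "lead")]/span[@class="wxo-metric-hide"]/text()');

// Fall back to null when the page structure does not match.
$temperature = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
echo $temperature, "\n";
```

With the real page, `$html` would be replaced by the result of `file_get_contents()`, and the query may need adjusting if the site changes its class names.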

This assumes the classes and structure stay consistent...
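
As for where the `["` and `"]` come from: the last line of the question's code calls `json_encode(array($result))`, and `json_encode` applied to a PHP list array emits a JSON array literal, so the brackets and quotes are added at that step. A minimal sketch, using a hypothetical `$result` in place of the scraped value (the `JSON_UNESCAPED_UNICODE` flag is there only so the degree sign prints literally in this example):

```php
<?php
// Hypothetical value standing in for what fetchdata() returns.
$result = "-3°";

// Wrapping the string in an array and JSON-encoding it adds [" and "].
$encoded = json_encode(array($result), JSON_UNESCAPED_UNICODE);
echo $encoded, "\n";   // ["-3°"]

// Echoing the string itself prints only the value.
$plain = trim($result);
echo $plain, "\n";     // -3°
```

So if plain text is what is wanted, `echo $result;` in place of the `json_encode` call should be enough.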

html

php

scrape