Scraping an HTML page gives a strange result
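The scrape works, but strangely the result is ["-3°"].
I have tried many different ways to get just -3°.
But if [" and "] are not in the code, how do they end up in the output?
Can someone tell me how to get only the value?
The code I am using is: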
<?php
function scrape($url){
$output = file_get_contents($url);
return $output;
}
function fetchdata($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start)); // Stripping $start
$stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
return $data; // Returning the scraped data from the function
}
$page = scrape("https://weather.gc.ca/city/pages/bc-37_metric_e.html");
$result = fetchdata($page, "<p class=\"text-center mrgn-tp-md mrgn-bttm-sm lead\"><span class=\"wxo-metric-hide\">", "<abbr title=\"Celsius\">C</abbr>");
echo json_encode(array($result));
?>
Thanks in advance for your help!
You can use DOMDocument to parse the HTML file.
$page = file_get_contents("https://weather.gc.ca/city/pages/bc-37_metric_e.html");
$doc = new DOMDocument();
// Suppress warnings about malformed markup while parsing
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_use_internal_errors(false);
$paragraphs = $doc->getElementsByTagName('p');
foreach ($paragraphs as $p) {
    if ($p->getAttribute('class') == 'text-center mrgn-tp-md mrgn-bttm-sm lead') {
        foreach ($p->getElementsByTagName('span') as $span) {
            if ($span->getAttribute('class') == 'wxo-metric-hide') {
                // Only print spans that contain the Celsius abbreviation
                foreach ($span->getElementsByTagName('abbr') as $abbr) {
                    if ($abbr->getAttribute('title') == 'Celsius') {
                        echo trim($span->nodeValue);
                    }
                }
            }
        }
    }
}
Output:
-3°C
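The nested loops above could also be condensed into a single XPath query. The following is only a sketch: the `$html` sample is a hypothetical fragment modelled on the markup the question searches for (the real page may differ), and `currentTemperature()` is a helper name made up for illustration.

```php
<?php
// Hypothetical sample modelled on the markup targeted in the question;
// the live page may be structured differently.
$html = '<html><body><p class="text-center mrgn-tp-md mrgn-bttm-sm lead">'
      . '<span class="wxo-metric-hide">-3&deg;<abbr title="Celsius">C</abbr></span>'
      . '</p></body></html>';

// Illustrative helper (not part of any library): returns the text of the
// first matching span, or null when nothing matches.
function currentTemperature(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Same conditions as the loops: p with the exact class string, a span
    // with class wxo-metric-hide, containing an abbr titled "Celsius"
    $query = '//p[@class="text-center mrgn-tp-md mrgn-bttm-sm lead"]'
           . '//span[@class="wxo-metric-hide"][.//abbr[@title="Celsius"]]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo currentTemperature($html), "\n";
```

To run it against the live page, pass the result of `file_get_contents()` for the URL instead of the sample string. Returning null on a miss makes it easier to notice when the page layout has changed.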
This assumes the class names and the page structure stay consistent...
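As for where the `["` and `"]` in the question come from: they are added by `json_encode(array($result))`, which wraps the string in an array and then prints that array as JSON. A minimal illustration, using a made-up ASCII value in place of the scraped one:

```php
<?php
// Placeholder value standing in for the scraped string
$result = "-3";

echo $result, "\n";                      // prints: -3
echo json_encode($result), "\n";         // prints: "-3"
echo json_encode(array($result)), "\n";  // prints: ["-3"]
```

So if only the bare value is wanted, echoing `$result` directly instead of JSON-encoding an array is enough.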