file_get_contents() removes XML tags

Question

In a browser, the URL http://kulturarvsdata.se/raa/fmi/xml/10028201230001 displays as a regular XML file. But when I use

file_get_contents('http://kulturarvsdata.se/raa/fmi/xml/10028201230001');

it strips all the XML tags and returns only the text they contain. Why does this happen, and how can I avoid it?

Some of the response headers:

array(5) { 
    [0]=> string(15) "HTTP/1.1 200 OK" 
    [1]=> string(35) "Date: Thu, 01 Jan 2015 20:07:04 GMT" 
    [2]=> string(25) "Server: Apache-Coyote/1.1" 
    [3]=> string(43) "Content-Type: application/xml;charset=UTF-8" 
    [4]=> string(17) "Connection: close" 
}

Answer 1

Please try this:

simplexml_load_file: interprets an XML file into an object

<?php
// The file test.xml contains an XML document with a root element
// and at least an element /[root]/title.

if (file_exists('test.xml')) {
    $xml = simplexml_load_file('test.xml');

    print_r($xml);
} else {
    exit('Failed to open test.xml.');
}
?>
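The manual example above reads a local file. As a rough sketch of how the same idea might apply here, the snippet below parses a string with simplexml_load_string and reads the namespaced elements through children(). The sample string is an assumption: it is a shortened stand-in modeled on the start of the response quoted in this thread, not the full document, and in practice it would come from file_get_contents().

```php
<?php
// Assumed sample: a shortened stand-in for the XML returned by the URL in the question.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<pres:item xmlns:pres="http://kulturarvsdata.se/presentation#">'
     . '<pres:id>10028201230001</pres:id>'
     . '<pres:type>Kulturlämning</pres:type>'
     . '</pres:item>';

$item = simplexml_load_string($xml);
if ($item === false) {
    exit('Failed to parse XML.');
}

// Elements carrying the "pres" prefix are reached via children() with the namespace URI.
$pres = $item->children('http://kulturarvsdata.se/presentation#');

$id   = (string) $pres->id;
$type = (string) $pres->type;

echo $id, "\n";
echo $type, "\n";
```

Casting to string pulls the text out of each SimpleXMLElement; without the children() call, plain property access on $item would not see the prefixed elements.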

Answer 2

Cannot reproduce:

<?php
// Send the response as plain text so the browser does not interpret the tags.
header('Content-Type: text/plain; charset=utf-8');
echo substr(file_get_contents('http://kulturarvsdata.se/raa/fmi/xml/10028201230001'), 0, 256);

<?xml version="1.0" encoding="UTF-8"?><pres:item xmlns:pres="http://kulturarvsdata.se/presentation#"><pres:id>10028201230001</pres:id><pres:entityUri>http://kulturarvsdata.se/raa/fmi/10028201230001</pres:entityUri><pres:type>Kulturlämning</pres:type><pres

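So the answer is: it works fine.

Perhaps you are looking at the response in a browser, which hides the tags when the output is rendered as HTML? That is at least a common mistake among users asking on Stack Overflow.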
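To illustrate that point, here is a minimal sketch of showing fetched XML inside an HTML page without the browser swallowing the tags. The helper name escapeXmlForHtml and the sample string are assumptions made for illustration; the string stands in for whatever file_get_contents() returned.

```php
<?php
// Hypothetical helper: escape markup so an HTML page displays it literally.
function escapeXmlForHtml(string $xml): string
{
    return htmlspecialchars($xml, ENT_QUOTES, 'UTF-8');
}

// Assumed sample standing in for the body fetched from the URL in the question.
$xml = '<pres:item xmlns:pres="http://kulturarvsdata.se/presentation#">'
     . '<pres:id>10028201230001</pres:id></pres:item>';

// Echoing $xml as-is into an HTML page lets the browser treat <pres:id> as an
// unknown element and render only its text. Once escaped, the tags stay visible.
echo '<pre>', escapeXmlForHtml($xml), '</pre>', "\n";
```

The other option is the one used in this answer: send a text/plain Content-Type header so the browser does not interpret the output as HTML at all. Viewing the page source is a quick way to check which case applies.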

Answer 3

I had the same problem and solved it by changing how I fetch the data: instead of file_get_contents, I used curl:

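// $url is the address from the question; the user agent string is a placeholder (assumption).
$url = 'http://kulturarvsdata.se/raa/fmi/xml/10028201230001';
$user_agent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_HEADER, 0);            // leave response headers out of the result
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the body instead of printing it
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$result = curl_exec($ch);                       // false on failure
curl_close($ch);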
