DOMDocument - get script text from within body
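What I am trying to do is get the scripts from within the body tag, but only inline scripts that contain text, not scripts that link to a file,
e.g. <script type="text/javascript">console.log("for a test run");</script>
and not scripts that have a src file.
I want to move these scripts to the end of the page, just before </body>.
So far I have: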
echo "<pre>";
echo "reaches 1 <br />";
//work for inpage scripts
$mainBody = @$dom->getElementsByTagName('body')->item(0);
foreach (@$dom->getElementsByTagName('body') as $head) {
    echo "reaches 2";
    foreach (@$head->childNodes as $node) {
        echo "reaches 3";
        var_dump($node);
        if ($node instanceof DOMComment) {
            if (preg_match('/<script/i', $node->nodeValue)){
                $src = $node->nodeValue;
                echo "its a node";
                var_dump($node);
            }
        }
        if ($node->nodeName == 'script' && $node->attributes->getNamedItem('type')->nodeValue == 'text/javascript') {
            if (@$src = $node->attributes->getNamedItem('src')->nodeValue) {
                // yay - $src was true, so we don't do anything here
            } else {
                $src = $node->nodeValue;
            }
            echo "its a node2";
            var_dump($node);
        }
        if (isset($src)) {
            $move = ($this->params->get('exclude')) ? true : false;
            foreach ($omit as $omitit) {
                if (preg_match($omitit, $src) == 1) {
                    $move = ($this->params->get('exclude')) ? false : true;
                    break;
                }
            }
            if ($move)
                $moveme[] = $node;
            unset($src);
        }
    }
}
foreach ($moveme as $moveit) {
    echo "Moving";
    print_r($moveit);
    $mainBody->appendChild($moveit->cloneNode(true));
    if ($pretty) {
        $mainBody->appendChild($newline->cloneNode(false));
    }
    $moveit->parentNode->removeChild($moveit);
}
$mainBody = $xhtml ? $dom->saveXML() : $dom->saveHTML();
JResponse::setBody($sanitize?preg_replace($this->sanitizews['search'],$this->sanitizews['replace'],$mainBody):$mainBody);
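Update 1
The problem is that a <script type="text/javascript"> can also sit inside a div, or inside nested divs. Iterating with foreach over @$head->childNodes therefore only visits the top-level HTML tags and does not scan the inner tags that may contain <script> tags. I don't understand how to get all the required script tags.
There are no errors, but there are also no script tags among the top-level nodes.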
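The difference between direct children and descendants can be seen in a small standalone sketch. The sample markup and variable names below are made up for illustration; only the DOMDocument calls themselves are standard.

```php
<?php
// Hypothetical sample markup: one inline script nested two divs deep.
$html = '<html><body><div><div><script type="text/javascript">console.log("nested");</script></div></div><p>text</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$body = $dom->getElementsByTagName('body')->item(0);

// childNodes lists direct children only, so the nested script is not visited.
$direct = array();
foreach ($body->childNodes as $child) {
    $direct[] = $child->nodeName;
}

// getElementsByTagName() called on an element searches all of its descendants.
$nested = $body->getElementsByTagName('script');

echo implode(',', $direct), "\n";
echo $nested->length, "\n";
```

This is why a loop over childNodes can run without errors and still never reach a script that is wrapped in other elements.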
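Update 2
Thanks for the XPath answer; the task has made some progress. But now, after moving the scripts to the footer, I cannot delete/remove the original script tags.
Here is my updated code so far: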
echo "<pre>3";
// echo "reaches 1 <br />";
//work for inpage scripts
$xpath = new DOMXPath($dom);
$script_tags = $xpath->query('//body//script[not(@src)]');
foreach ($script_tags as $tag) {
    // var_dump($tag->nodeValue);
    $moveme[] = $tag;
}
$mainBody = @$dom->getElementsByTagName('body')->item(0);
foreach ($moveme as $moveItScript) {
    print_r($moveItScript->cloneNode(true));
    $mainBody->appendChild($moveItScript->cloneNode(true));
    // var_dump($moveItScript->parentNode);
    // $moveItScript->parentNode->removeChild($moveItScript);
    /* try{
        $mainBody->appendChild($moveit->cloneNode(true));
        if ($pretty) {
            $body->appendChild($newline->cloneNode(false));
        }
        $moveit->parentNode->removeChild($moveit);
    }catch (Exception $ex){
        var_dump($ex);
    }*/
}
echo "</pre>";
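One possible way around the clone-then-remove step, sketched here under the assumption that the goal is simply to relocate each inline script: appendChild() on a node that is already part of the document moves it, so no copy is created and nothing is left behind to delete. The sample markup and the file name lib.js are hypothetical.

```php
<?php
// Hypothetical sample markup: one inline script in a div, one external script.
$html = '<html><body><div><script>console.log("a");</script></div><p>text</p><script src="lib.js"></script></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$body = $dom->getElementsByTagName('body')->item(0);

// Copy the matches into a plain array before changing the tree.
$inline = iterator_to_array($xpath->query('//body//script[not(@src)]'));

foreach ($inline as $script) {
    // Appending a node that already has a parent detaches it from that parent
    // first, so this is a move and not a copy.
    $body->appendChild($script);
}

$lastText   = $body->lastChild->textContent;
$divIsEmpty = !$dom->getElementsByTagName('div')->item(0)->hasChildNodes();
$total      = $dom->getElementsByTagName('script')->length;
```

After the loop the inline script is the last child of body, the div that held it is empty, and the document still contains exactly two script elements.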
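Update 3
I am working with Joomla and trying to move scripts to the footer of the page. I used the scriptsdown plugin, which moves scripts from the head tag to the bottom. But scripts in the middle of the page were not moved to the bottom, which is why the in-page scripts did not respond correctly.
My problem is now solved. I am posting my solution code in case it helps someone in the future.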
function onAfterRender() {
    $app = JFactory::getApplication();
    $doc = JFactory::getDocument();
    /* test that the page is not administrator && test that the document is HTML output */
    if ($app->isAdmin() || $doc->getType() != 'html')
        return;
    $pretty = (int)$this->params->get('pretty', 0);
    $stripcomments = (int)$this->params->get('stripcomments', 0);
    $sanitize = (int)$this->params->get('sanitize',0);
    $debug = (int)$app->getCfg('debug',0);
    if($debug) $pretty = true;
    $omit = array();
    /* now we know this is a frontend page and it is html - begin processing */
    /* first - prepare the omit array */
    if (strlen(trim($this->params->get('omit'))) > 0) {
        foreach (explode("\n", $this->params->get('omit')) as $omitme) {
            /* escape slashes and single quotes, then wrap each line as a case-insensitive regex */
            $omit[] = '/' . str_replace(array('/', '\''), array('\/', '\\\''), trim($omitme)) . '/i';
        }
        unset($omitme);
    }
    $moveme = array();
    $dom = new DOMDocument();
    $dom->recover = true;
    $dom->substituteEntities = true;
    if ($pretty) {
        $dom->formatOutput = true;
    } else {
        $dom->preserveWhiteSpace = false;
    }
    $source = JResponse::getBody();
    /* DOMDocument can get quite vocal when malformed HTML/XHTML is loaded.
     * First we grab the current level, and set the error reporting level
     * to zero, afterwards, we return it to the original value. This trickery
     * is used to keep the logs clear of DOMDocument protests while loading the source.
     * I promise to set the level back as soon as I'm done loading source...
     */
    if(!$debug) $erlevel = error_reporting(0);
    $xhtml = (preg_match('/XHTML/', $source)) ? true : false;
    switch ($xhtml) {
        case true:
            $dom->loadXML($source);
            break;
        case false:
            $dom->loadHTML($source);
            break;
    }
    if(!$debug) error_reporting($erlevel); /* You see, error_reporting is back to normal - just like I promised */
    if ($pretty) {
        $newline = $dom->createTextNode("\n");
    }
    if($sanitize && !$debug && !$pretty) {
        $this->_sanitizeCSS($dom->getElementsByTagName('style'));
    }
    if ($stripcomments && !$debug) {
        $comments = $this->_domComments($dom);
        foreach ($comments as $node)
            if (!preg_match('/\[endif]/i', $node->nodeValue)) // we don't remove IE conditionals
                if ($node->parentNode->nodeName != 'script') // we also don't remove comments in javascript because some developers write JS inside of a comment
                    $node->parentNode->removeChild($node);
    }
    $body = @$dom->getElementsByTagName('footer')->item(0);
    foreach (@$dom->getElementsByTagName('head') as $head) {
        foreach (@$head->childNodes as $node) {
            if ($node instanceof DOMComment) {
                if (preg_match('/<script/i', $node->nodeValue))
                    $src = $node->nodeValue;
            }
            if ($node->nodeName == 'script' && $node->attributes->getNamedItem('type')->nodeValue == 'text/javascript') {
                if (@$src = $node->attributes->getNamedItem('src')->nodeValue) {
                    // yay - $src was true, so we don't do anything here
                } else {
                    $src = $node->nodeValue;
                }
            }
            if (isset($src)) {
                $move = ($this->params->get('exclude')) ? true : false;
                foreach ($omit as $omitit) {
                    if (preg_match($omitit, $src) == 1) {
                        $move = ($this->params->get('exclude')) ? false : true;
                        break;
                    }
                }
                if ($move)
                    $moveme[] = $node;
                unset($src);
            }
        }
    }
    foreach ($moveme as $moveit) {
        $body->appendChild($moveit->cloneNode(true));
        if ($pretty) {
            $body->appendChild($newline->cloneNode(false));
        }
        $moveit->parentNode->removeChild($moveit);
    }
    //work for inpage scripts
    $xpath = new DOMXPath($dom);
    $script_tags = $xpath->query('//body//script[not(@src)]');
    $mainBody = @$dom->getElementsByTagName('body')->item(0);
    foreach ($script_tags as $tag) {
        $mainBody->appendChild($tag->cloneNode(true));
        $tag->parentNode->removeChild($tag);
    }
    $body = $xhtml ? $dom->saveXML() : $dom->saveHTML();
    JResponse::setBody($sanitize?preg_replace($this->sanitizews['search'],$this->sanitizews['replace'],$body):$body);
}
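As an aside on the error_reporting(0) step: libxml has its own switch for parser noise, which may be a narrower alternative because it leaves PHP's global error level untouched. This is only a sketch of that option, not what the plugin does, and the malformed sample markup is made up.

```php
<?php
// Hypothetical, deliberately sloppy markup (the <p> and <div> are never closed).
$source = '<html><body><div><p>unclosed<script>console.log("x");</script></body></html>';

$dom = new DOMDocument();

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$loaded   = $dom->loadHTML($source);
$errors   = libxml_get_errors();          // inspect or log these if needed
libxml_clear_errors();
libxml_use_internal_errors($previous);    // restore the earlier setting

$scriptCount = $dom->getElementsByTagName('script')->length;
```

Any parse errors end up in $errors as LibXMLError objects, so they can be logged in debug mode instead of being discarded.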
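To get only the <script> nodes that do not have a src attribute, you are better off using DOMXPath:
$xpath = new DOMXPath($dom);
$script_tags = $xpath->query('//body//script[not(@src)]');
The variable $script_tags is now a DOMNodeList object containing all matching script tags, at any depth inside body.
You can now iterate over the DOMNodeList to get each node and do whatever you want with it:
foreach ($script_tags as $tag) {
    var_dump($tag->nodeValue);
    $moveme[] = $tag;
}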
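To try the query in isolation, here is a minimal self-contained sketch. The sample page and the file name external.js are invented for the example; the XPath expression is the one shown above.

```php
<?php
// Hypothetical page: an inline script in head, an external script in body,
// and an inline script nested in a div inside body.
$html = '<html><head><script>var inHead = 1;</script></head><body>'
      . '<script src="external.js"></script>'
      . '<div><script type="text/javascript">console.log("for a test run");</script></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "//body//script" matches scripts at any depth below body;
// "[not(@src)]" keeps only those without a src attribute.
$found = array();
foreach ($xpath->query('//body//script[not(@src)]') as $tag) {
    $found[] = $tag->nodeValue;
}

var_dump($found);
```

Only the nested inline script is collected: the head script is outside body, and the external one is filtered out by the src test.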