How to strip HTML tags, classes and ids and preserve only the form and its elements with PHP?
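I've been looking around for something that will let me reliably strip a chunk of HTML down to a bare form and its elements. I need a solution that removes every non-form element, including all content, classes and ids. I can use JavaScript or PHP.
Can someone point me in the right direction and/or offer a small code sample and some advice to get me started?
To give you some background:
In most cases, autoresponder service providers offer several types of embeddable code. For reasons I will never understand, they never offer clean form code. There is always some ugly junk around it that then has to be cleaned up.
Here is a sample embeddable code from an autoresponder service: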
<style>
._form {
position:relative;
background:#fff;
width:400px;/*F*/
padding:0!important;
text-align:left;
}
._form em {
color:#9a9a9a;
}
._form a {
margin-left:3px;
}
._form ._field,
._form ._field ._label,
._form ._type_radio,
._form ._type_checkbox,
._form ._type_captcha,
._form ._field table {
background:none;
}
._form ._field {
position:relative;
width:100%;
cursor:move;
font-style:normal;
margin:1.2em 0;
padding:0;
overflow:hidden;
}
._form ._field input[type="text"] {
width:100%;
padding:8px;
font-size:16px;
border:1px solid #b6b6b6;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field ._label {
display:block;
margin:0 0 0.5em;
padding:0!important;
font-size:15px;
}
._form ._field ._option input[type="checkbox"],
._form ._field ._option input[type="radio"] {
position:relative;
width:13px;
height:13px;
margin:-4px 0 0 1px;
cursor:pointer;
vertical-align:middle;
}
._form ._field ._option input[type="submit"],
._form ._field ._option input[type="button"] {
margin:0;
cursor:pointer;
height:35px;
width:auto;
font-size:15px;
}
._form ._field ._option select {
display:block;
margin:0;
padding:0;
width:auto;
font-size:15px;
border:1px solid #b6b6b6;
}
._form ._type_radio ._option,
._form ._type_checkbox ._option {
font-size:13px;
font-weight:normal;
line-height:1.8;
}
._form ._type_date ._option input[type="text"] {
float:left;
width:100px;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._type_date ._option input[type="button"] {
width:37px;
height:36px;
margin-left:5px;
padding:20px;
border:none;
outline:none;
text-indent:-9999px;
}
._form ._type_captcha img {
float:left;
margin:0 6px 0 0;
width:70px;
height:33px;
border:1px solid #b6b6b6;
}
._form ._type_captcha input[type="text"] {
margin:-14px 0 0 0!important;
width:25%;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field table {
width:100%!important;
}
._form ._field table tbody tr td {
width:50%!important;
font-size:15px;
}
._form {
width:265px;/*F*/
background:#fff;
color:#2c2c2c;
font-weight:normal;
}
._form #notice {
margin:10px 0 0 -3px!important;
padding:0;
color:#acacac;
font-size:11px;
font-family:helvetica,arial,sans-serif;
}
._form #notice a:link, ._form #notice a:visited {
color:#acacac;
text-decoration:underline;
}
._form ._field {
position:relative;
width:100%;
cursor:default;
font-style:normal;
margin:0 0 16px;
padding:0;
overflow:hidden;
}
._form ._field input[type="text"],
._form ._field input[type="email"] {
width:100%;
padding:4px;
font-size:14px;
background:#fafafa;
border:1px solid #c7c7c7;
border-top:1px solid #b6b6b6;
-webkit-border-radius:3px;
-moz-border-radius:3px;
border-radius:3px;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field ._label {
margin:0 0 4px;
color:#2c2c2c;
font-size:13px;
font-family:helvetica,arial,sans-serif;
font-weight:700;
}
._form ._field ._option {
margin:0;
padding:0;
color:#2c2c2c;
font-size:13px;
font-family:helvetica,arial,sans-serif;
font-weight:normal;
line-height:20px;
}
._form ._type_header ._label {
width:100%;
font-style:normal;
font-size:16px!important;
line-height:20px;
color:#005698;
margin:0 0 5px!important;
padding:0 0 10px!important;
overflow:hidden;
border-bottom:1px solid #e0e0e0;
}
._form ._type_input ._option textarea{
width:97%!important;
background:#fafafa;
border:1px solid #c7c7c7;
border-top:1px solid #b6b6b6;
-webkit-border-radius:3px;
-moz-border-radius:3px;
border-radius:3px;
}
._form ._type_input ._option input[type="submit"],
._form ._type_input ._option input[type="button"] {
width:auto;
margin:10px 0 0!important;
padding:2px 15px!important;
cursor:pointer;
font-family:verdana,arial,sans-serif;
font-weight:700;
font-size:12px;
color:#3f3f3f;
background:#f7f7f7;
border:1px solid #999999;
border-bottom:1px solid #888888;
text-align:center;
}
._form ._type_input ._option input[type="submit"]:hover,
._form ._type_input ._option input[type="button"]:hover {
border:1px solid #afafaf;
border-bottom:1px solid #a5a5a5;
background:#f7f7f7;
color:#525252;
}
._form ._type_date ._option input[type="text"] {
float:left;
width:100px;
}
._form ._type_radio ._option label {
display:inline;
font-size:14px;
font-weight:normal;
line-height:18px;
}
._form ._type_radio ._option label input[type="radio"] {
position:relative;
width:13px;
height:13px;
margin:-4px 0 0 1px;
cursor:pointer;
vertical-align:middle;
line-height:20px;
}
._form ._type_date ._option input[type="button"] {
width:24px;
height:24px;
margin:2px 0 0 5px;
padding:0;
border:none;
outline:none;
text-indent:-9999px;
}
._form ._field ._option select {
display:block;
margin:0;
padding:0;
width:auto;
font-size:14px;
border:1px solid #b6b6b6;
}
._form ._type_captcha img {
float:left;
width:42px;
height:24px;
margin:0 6px 0 0;
border:1px solid #b6b6b6;
}
._form ._type_captcha input[type="text"] {
float:left;
margin:0!important;
width:40%;
font-size:14px;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field table {
margin:0;
padding:0;
border-collapse:collapse;
width:100%!important;
table-layout:fixed;
margin-bottom:18px;
font-size:13px!important;
border-collapse:collapse;
border-spacing:0;
}
._form ._field table td {
padding:0 10px 0 0!important;
line-height:18px;
text-align:left;
font-size:13px!important;
color:#606060;
}
._form ._type_input ._option table tbody#_forward_rcpt input {margin:0 0 4px 0; width:96%!important;}
._form ._type_input ._option table tbody#_forward_rcpt img.image_addrcpt {cursor:pointer;}
.form_errors{
text-align:center;
font-size:15px;
margin:10px;
color:#900;
font-family:Arial, Helvetica, sans-serif;
font-weight:bold;
margin-bottom:20px;
}
</style>
<form action='//something.com/proc.php' method='post' id='_form_37' accept-charset='utf-8' enctype='multipart/form-data'>
<input type='hidden' name='f' value='37'>
<input type='hidden' name='s' value=''>
<input type='hidden' name='c' value='0'>
<input type='hidden' name='m' value='0'>
<input type='hidden' name='act' value='sub'>
<input type='hidden' name='nlbox[]' value='6'>
<div class='_form'>
<div class='formwrapper'>
<div id='_field284'>
<div id='compile284' class='_field _type_input'>
<div class='_label '>
First Name
</div>
<div class='_option'>
<input type='text' name='field[6]' value=''>
</div>
</div>
</div>
<div id='_field272'>
<div id='compile272' class='_field _type_input'>
<div class='_label '>
Email *
</div>
<div class='_option'>
<input type='email' name='email' >
</div>
</div>
</div>
<div id='_field273'>
<div id='compile273' class='_field _type_input'>
<div class='_option'>
<input type='submit' value="Subscribe">
</div>
</div>
</div>
<div id='_field280'>
<div id='compile280' class='_field _type_hidden'>
<div class='_option'>
<input type='hidden' name='field[4]' value=''>
</div>
</div>
</div>
<div id='_field281'>
<div id='compile281' class='_field _type_hidden'>
<div class='_option'>
<input type='hidden' name='field[5]' value=''>
</div>
</div>
</div>
<div id='_field282'>
<div id='compile282' class='_field _type_hidden'>
<div class='_option'>
<input type='hidden' name='field[3]' value=''>
</div>
</div>
</div>
</div>
</div>
</form>
This is what I want to end up with, without cleaning it up by hand:
<form action='//something.com/proc.php' method='post' accept-charset='utf-8' enctype='multipart/form-data'>
<input type='hidden' name='f' value='37'>
<input type='hidden' name='s' value=''>
<input type='hidden' name='c' value='0'>
<input type='hidden' name='m' value='0'>
<input type='hidden' name='act' value='sub'>
<input type='hidden' name='nlbox[]' value='6'>
First Name
<input type='text' name='field[6]' value=''>
Email *
<input type='email' name='email' >
<input type='submit' value="Subscribe">
<input type='hidden' name='field[4]' value=''>
<input type='hidden' name='field[5]' value=''>
<input type='hidden' name='field[3]' value=''>
</form>
Simply using strip_tags seems OK, but it doesn't remove the CSS that sits between the style tags.
I added the sample embeddable code to string.txt:
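$file = file_get_contents('string.txt', true);
// Show the raw embed code (escaped so the markup cannot break out of the textarea)
echo '<textarea rows="50" cols="50">' . htmlspecialchars($file) . '</textarea>';
// Keep only <form> and <input> tags; text between removed tags is left in place
$file = strip_tags($file, '<form><input>');
echo '<textarea rows="50" cols="80">' . htmlspecialchars($file) . '</textarea>';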
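The CSS survives because `strip_tags()` removes only the tags themselves and keeps whatever text sits between them, including the body of a `<style>` element. A minimal sketch of that effect, using a shortened, made-up input rather than the real embed code:

```php
<?php
// Shortened stand-in for the embed code; the real markup is much longer.
$html = '<style>._form{color:red}</style>'
      . '<form action="x"><div class="_form"><input name="email"></div></form>';

// The <style> and <div> tags are removed, but the CSS text between
// <style> and </style> is ordinary text to strip_tags() and stays behind.
$result = strip_tags($html, '<form><input>');

echo $result, "\n";
```

The rule text ends up in front of the form, which is why the `<style>` element has to be removed as a whole before (or instead of) calling `strip_tags()`.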
In the end I made some progress with this, but it isn't perfect: the id is still on the form element, and I foresee more problems.
$file = file_get_contents('string.txt', true);
function strip_html_tags( $text )
{
    $text = preg_replace(
        array(
            // Remove invisible content, including everything between the tags
            '@<head[^>]*?>.*?</head>@siu',
            '@<style[^>]*?>.*?</style>@siu',
            '@<script[^>]*?>.*?</script>@siu',
            '@<object[^>]*?>.*?</object>@siu',
            '@<embed[^>]*?>.*?</embed>@siu',
            '@<applet[^>]*?>.*?</applet>@siu',
            '@<noframes[^>]*?>.*?</noframes>@siu',
            '@<noscript[^>]*?>.*?</noscript>@siu',
            '@<noembed[^>]*?>.*?</noembed>@siu',
        ),
        ' ', // every match is replaced by a single space
        $text );
    // Attributes such as id and class on the kept tags are not touched here
    return strip_tags( $text, '<form><input>' );
}
$newText = strip_html_tags($file);
echo '<textarea rows="50" cols="80">' . htmlspecialchars($newText) . '</textarea>';
So you want output like this?
<form action="//something.com/proc.php" method="post" id="_form_37" accept-charset="utf-8" enctype="multipart/form-data">
<input type="hidden" name="f" value="37"><input type="hidden" name="s" value=""><input type="hidden" name="c" value="0"><input type="hidden" name="m" value="0"><input type="hidden" name="act" value="sub"><input type="hidden" name="nlbox[]" value="6">
First Name
<input type="text" name="field[6]" value="">
Email *
<input type="email" name="email">
<input type="submit" value="Subscribe">
<input type="hidden" name="field[4]" value="">
<input type="hidden" name="field[5]" value="">
<input type="hidden" name="field[3]" value="">
</form>
If that's right, I think this will do it:
$string = '<style>
._form {
position:relative;
background:#fff;
width:400px;/*F*/
padding:0!important;
text-align:left;
}
._form em {
color:#9a9a9a;
}
._form a {
margin-left:3px;
}
._form ._field,
._form ._field ._label,
._form ._type_radio,
._form ._type_checkbox,
._form ._type_captcha,
._form ._field table {
background:none;
}
._form ._field {
position:relative;
width:100%;
cursor:move;
font-style:normal;
margin:1.2em 0;
padding:0;
overflow:hidden;
}
._form ._field input[type="text"] {
width:100%;
padding:8px;
font-size:16px;
border:1px solid #b6b6b6;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field ._label {
display:block;
margin:0 0 0.5em;
padding:0!important;
font-size:15px;
}
._form ._field ._option input[type="checkbox"],
._form ._field ._option input[type="radio"] {
position:relative;
width:13px;
height:13px;
margin:-4px 0 0 1px;
cursor:pointer;
vertical-align:middle;
}
._form ._field ._option input[type="submit"],
._form ._field ._option input[type="button"] {
margin:0;
cursor:pointer;
height:35px;
width:auto;
font-size:15px;
}
._form ._field ._option select {
display:block;
margin:0;
padding:0;
width:auto;
font-size:15px;
border:1px solid #b6b6b6;
}
._form ._type_radio ._option,
._form ._type_checkbox ._option {
font-size:13px;
font-weight:normal;
line-height:1.8;
}
._form ._type_date ._option input[type="text"] {
float:left;
width:100px;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._type_date ._option input[type="button"] {
width:37px;
height:36px;
margin-left:5px;
padding:20px;
border:none;
outline:none;
text-indent:-9999px;
}
._form ._type_captcha img {
float:left;
margin:0 6px 0 0;
width:70px;
height:33px;
border:1px solid #b6b6b6;
}
._form ._type_captcha input[type="text"] {
margin:-14px 0 0 0!important;
width:25%;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field table {
width:100%!important;
}
._form ._field table tbody tr td {
width:50%!important;
font-size:15px;
}
._form {
width:265px;/*F*/
background:#fff;
color:#2c2c2c;
font-weight:normal;
}
._form #notice {
margin:10px 0 0 -3px!important;
padding:0;
color:#acacac;
font-size:11px;
font-family:helvetica,arial,sans-serif;
}
._form #notice a:link, ._form #notice a:visited {
color:#acacac;
text-decoration:underline;
}
._form ._field {
position:relative;
width:100%;
cursor:default;
font-style:normal;
margin:0 0 16px;
padding:0;
overflow:hidden;
}
._form ._field input[type="text"],
._form ._field input[type="email"] {
width:100%;
padding:4px;
font-size:14px;
background:#fafafa;
border:1px solid #c7c7c7;
border-top:1px solid #b6b6b6;
-webkit-border-radius:3px;
-moz-border-radius:3px;
border-radius:3px;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field ._label {
margin:0 0 4px;
color:#2c2c2c;
font-size:13px;
font-family:helvetica,arial,sans-serif;
font-weight:700;
}
._form ._field ._option {
margin:0;
padding:0;
color:#2c2c2c;
font-size:13px;
font-family:helvetica,arial,sans-serif;
font-weight:normal;
line-height:20px;
}
._form ._type_header ._label {
width:100%;
font-style:normal;
font-size:16px!important;
line-height:20px;
color:#005698;
margin:0 0 5px!important;
padding:0 0 10px!important;
overflow:hidden;
border-bottom:1px solid #e0e0e0;
}
._form ._type_input ._option textarea{
width:97%!important;
background:#fafafa;
border:1px solid #c7c7c7;
border-top:1px solid #b6b6b6;
-webkit-border-radius:3px;
-moz-border-radius:3px;
border-radius:3px;
}
._form ._type_input ._option input[type="submit"],
._form ._type_input ._option input[type="button"] {
width:auto;
margin:10px 0 0!important;
padding:2px 15px!important;
cursor:pointer;
font-family:verdana,arial,sans-serif;
font-weight:700;
font-size:12px;
color:#3f3f3f;
background:#f7f7f7;
border:1px solid #999999;
border-bottom:1px solid #888888;
text-align:center;
}
._form ._type_input ._option input[type="submit"]:hover,
._form ._type_input ._option input[type="button"]:hover {
border:1px solid #afafaf;
border-bottom:1px solid #a5a5a5;
background:#f7f7f7;
color:#525252;
}
._form ._type_date ._option input[type="text"] {
float:left;
width:100px;
}
._form ._type_radio ._option label {
display:inline;
font-size:14px;
font-weight:normal;
line-height:18px;
}
._form ._type_radio ._option label input[type="radio"] {
position:relative;
width:13px;
height:13px;
margin:-4px 0 0 1px;
cursor:pointer;
vertical-align:middle;
line-height:20px;
}
._form ._type_date ._option input[type="button"] {
width:24px;
height:24px;
margin:2px 0 0 5px;
padding:0;
border:none;
outline:none;
text-indent:-9999px;
}
._form ._field ._option select {
display:block;
margin:0;
padding:0;
width:auto;
font-size:14px;
border:1px solid #b6b6b6;
}
._form ._type_captcha img {
float:left;
width:42px;
height:24px;
margin:0 6px 0 0;
border:1px solid #b6b6b6;
}
._form ._type_captcha input[type="text"] {
float:left;
margin:0!important;
width:40%;
font-size:14px;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
._form ._field table {
margin:0;
padding:0;
border-collapse:collapse;
width:100%!important;
table-layout:fixed;
margin-bottom:18px;
font-size:13px!important;
border-collapse:collapse;
border-spacing:0;
}
._form ._field table td {
padding:0 10px 0 0!important;
line-height:18px;
text-align:left;
font-size:13px!important;
color:#606060;
}
._form ._type_input ._option table tbody#_forward_rcpt input {margin:0 0 4px 0; width:96%!important;}
._form ._type_input ._option table tbody#_forward_rcpt img.image_addrcpt {cursor:pointer;}
.form_errors{
text-align:center;
font-size:15px;
margin:10px;
color:#900;
font-family:Arial, Helvetica, sans-serif;
font-weight:bold;
margin-bottom:20px;
}
</style>
<form action="//something.com/proc.php" method="post" id="_form_37" accept-charset="utf-8" enctype="multipart/form-data">
<input type="hidden" name="f" value="37">
<input type="hidden" name="s" value="">
<input type="hidden" name="c" value="0">
<input type="hidden" name="m" value="0">
<input type="hidden" name="act" value="sub">
<input type="hidden" name="nlbox[]" value="6">
<div class="_form">
<div class="formwrapper">
<div id="_field284">
<div id="compile284" class="_field _type_input">
<div class="_label ">
First Name
</div>
<div class="_option">
<input type="text" name="field[6]" value="">
</div>
</div>
</div>
<div id="_field272">
<div id="compile272" class="_field _type_input">
<div class="_label ">
Email *
</div>
<div class="_option">
<input type="email" name="email" >
</div>
</div>
</div>
<div id="_field273">
<div id="compile273" class="_field _type_input">
<div class="_option">
<input type="submit" value="Subscribe">
</div>
</div>
</div>
<div id="_field280">
<div id="compile280" class="_field _type_hidden">
<div class="_option">
<input type="hidden" name="field[4]" value="">
</div>
</div>
</div>
<div id="_field281">
<div id="compile281" class="_field _type_hidden">
<div class="_option">
<input type="hidden" name="field[5]" value="">
</div>
</div>
</div>
<div id="_field282">
<div id="compile282" class="_field _type_hidden">
<div class="_option">
<input type="hidden" name="field[3]" value="">
</div>
</div>
</div>
</div>
</div>
</form>';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_use_internal_errors(false);
$forms = $doc->getElementsByTagName('form');
foreach($forms as $form) {
echo preg_replace('~^\s+$~m', "", strip_tags($doc->saveHTML($form), '<form><input>'));
}
It's best to avoid using regular expressions on HTML/XML unless there is a consistent pattern (and even then it's usually best avoided).
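As an illustration of the DOM-based route, form controls can be selected with an XPath query instead of a pattern match. The sketch below only assumes what "form elements" should cover (`input`, `select`, `textarea`, `button`); the sample markup and variable names are made up:

```php
<?php
// Hypothetical sample markup standing in for a provider's embed code.
$html = '<form action="/proc.php">'
      . '<div class="_option"><input type="email" name="email" id="e1"></div>'
      . '<div><select name="plan"><option>a</option></select></div>'
      . '</form>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate sloppy or HTML5 markup
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($doc);
// Every control inside any <form>, however deeply it is wrapped in <div>s.
$controls = $xpath->query(
    '//form//*[self::input or self::select or self::textarea or self::button]'
);

$names = [];
foreach ($controls as $control) {
    $names[] = $control->nodeName . ':' . $control->getAttribute('name');
}
echo implode(', ', $names), "\n";
```

Because the parser builds a tree first, the nesting depth and attribute order of the wrappers don't matter, which is exactly where a regex tends to break.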
Update:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_use_internal_errors(false);
$forms = $doc->getElementsByTagName('form');
foreach($forms as $form) {
$form->removeAttribute('id');
$form->removeAttribute('class');
foreach($form->getElementsByTagName('input') as $input) {
$input->removeAttribute('class');
$input->removeAttribute('id');
}
echo preg_replace('~^\s+$~m', "", strip_tags($doc->saveHTML($form), '<form><input>'));
}
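If the embed code may contain controls other than `input` (for example `select` or `textarea`), the same idea can be taken one step further by doing all of the cleanup in the DOM and skipping `strip_tags()` entirely. This is only a sketch: the function name `clean_forms`, the `$keep` list and the sample markup are assumptions for illustration, not part of any library.

```php
<?php
/**
 * Sketch: return every <form> in $html with wrapper elements unwrapped
 * (their text and children are kept) and id/class/style attributes removed.
 */
function clean_forms(
    string $html,
    array $keep = ['form', 'input', 'select', 'option', 'textarea', 'button', 'label']
): string {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($doc);
    $out = '';

    foreach ($xpath->query('//form') as $form) {
        // Snapshot the form and all its descendants before changing the tree.
        $nodes = iterator_to_array($xpath->query('descendant-or-self::*', $form));

        foreach ($nodes as $node) {
            if ($node->parentNode === null) {
                continue; // already detached together with a removed ancestor
            }
            $name = $node->nodeName;

            if ($name === 'style' || $name === 'script') {
                // Drop the element and its text content completely.
                $node->parentNode->removeChild($node);
            } elseif (!in_array($name, $keep, true)) {
                // Unwrap: move the children up, then drop the empty wrapper.
                while ($node->firstChild) {
                    $node->parentNode->insertBefore($node->firstChild, $node);
                }
                $node->parentNode->removeChild($node);
            } else {
                foreach (['id', 'class', 'style'] as $attr) {
                    $node->removeAttribute($attr);
                }
            }
        }
        $out .= $doc->saveHTML($form) . "\n";
    }
    return $out;
}

// Made-up, shortened input in the shape of the embed code above.
$clean = clean_forms(
    '<style>._form{color:red}</style>'
    . '<form action="/proc.php" method="post" id="_form_37"><div class="_form">'
    . '<div class="_label">Email *</div>'
    . '<div class="_option"><input type="email" name="email" id="e1" class="big"></div>'
    . '</div></form>'
);
echo $clean;
```

One thing to keep in mind with this approach: removing every `id` also breaks any `<label for="...">` association, so the `for` attribute may need the same treatment if labels are kept.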
<input type="hidden" name="c" value="0">
<input type="hidden" name="m" value="0">
<input type="hidden" name="act" value="sub">
<input type="hidden" name="nlbox[]" value="6">
<div class="_form">
<div class="formwrapper">
<div id="_field284">
<div id="compile284" class="_field _type_input">
<div class="_label ">
First Name
</div>
<div class="_option">
<input type="text" name="field[6]" value="">
</div>
</div>
</div>
<div id="_field272">
<div id="compile272" class="_field _type_input">
<div class="_label ">
Email *
</div>
<div class="_option">
<input type="email" name="email" >
</div>
</div>
</div>
<div id="_field273">
<div id="compile273" class="_field _type_input">
<div class="_option">
<input type="submit" value="Subscribe">
</div>
</div>
</div>
<div id="_field280">
<div id="compile280" class="_field _type_hidden">
<div class="_option">
<input type="hidden" name="field[4]" value="">
</div>
</div>
</div>
<div id="_field281">
<div id="compile281" class="_field _type_hidden">
<div class="_option">
<input type="hidden" name="field[5]" value="">
</div>
</div>
</div>
<div id="_field282">
<div id="compile282" class="_field _type_hidden">
<div class="_option">
<input type="hidden" name="field[3]" value="">
</div>
</div>
</div>
</div>
</div>
</form>';
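// Parse the embed code; collect libxml warnings about sloppy markup
// internally instead of printing them.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_clear_errors();
libxml_use_internal_errors(false);

// Print each form with every tag except <form> and <input> stripped,
// then empty out the whitespace-only lines left behind.
$forms = $doc->getElementsByTagName('form');
foreach ($forms as $form) {
    echo preg_replace('~^\s+$~m', '', strip_tags($doc->saveHTML($form), '<form><input>'));
}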
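It is best to avoid regular expressions for HTML/XML unless the markup follows a consistent pattern (and even then it is usually better to avoid them).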
Update:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_clear_errors();
libxml_use_internal_errors(false);

$forms = $doc->getElementsByTagName('form');
foreach ($forms as $form) {
    // Drop id and class from the form itself...
    $form->removeAttribute('id');
    $form->removeAttribute('class');
    // ...and from every input inside it.
    foreach ($form->getElementsByTagName('input') as $input) {
        $input->removeAttribute('class');
        $input->removeAttribute('id');
    }
    // Keep only the <form> and <input> tags and empty out whitespace-only lines.
    echo preg_replace('~^\s+$~m', '', strip_tags($doc->saveHTML($form), '<form><input>'));
}
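
The snippets above keep only the <form> and <input> tags, which matches the sample embed code. If a form might also contain select, textarea, label, or button elements, one possible extension is sketched below: it strips class and id from every element inside the form with DOMXPath and lets the caller decide which tags survive. The function name extractBareForms and the default $keep list are hypothetical and not part of the original answer; treat this as a starting point under those assumptions rather than a finished solution.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// returns every <form> found in $html with class/id attributes removed
// and all tags outside $keep stripped. Text content (e.g. labels) is kept.
function extractBareForms(
    string $html,
    array $keep = ['form', 'input', 'select', 'option', 'textarea', 'button', 'label']
): array {
    $doc = new DOMDocument();
    // Collect parser warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // strip_tags() expects the allowed tags as one string: "<form><input>..."
    $allowed = '<' . implode('><', $keep) . '>';

    $result = [];
    foreach ($doc->getElementsByTagName('form') as $form) {
        // Remove class and id from the form and all of its descendants.
        foreach ($xpath->query('descendant-or-self::*[@class or @id]', $form) as $node) {
            $node->removeAttribute('class');
            $node->removeAttribute('id');
        }
        $bare = strip_tags($doc->saveHTML($form), $allowed);
        // Delete whitespace-only lines left behind by the removed wrappers.
        $result[] = trim(preg_replace('~^\s*\R~m', '', $bare));
    }
    return $result;
}
```

With the $string variable from the answer, `echo implode("\n", extractBareForms($string));` would print the cleaned forms. Passing `['form', 'input']` as the second argument should reproduce the narrower behaviour of the update above.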