Regex correction for text replacement in HTML document

I have the following regex:

/<(?:textarea|select)[\s\S]*?>[\s\S]*?(\{\{\{variable:(.+?)\}\}\})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]+?(value=[\s\S]+?)(\{\{\{variable:(.+?)\}\}\})[\s\S]+?>|(\{\{\{variable:(.+?)\}\}\})/im

And this (shortened) HTML document:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Test</title>
</head>
<body>
    <section id="about">
        <div class="container about-container">
            <div class="row">
                <div class="col-md-12">
                    {{{block:welcome-intro}}}
                </div>
            </div>
        </div>
    </section>
    <section id="services">
        <div class="container">
            <div class="row">
                <div class="col-md-12">
                                        <p>You are using system version: {{{variable:system_version}}}</p>
                    <p>Your address: {{{variable:contact-email-address}}}</p>
                    <form action="http://k.loc/content/view/welcome"  class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                    <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />

                        <div class="row">
                            <div class="col-sm-12 form-error"></div>
                        </div>
                    <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testinput">Name<span class="form-validation-required"> * </span></label>

                    </div>
                <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div><input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testpassword">Password</label>

                    </div>
                <div class="hint-text">Your password must be at least 12 characters long, contain 1 special character, 1 nunber, 1 lower case character and 1 upper case character.</div><input id="testpassword" name="testpassword" placeholder="Enter your password here." class="input-group width-50" type="password"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><fieldset id="bioinfo"><legend>Biographical information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testtextarea">Biography</label>
                <span class="hint-text">A minimum of 40 characters and a maximum of 255 is allowed. This hint is displayed inline.</span>
                    </div>
                <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}}

{{{variable:system_login}}}</textarea><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testsummernote">Interests</label>
                <span class="hint-text">A minimum of 40 characters is required. This hint is displayed inline.</span>
                    </div>
                <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><button name="testsubmit" id="testsubmit" type="submit" class="btn primary">Submit<i class="zmdi zmdi-arrow-forward"></i></button></div></div>
        </form>                </div>
            </div>
        </div>
    </section>
</body>
</html>

Parsing the HTML document above for {{{variable:whatever}}} produces this result:

Array
(
    [0] => Array
        (
            [0] => {{{variable:system_version}}}
            [1] => {{{variable:contact-email-address}}}
            [2] => <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />
                   <div class="row"><div class="col-sm-12 form-error"></div></div>
                   <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                   <div class="control-label"><label for="testinput">Name<span class="form-validation-required"> * </span></label></div>
                   <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div>
                   <input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}">
            [3] => <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}} {{{variable:system_login}}}</textarea>
            [4] => <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea>
        )
)

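I am still learning regex and don't understand all of the concepts yet, so please forgive me if my terminology is off, but this looks like some kind of greedy matching. At index [2] I would like to see only <input id="testinput"...{{{variable:...}}}">.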

The end goal is to replace these placeholders with different data only if they are not inside a textarea, select, or input.

Why does index [2] match so many elements, and how can I fix it?

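This approach is generally discouraged, but I'm guessing this expression might be closer to what you have in mind, though I'm not sure: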

<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})

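It can be improved further; for example, the escaping is not required (note that the unescaped </ only works if the pattern delimiter is not /):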

<(?:textarea|select).*?>.*?({{{variable:(.*?)}}}).*?</(?:textarea|select)>|<(?:input).+?(value=.*?)({{{variable:(.+?)}}})?.*?>|({{{variable:(.*?)}}}) 

Here, we try adding an optional group to the input branch, so that it can distinguish input elements that contain a variable from those that do not.
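As for why index [2] grew so large in the question: a lazy quantifier such as [\s\S]+? still expands as far as it has to for the rest of the branch to match, and [\s\S] is allowed to cross a >. The hidden input has no placeholder, so a match that starts there keeps growing until it reaches the value="{{{variable:...}}}" of the next input. The sketch below illustrates this and shows one possible alternative, using [^>] to keep the match inside a single tag. It assumes attribute values never contain a literal >; the sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical, shortened sample: a hidden input without a placeholder,
// followed by a text input whose value contains placeholders.
$html = '<input type="hidden" name="token" value="abc" />'
      . '<div class="row"></div>'
      . '<input id="testinput" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}">';

// Input branch as in the question: [\s\S] may cross ">" and run into later tags.
$loose = '/<input[\s\S]+?value=[\s\S]+?\{\{\{variable:.+?\}\}\}[\s\S]+?>/i';

// Bounded variant: [^>] keeps the whole match inside one tag
// (assumption: no literal ">" inside attribute values).
$bounded = '/<input[^>]+?value=[^>]*?\{\{\{variable:.+?\}\}\}[^>]*>/i';

preg_match($loose, $html, $looseMatch);
preg_match($bounded, $html, $boundedMatch);

// The loose match starts at the hidden input and swallows everything in between.
echo $looseMatch[0], PHP_EOL;

// The bounded match is only the text input tag.
echo $boundedMatch[0], PHP_EOL;
```

With the bounded class, the attempt that starts at the hidden input fails as soon as it reaches that tag's closing >, so the engine moves on and matches only the tag that really contains a placeholder.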


Test

$re = '/<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})/si';
$str = '<section id="services">
        <div class="container">
            <div class="row">
                <div class="col-md-12">
                                        <p>You are using system version: {{{variable:system_version}}}</p>
                    <p>Your address: {{{variable:contact-email-address}}}</p>
                    <form action="http://k.loc/content/view/welcome"  class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                    <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />

                        <div class="row">
                            <div class="col-sm-12 form-error"></div>
                        </div>
                    <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);
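For the stated end goal (replacing placeholders only when they are not inside a textarea, select, or input), one possible approach is to let the first alternatives consume the form controls as a whole and put them back unchanged, so that only bare placeholders reach the replacement. The following is a rough sketch with preg_replace_callback, not a drop-in solution: the $values array, the sample string, and the simplified pattern are assumptions made for illustration, and it again assumes attribute values contain no literal >.

```php
<?php
// Hypothetical replacement data; the keys are placeholder names.
$values = [
    'system_version' => '1.2.3',
];

// Alternatives 1 and 2 consume whole textarea/select elements and input tags,
// so placeholders inside them are never seen by alternative 3.
$re = '~<(textarea|select)\b[^>]*>.*?</\1>'   // whole textarea/select element
    . '|<input\b[^>]*>'                        // whole input tag
    . '|\{\{\{variable:(.+?)\}\}\}~si';        // bare placeholder, name in group 2

$html = '<p>Version: {{{variable:system_version}}}</p>'
      . '<input type="text" value="{{{variable:system_name}}}">'
      . '<textarea name="bio">{{{variable:system_login}}}</textarea>';

$result = preg_replace_callback($re, function (array $m) use ($values) {
    // Group 2 is only set when the bare-placeholder alternative matched.
    if (!isset($m[2])) {
        return $m[0]; // form control: put it back untouched
    }
    // Unknown names are left as-is rather than replaced with an empty string.
    return $values[$m[2]] ?? $m[0];
}, $html);

echo $result, PHP_EOL;
```

Because the form controls are matched first and returned as-is, the callback never has to decide where a placeholder sits; it only checks which alternative produced the match.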