How to convert shortcodes similar to WordPress?

Question

Our users write articles, and to embed a PDF they insert a shortcode, so they don't have to know how to write iframes or HTML5 objects.

The original string looks like this:

$string= "Hello, view this pdf [PDF='hello.pdf'] 
and then view this PDF [PDF='goodmorning.pdf']";

The required output:

Hello, view this pdf <object data="https://myurl.com/media/hello.pdf" class="pdf-shortcode-iframe" type="application/pdf">
  <iframe class="pdf-shortcode-iframe" src="https://docs.google.com/viewer?url=https://myurl.com/media/hello.pdf&embedded=true"></iframe>
</object>
and then view this PDF <object data="https://myurl.com/media/goodmorning.pdf" class="pdf-shortcode-iframe" type="application/pdf">
  <iframe class="pdf-shortcode-iframe" src="https://docs.google.com/viewer?url=https://myurl.com/media/goodmorning.pdf&embedded=true"></iframe>
</object>

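I tried a string replace on the "[PDF='" and "']" parts of the shortcode, but since each shortcode has to be replaced with an object and an iframe that both contain the file name, that approach doesn't seem to work.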

Answer 1

That's not a very complicated regex at all... :-)

$re = '/(.*?)(\[\w{3}=\')(\w+\.\w+)(\'\])(.*?)(\[\w{3}=\')(\w+\.\w+).*/s';

$str = 'Hello, view this pdf [PDF=\'hello.pdf\'] 
and then view this PDF [PDF=\'goodmorning.pdf\']';


// $1 = text before the first shortcode, $3 = first file name,
// $5 = text between the shortcodes, $7 = second file name
$subst = '$1<object data="https://myurl.com/media/$3" class="pdf-shortcode-iframe" type="application/pdf"><iframe class="pdf-shortcode-iframe" src="https://docs.google.com/viewer?url=https://myurl.com/media/$3&embedded=true"></iframe></object>$5<object data="https://myurl.com/media/$7" class="pdf-shortcode-iframe" type="application/pdf"><iframe class="pdf-shortcode-iframe" src="https://docs.google.com/viewer?url=https://myurl.com/media/$7&embedded=true"></iframe></object>';

$result = preg_replace($re, $subst, $str);

echo $result;

https://3v4l.org/9oIpv
Or on regex101: https://regex101.com/r/MZV6ym/1

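The only real explanation I can give is that it matches every part of the message and replaces it with what you want.
The only thing I would point out is the three-letter name in this part: (\[\w{3}=\'), which matches PDF (remember there are two of them in the regex, in case you want to change it). You can change it to {3,4} if you want to allow both three- and four-letter names.
Or you can make it \w+ so it matches any length, but that may mean it picks up the wrong matches.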
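The pattern above is written for exactly two shortcodes: with only one shortcode it does not match at all, and with three the trailing .* swallows everything after the second file name. A minimal sketch of an alternative, assuming file names contain only letters, digits, _, . and - (no quotes or spaces), is to match one shortcode at a time and let preg_replace, which replaces every match by default, handle the repetition. The base URL is the one from the question.

```php
<?php
// Sketch: convert every [PDF='file'] shortcode, however many the text contains.
// Assumption: file names use only letters, digits, _, . and - (no quotes or spaces).
$str = "Hello, view this pdf [PDF='hello.pdf']\nand then view this PDF [PDF='goodmorning.pdf']";

// One shortcode per match; group 1 is the file name.
$re = '/\[PDF=\'([\w.-]+)\'\]/';

// $1 is used twice: once for <object>, once for the Google viewer URL.
$subst = '<object data="https://myurl.com/media/$1" class="pdf-shortcode-iframe" type="application/pdf">'
    . '<iframe class="pdf-shortcode-iframe" src="https://docs.google.com/viewer?url=https://myurl.com/media/$1&embedded=true"></iframe>'
    . '</object>';

// preg_replace replaces all matches unless a limit is given.
$result = preg_replace($re, $subst, $str);
echo $result, "\n";
```

Because each shortcode is replaced on its own, the text before, between and after the shortcodes is left untouched, and the number of shortcodes no longer matters.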

Edit: sorry, I didn't notice that the regex101 code generator escaped the " automatically. I first removed my own escaping, then noticed that no escaping was needed at all, so I removed another escape.

Edit 2: I'll try to explain the regex better, since you wrote that you don't know regex.

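preg_replace needs a pattern, also called a regular expression ($re), a replacement ($subst) and an input string ($str).
The regular expression contains instructions about what to look for, so instead of looking for "hello.pdf" I can look for word.word, or only digits, etc.
Regular expressions are very useful when you need to teach a computer to read complex text that a human can pick out easily.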
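A small illustration of that difference on the question's string: strpos can only look for a literal you already know, while the pattern \w+\.\w+ describes the shape of a file name and finds both. The <b> wrapping in the preg_replace call is only there to make the matches visible.

```php
<?php
// Illustration: a literal search versus a pattern that describes "word.word".
$str = "Hello, view this pdf [PDF='hello.pdf']\nand then view this PDF [PDF='goodmorning.pdf']";

// Literal: only finds the exact text we already know.
$found = strpos($str, 'hello.pdf') !== false;

// Pattern: finds every "word characters, dot, word characters" sequence.
preg_match_all('/\w+\.\w+/', $str, $matches);
print_r($matches[0]); // hello.pdf and goodmorning.pdf

// preg_replace(pattern, replacement, subject): $0 is the whole match.
$marked = preg_replace('/\w+\.\w+/', '<b>$0</b>', $str);
echo $marked, "\n";
```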

The pattern I used is:

/ is the delimiter that must be used in a regex; you can also use ~, #, + and some others.
() Parentheses mean capture, as in "save this". This regex needs a few of them.
(.*?) captures anything, zero or more characters long. The ? makes it lazy: it stops as soon as the next part of the pattern can match.
(\[\w{3}=\') captures [, a three-letter word (PDF), = and '. The [ has to be escaped because it is part of regex syntax; the \' is only there because the pattern sits inside a single-quoted PHP string.
(\w+\.\w+) captures one or more word characters (letters, digits or _), a dot, and again one or more word characters.
(\'\]) captures the ' and ]. I only do this so that I can leave them out of the result string.
(.*?) again captures anything, zero or more characters long. This captures the text between the two shortcodes, including the line break and "and then view this PDF".

After that, the shortcode and file-name parts are repeated for the second shortcode, followed by .*, which consumes the rest of the string.

/ is the closing delimiter.
s is a modifier that makes the dot also match newlines (often called "dotall"). It is not the multiline modifier; that is m, which changes what ^ and $ match.
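The lazy ? and the s modifier can be checked in isolation. This sketch uses the question's string (without the trailing space after the first shortcode) and shows where greedy and lazy matching stop, and what the dot does with a line break with and without s.

```php
<?php
// Illustration: lazy versus greedy quantifiers, and the s modifier.
$str = "Hello, view this pdf [PDF='hello.pdf']\nand then view this PDF [PDF='goodmorning.pdf']";

// Greedy (.*) takes as much as it can, so it stops before the LAST "[".
preg_match('/(.*)\[/s', $str, $greedy);
// Lazy (.*?) stops at the FIRST "[" where the rest of the pattern matches.
preg_match('/(.*?)\[/s', $str, $lazy);

echo json_encode($greedy[1]), "\n";
echo json_encode($lazy[1]), "\n";

// Without s the dot does not match "\n", so "pdf ... and" across the line break fails.
$withoutS = preg_match('/pdf.*and/', $str);  // 0
$withS    = preg_match('/pdf.*and/s', $str); // 1
```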

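The replacement is easier to understand.
All captures () in the pattern are numbered from left to right.
$0 is the whole match (here, the entire original string), so it is mostly for reference.
$1 is the first capture, made by (.*?), i.e. "Hello,...".
$2 holds the second capture, "[PDF='".

And so on.

So with these captures you can build the replacement string.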
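To see the numbering for yourself, here is a sketch that runs the answer's pattern with preg_match on the question's string and prints every group; json_encode is used only to make the line break and quotes visible.

```php
<?php
// Illustration: print every capture group of the answer's pattern, numbered left to right.
$re = '/(.*?)(\[\w{3}=\')(\w+\.\w+)(\'\])(.*?)(\[\w{3}=\')(\w+\.\w+).*/s';
$str = "Hello, view this pdf [PDF='hello.pdf']\nand then view this PDF [PDF='goodmorning.pdf']";

preg_match($re, $str, $m);
foreach ($m as $i => $group) {
    // $0 is the whole match, $1..$7 are the parenthesised groups.
    echo '$' . $i . ' = ' . json_encode($group) . "\n";
}
```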

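As a final note:
Regex can look perfect, as if it works for everything, but please use it sparingly.
On SO it seems that "everyone" uses regex for the simplest tasks, and that is not what regex is meant for.
Regex is suited to complex, non-static strings like yours: when you don't know whether the "number" you are looking for is 1 or 1000, or whether the word you are looking for is at position 4 or 50, or how long it is.
That is when regex is most effective.

Regex generally costs more time and memory than plain PHP string functions.
That is why I say regex should be saved for those special occasions.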
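For this particular task the shortcode has a fixed shape, so plain string functions are enough. A minimal sketch without regex, assuming shortcodes always look exactly like [PDF='file'] and never nest; convertPdfShortcodes is a hypothetical name, and whether this is actually faster than preg_replace on your data is worth measuring rather than assuming.

```php
<?php
// Sketch: the same conversion without regular expressions, using strpos/substr.
// Assumption: shortcodes always look exactly like [PDF='file'] and never nest.
function convertPdfShortcodes(string $text): string
{
    $open = "[PDF='";
    $close = "']";
    $out = '';
    $pos = 0;

    while (($start = strpos($text, $open, $pos)) !== false) {
        $nameStart = $start + strlen($open);
        $end = strpos($text, $close, $nameStart);
        if ($end === false) {
            break; // unterminated shortcode: leave the rest untouched
        }
        $file = substr($text, $nameStart, $end - $nameStart);
        $url = 'https://myurl.com/media/' . $file;

        // Copy the text before the shortcode, then the generated HTML.
        $out .= substr($text, $pos, $start - $pos)
            . '<object data="' . $url . '" class="pdf-shortcode-iframe" type="application/pdf">'
            . '<iframe class="pdf-shortcode-iframe" src="https://docs.google.com/viewer?url=' . $url . '&embedded=true"></iframe>'
            . '</object>';
        $pos = $end + strlen($close);
    }

    // Append whatever follows the last shortcode.
    return $out . substr($text, $pos);
}

echo convertPdfShortcodes("Hello, view this pdf [PDF='hello.pdf']\nand then view this PDF [PDF='goodmorning.pdf']"), "\n";
```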


Tags: php, str-replace