PHP - Modify absolute path/URL of images on a string text
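I'm trying to migrate old (WP-based) blog posts to a new platform. One of the steps is defined as:
- Get the full_text of the posts
- Search for occurrences of the full path/URL of old images (let's say https://whosebug.com/uploads/logo.png or just uploads/logo.png)
- Extract/save the image and get the new image's guid()
- Switch the old path https://whosebug.com/uploads/logo.png to the new path (let's say https://quora.[…].png)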
I tried using a regex to search for the old URLs:
/(http:\/\/whosebug\.com\/uploads\/)+(.*?)[a-zA-Z0-9]+(\.jpg|\.png|\.gif)/
Then I tried:
$old = array();
$pattern = "/(https:|http:\/\/whosebug\.com\/uploads\/)+(.*?)[a-zA-Z0-9]+(\.jpg|\.png|\.gif)/";
$text = "orem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor <img src='https://whosebug.com/uploads/image1.png'/> rem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor <img src='https://whosebug.com/uploads/image2.png'/>";
// search and get old urls
preg_match_all($pattern, $text, $old);
But I got this:
array(4) {
[0]=>
array(2) {
[0]=>
string(44) "https://whosebug.com/uploads/image1.png"
[1]=>
string(44) "https://whosebug.com/uploads/image2.png"
}
[1]=>
array(2) {
[0]=>
string(6) "https:"
[1]=>
string(6) "https:"
}
[2]=>
array(2) {
[0]=>
string(28) "//whosebug.com/uploads/"
[1]=>
string(28) "//whosebug.com/uploads/"
}
[3]=>
array(2) {
[0]=>
string(4) ".png"
[1]=>
string(4) ".png"
}
}
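I think this regex will do a bit better:
#\b((?:https?://whosebug\.com/)?uploads/(.*?\.(?:jpg|png|gif)))\b#
I simplified some of yours (for example, replacing https:|http: with https?:) and removed the [a-zA-Z0-9]+, which looked unnecessary. In your pattern the | splits the whole group into two alternatives, https: on its own and http:\/\/whosebug\.com\/uploads\/, so for an https URL group 1 captures only "https:" and the lazy (.*?) picks up the rest of the path. That is why your output shows "https:" and "//whosebug.com/uploads/" as separate groups. I also improved the grouping by making some groups non-capturing.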
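A minimal demo of that grouping problem: both patterns below run against a single https image URL. The second pattern is a simplified comparison variant (host required), not the full regex from this answer.

```php
<?php
// One https image URL, using the domain from this thread.
$text = "<img src='https://whosebug.com/uploads/image1.png'/>";

// Original grouping: the | splits the WHOLE group, so group 1 is either
// "https:" alone or the full "http://whosebug.com/uploads/" prefix.
$buggy = "/(https:|http:\/\/whosebug\.com\/uploads\/)+(.*?)[a-zA-Z0-9]+(\.jpg|\.png|\.gif)/";

// "https?" makes only the "s" optional, so scheme, host and path
// stay together in group 1 and the file name lands in group 2.
$fixed = "#(https?://whosebug\.com/uploads/)(.*?\.(?:jpg|png|gif))#";

preg_match($buggy, $text, $b);
preg_match($fixed, $text, $f);

echo $b[1], "\n"; // https:
echo $b[2], "\n"; // //whosebug.com/uploads/
echo $f[1], "\n"; // https://whosebug.com/uploads/
echo $f[2], "\n"; // image1.png
```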
New code (note that I added an extra image reference for testing):
$old = array();
$pattern = "#\b((?:https?://whosebug\.com/)?uploads/(.*?\.(?:jpg|png|gif)))\b#";
$text = "orem uploads/xyx.gif ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor <img src='https://whosebug.com/uploads/image1.png'/> rem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor <img src='https://whosebug.com/uploads/image2.png'/>";
// search and get old urls
preg_match_all($pattern, $text, $old);
print_r($old);
Output:
Array
(
[0] => Array
(
[0] => uploads/xyx.gif
[1] => https://whosebug.com/uploads/image1.png
[2] => https://whosebug.com/uploads/image2.png
)
[1] => Array
(
[0] => uploads/xyx.gif
[1] => https://whosebug.com/uploads/image1.png
[2] => https://whosebug.com/uploads/image2.png
)
[2] => Array
(
[0] => xyx.gif
[1] => image1.png
[2] => image2.png
)
)
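For the last step in the question (switching each old path to the new one), a sketch with preg_replace_callback can do the search and the swap in one pass. It assumes group 1 is the old path/URL and group 2 the file name, as in the output above. newUrlFor() is a hypothetical stand-in for "extract/save the image and get its guid()", and https://new.example.com/images/ is an assumed placeholder host, not the real target.

```php
<?php
// Pattern from this answer: group 1 = old path/URL, group 2 = file name.
$pattern = "#\b((?:https?://whosebug\.com/)?uploads/(.*?\.(?:jpg|png|gif)))\b#";

// HYPOTHETICAL: a real migration would download/save $oldUrl here and
// return the new image's URL; this only builds one on an assumed host.
function newUrlFor(string $oldUrl, string $fileName): string
{
    return "https://new.example.com/images/" . $fileName;
}

function replaceImageUrls(string $text, string $pattern): string
{
    $cache = []; // old URL => new URL, so each image is handled once
    return preg_replace_callback(
        $pattern,
        function (array $m) use (&$cache): string {
            // $m[1] is the whole old path/URL, $m[2] the file name
            if (!isset($cache[$m[1]])) {
                $cache[$m[1]] = newUrlFor($m[1], $m[2]);
            }
            return $cache[$m[1]]; // replaces the full match
        },
        $text
    );
}

$text = "orem uploads/xyx.gif ipsum <img src='https://whosebug.com/uploads/image1.png'/>";
echo replaceImageUrls($text, $pattern), "\n";
// orem https://new.example.com/images/xyx.gif ipsum <img src='https://new.example.com/images/image1.png'/>
```

Because the host part is optional, this pattern also matches the uploads/... tail of other paths (for example the uploads/a.png in https://other.com/wp-content/uploads/a.png), so it is worth checking the matches before replacing.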
If you want to insist that image names contain only [a-zA-Z0-9], change .*? to [a-zA-Z0-9]+, i.e.:
$pattern = "#\b((?:https?://whosebug\.com/)?uploads/([a-zA-Z0-9]+\.(?:jpg|png|gif)))\b#";
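The trade-off between the two file-name parts shows up with names that contain other characters. WordPress typically stores uploads in year/month subfolders, so the stricter variant can silently skip images. The sample file names below are hypothetical.

```php
<?php
// Lazy file-name part (.*?) vs. the stricter [a-zA-Z0-9]+ variant.
$lazy   = "#\b((?:https?://whosebug\.com/)?uploads/(.*?\.(?:jpg|png|gif)))\b#";
$strict = "#\b((?:https?://whosebug\.com/)?uploads/([a-zA-Z0-9]+\.(?:jpg|png|gif)))\b#";

// Hypothetical names: a hyphenated file and a year/month folder.
$text = "a uploads/my-logo.png b uploads/2019/05/logo.png c uploads/logo.png";

preg_match_all($lazy, $text, $l);
preg_match_all($strict, $text, $s);

print_r($l[2]); // contains: my-logo.png, 2019/05/logo.png, logo.png
print_r($s[2]); // contains: logo.png only
```

The lazy version finds all three, while the strict one misses anything with a hyphen or a subfolder, so it only fits if you know your image names are purely alphanumeric.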