PHP - remove http/www from message (except for the host domain) to disable clickable links
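I have a simple message board, let's say mywebsite.com, that allows users to post messages. Currently the board makes all links clickable, i.e. when someone posts something that starts with:
http://, https://, www., http://www., https://www.
the script automatically turns it into a link (i.e. wraps it in an A href tag).
The problem: there is too much spam. So my idea is to automatically remove the http(s)/www prefixes above so that these don't become "clickable links". However, I want to allow posters to link to pages on my own site, i.e. not remove http(s)/www when the message contains links to mywebsite.com.
My idea was to create two arrays: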
$removeParts = array('http://', 'https://', 'www.', 'http://www.', 'https://www.');
$keepParts = array('http://mywebsite.com', 'http://www.mywebsite.com', 'www.mywebsite.com', 'https://www.mywebsite.com', 'https://mywebsite.com');
But I don't know how to use them properly (perhaps str_replace could work somehow).
Below is an example of $message before and after posting:
$message BEFORE:
Hello world, thanks to http://mywebsite/about I learned a lot. I found
you on http://www.bing.com, https://google.com/search and on some www.spamwebsite.com/refid=spammer2.
$message AFTER:
Hello world, thanks to http://mywebsite.com/about I learned a lot. I
found you on bing.com, google.com/search and on some spamwebsite.com/refid=spammer2.
Please note that users enter plain text in the post form, so the script should work with this plain text only (not with href etc.).
$url = "http://mywebsite/about";
$parse = parse_url($url);
// parse_url() returns the host without the scheme or the path
if ($parse["host"] == "mywebsite") {
    echo "My site, let's mark it as link";
}
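The host check above only works when the URL carries a scheme, because parse_url() does not report a host for input such as www.mywebsite.com/about. A minimal sketch of a more forgiving check follows. The function name isOwnHost and the $ownHosts parameter are hypothetical, and it assumes that dropping a leading "www." before comparing is acceptable for the board.

```php
<?php
// Hypothetical helper: returns true when $url points at one of $ownHosts.
function isOwnHost(string $url, array $ownHosts): bool
{
    // parse_url() only fills in the host when a scheme is present,
    // so scheme-less input such as "www.mywebsite.com/about" gets one prepended.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    // Compare case-insensitively and ignore a leading "www.".
    $host = preg_replace('/^www\./', '', strtolower($host));
    // Exact comparison, so "mywebsite.com.evil.example" is not accepted.
    return in_array($host, $ownHosts, true);
}

var_dump(isOwnHost('http://mywebsite.com/about', ['mywebsite.com']));
var_dump(isOwnHost('www.mywebsite.com', ['mywebsite.com']));
var_dump(isOwnHost('http://mywebsite.com.evil.example/', ['mywebsite.com']));
```

Comparing the parsed host exactly, rather than searching the whole URL for the domain, keeps look-alike hosts and query strings that merely mention the domain from passing the check.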
killSpam()
Function features:
- Works with single and double quotes.
- Invalid HTML
- ftp://
- http://
- https://
- file://
- mailto:
function killSpam($html, $whitelist){
    // process html links: $match[1] holds each complete <a ...>...</a> element
    preg_match_all('%(<(?:\s+)?a.*?href=["|\'](.*?)["|\'].*?>(.*?)<(?:\s+)?/(?:\s+)?a(?:\s+)?>)%sm', $html, $match, PREG_PATTERN_ORDER);
    for ($i = 0; $i < count($match[1]); $i++) {
        if (!preg_match("/$whitelist/", $match[1][$i])) {
            // pass the delimiter to preg_quote() so a literal % in the link cannot break the pattern
            $html = preg_replace("%" . preg_quote($match[1][$i], "%") . "%", " (SPAM) ", $html);
        }
    }
    // process cleartext links (URLs, quoted URLs and e-mail addresses)
    preg_match_all('/(\b(?:(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[A-Z0-9+&@#\/%?=~_|$!:,.;-]*[A-Z0-9+&@#\/%=~_|$-]|((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,6})\b)|"(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^"\r\n]+"|\'(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^\'\r\n]+\')/i', $html, $match2, PREG_PATTERN_ORDER);
    for ($i = 0; $i < count($match2[1]); $i++) {
        if (!preg_match("/$whitelist/", $match2[1][$i])) {
            $spamsite = $match2[1][$i];
            $html = preg_replace("%" . preg_quote($spamsite, "%") . "%", " (SPAM) ", $html);
        }
    }
    return $html;
}
Usage:
$html = <<< LOB
<p>Hello world, thanks to <a href="http://mywebsite.com/about" rel="nofollow">http://mywebsite/about</a> I learned a lot. I found
you on <a href="http://www.bing.com" rel="nofollow">http://www.bing.com</a>, <a href="https://google.com/search" rel="nofollow">https://google.com/search</a> and on some <a href="http://www.spamwebsite.com" rel="nofollow">www.spamwebsite.com/refid=spammer2< /a >. www.spamme.com, http://morespam.com/?aff=122, http://crazyspammer.com/?money=22 and spam@email.com, file://spamfile.com/file.txt ftp://spamftp.com/file.exe </p>
LOB;
$whitelist = "(google\.com|yahoo\.com|bing\.com|nicesite\.com|mywebsite\.com)";
$noSpam = killSpam($html, $whitelist);
echo $noSpam;
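Since killSpam() interpolates $whitelist straight into a regular expression, the dots have to be escaped by hand as in the usage above. A small sketch of building that string from plain domain names instead; buildWhitelist is a hypothetical helper, not part of the answer, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: turns plain domain names into the alternation
// pattern that killSpam() places inside "/$whitelist/".
function buildWhitelist(array $domains): string
{
    $quoted = array_map(
        // Passing '/' also escapes the delimiter killSpam() wraps around the pattern.
        fn(string $d): string => preg_quote($d, '/'),
        $domains
    );
    return '(' . implode('|', $quoted) . ')';
}

$whitelist = buildWhitelist(['google.com', 'yahoo.com', 'bing.com', 'nicesite.com', 'mywebsite.com']);
echo $whitelist, PHP_EOL;
```

The result can be passed as the second argument of killSpam() in place of the hand-written string.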
Spam example:
I cannot post the spam HTML here; I guess this site has its own killSpam()...
Hello world, thanks to http://mywebsite/about I learned a lot. I found
you on http://www.bing.com, https://google.com/search and on some www.spamwebsite.com/refid=spammer2.
www.spamme.com, http://morespam.com/?aff=122,
http://crazyspammer.com/?money=22 and spam@email.com,
file://spamfile.com/file.txt ftp://spamftp.com/file.exe
Output:
Hello world, thanks to (SPAM) I learned a lot. I found you on http://www.bing.com, https://google.com/search and on some (SPAM) .
(SPAM) , (SPAM) , (SPAM) and (SPAM) , (SPAM) (SPAM)
If you want to keep the text of the links but make them "not clickable", you can try this code:
<?php
$text = <<<__text
Hello world, thanks to http://mywebsite/about I learned a lot.
I found you on http://www.bing.com, https://google.com/search and on some www.spamwebsite.com/refid=spammer2.
www.spamme.com, http://morespam.com/?aff=122, http://crazyspammer.com/?money=22 and spam@email.com, file://spamfile.com/file.txt ftp://spamftp.com/file.exe
__text;
$allowed_domains = ['mywebsite.com'];
$pattern = "/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\w]*))?)/";
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    list(, $url, $scheme_and_domain, $scheme, $path) = $m;
    // strip the scheme and a leading "www." (dot escaped so it only matches a literal dot)
    $domain = preg_replace(['/^' . preg_quote($scheme, '/') . '/i', '/^www\./i'], '', $scheme_and_domain);
    // links to an allowed domain are left untouched
    if (in_array($domain, $allowed_domains)) continue;
    $url_prepared = rtrim("$domain$path", '/');
    $text = str_replace($url, $url_prepared, $text);
}
echo $text;
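The same idea can also be written as a single pass with preg_replace_callback(), which rewrites each match in place instead of calling str_replace() on the whole text for every match. This is only a sketch under stated assumptions: unlinkForeignUrls is a hypothetical name, the pattern is deliberately simpler than the one above, and it only handles http://, https:// and www. prefixes.

```php
<?php
// Hypothetical helper: strips the scheme and "www." from every URL in $text
// unless its host is listed in $allowedHosts.
function unlinkForeignUrls(string $text, array $allowedHosts): string
{
    // Group 1 is the host (without "www."), group 2 is everything up to the next whitespace.
    $pattern = '~\b(?:https?://(?:www\.)?|www\.)([a-z0-9.-]+)(\S*)~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($allowedHosts): string {
            // Normalise the host for comparison only; the output keeps the original text.
            $host = rtrim(strtolower($m[1]), '.');
            if (in_array($host, $allowedHosts, true)) {
                return $m[0]; // own site: leave the URL untouched so it stays clickable
            }
            return $m[1] . $m[2]; // foreign site: keep host and path, drop the prefix
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

$message = 'Hello world, thanks to http://mywebsite.com/about I learned a lot. '
         . 'I found you on http://www.bing.com, https://google.com/search '
         . 'and on some www.spamwebsite.com/refid=spammer2.';

echo unlinkForeignUrls($message, ['mywebsite.com']), PHP_EOL;
```

With mywebsite.com as the only allowed host, the link to the board itself is kept and the other three lose their prefixes, which matches the "$message AFTER" example in the question.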
For anyone looking for an answer: I posted a related (more specific) question that solved the problem.