Regex: Extract Tweet Username and ID From URL

I'm trying to detect a tweet URL in a message, if one is present, using this regex: #^https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)$#is

But my regex doesn't seem to pick up the tweet URL correctly. Here is my full code:

function gettweet($string)
{
    $regex = '#^https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)$#is';
    $string = preg_replace_callback($regex, function($matches) {
        $user = $matches[2];
        $statusid = $matches[3];
        $url = "https://twitter.com/$user/status/$statusid";
        $urlen = urlencode($url);
        $getcon = file_get_contents("https://publish.twitter.com/oembed?url=$urlen");
        $con = json_decode($getcon, true);
        $tweet_html = $con["html"];
        return $tweet_html;
    }, $string);
    return $string;
}

$message="This is absolutely trending can you also see it here https://twitter.com/itslifeme/status/765268556133064704 i like it";
$mes=gettweet($message);
echo $mes;

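The reason this doesn't work as you expect is that your regex includes anchors (^ and $), which require the pattern to match the whole string from start to end. Your message has text before and after the URL, so the anchored pattern never matches.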
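A quick way to see the effect of the anchors is to run the anchored pattern against the bare URL and against the same URL embedded in a sentence. This is only an illustrative sketch; the variable names are made up for the example.

```php
<?php
// The anchored pattern from the question (the s modifier is dropped; it has no effect here).
$anchored = '#^https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)$#i';
$url = 'https://twitter.com/itslifeme/status/765268556133064704';

// The string is exactly the URL, so ^...$ can match.
$alone = preg_match($anchored, $url);

// The URL is surrounded by other text, so ^ and $ cannot both be satisfied.
$embedded = preg_match($anchored, "see it here $url i like it");

var_dump($alone, $embedded); // int(1) int(0)
```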

With the anchors removed, it matches:

$regex  = '#https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)#is';
$string = "This is absolutely trending can you also see it here https://twitter.com/itslifeme/status/765268556133064704 i like it";

if (preg_match($regex, $string, $match)) {
    var_dump($match);
}

The code above gives us:

array(4) {
  [0]=>
  string(55) "https://twitter.com/itslifeme/status/765268556133064704"
  [1]=>
  string(9) "itslifeme"
  [2]=>
  string(0) ""
  [3]=>
  string(18) "765268556133064704"
}
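Applying the same fix to the original function could look like the sketch below. Note that the dump above shows the username in index 1 and the optional (es) group in index 2, whereas the original callback reads the username from $matches[2]; here the (es) group is made non-capturing so the username is group 1 and the status ID is group 2. The $fetch parameter is a hypothetical addition (not in the original code) so the HTTP call can be replaced, and falling back to the untouched URL on failure is an assumption about the desired behaviour.

```php
<?php
// Sketch: replace every tweet URL in $string with its oEmbed HTML.
// $fetch is a hypothetical hook so the HTTP request can be swapped out.
function gettweet(string $string, ?callable $fetch = null): string
{
    // Default fetcher mirrors the original file_get_contents() call.
    $fetch = $fetch ?? function (string $url) {
        return @file_get_contents($url);
    };

    // No ^/$ anchors, so the URL may appear anywhere in the message.
    // (?:es)? is non-capturing: group 1 = username, group 2 = status ID.
    $regex = '#https?://twitter\.com/(?:\#!/)?(\w+)/status(?:es)?/(\d+)#i';

    $result = preg_replace_callback($regex, function (array $matches) use ($fetch) {
        $url  = "https://twitter.com/{$matches[1]}/status/{$matches[2]}";
        $json = $fetch('https://publish.twitter.com/oembed?url=' . urlencode($url));
        $data = is_string($json) ? json_decode($json, true) : null;

        // Assumption: keep the original URL if the request or decoding fails.
        return isset($data['html']) ? $data['html'] : $matches[0];
    }, $string);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $string;
}
```

Calling gettweet($message) with the message from the question should then replace the URL with whatever HTML the oEmbed endpoint returns, provided the endpoint is reachable from your server.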

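Also, there is really no reason to include the dot-all pattern modifier in your expression: the pattern contains no unescaped dot, so the modifier has no effect.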

s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
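To make the quoted behaviour concrete, the sketch below first shows what the s modifier changes for a pattern that does contain a dot, and then checks that the tweet pattern gives identical matches with and without it. The sample strings and variable names are made up for the example.

```php
<?php
// A dot does not match "\n" by default, but it does with the s modifier.
$plain  = preg_match('#a.b#',  "a\nb");
$dotall = preg_match('#a.b#s', "a\nb");

// The tweet pattern only contains an escaped dot (\.), so s changes nothing.
$with    = '#https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)#is';
$without = '#https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)#i';
$text = "look:\nhttps://twitter.com/itslifeme/status/765268556133064704\nnice";

preg_match($with, $text, $m1);
preg_match($without, $text, $m2);

var_dump($plain, $dotall, $m1 === $m2); // int(0) int(1) bool(true)
```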