How to str_replace Google News RSS links for Facebook Share?

Hello, I am using SimpleXML to display the news.google.com feed.

The displayed entries link to the original article like this:

http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEcqhcp4AfUzgxc2l1gumydaxQ-KQ&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52778832126843&ei=keFLVfiHGvDVmQL5_4GgBg&url=http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

I need the entries to link to this instead: http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

The reason is that Facebook Sharer cannot interpret the following link:

https://www.facebook.com/sharer/sharer.php?u=http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEcqhcp4AfUzgxc2l1gumydaxQ-KQ&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52778832126843&ei=keFLVfiHGvDVmQL5_4GgBg&url=http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

Facebook Sharer needs it to look like this:

https://www.facebook.com/sharer/sharer.php?u=http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

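Is there a way to use a regex (str_replace or preg_match) to remove the Google redirect URL so that social sharing sites can recognize the link?

The Google redirect URL is dynamic, so it is slightly different each time; I need something that can replace every variation.

My working, functional code: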

<?php
$feed = file_get_contents("https://news.google.com/news/feeds?q=KEYWORD&output=rss");
$xml = new SimpleXmlElement($feed);
foreach ($xml->channel->item as $entry) {
    $date = $entry->pubDate;
    $date = strftime("%m/%d/%y %I:%M:%S%P", strtotime($date));
    $desc = $entry->description;
    // Strip the "and more »" text and enlarge the font
    $desc = str_replace("and more »", "", "$desc");
    $desc = str_replace("font-size:85%", "font-size:100%", "$desc");
    ?>
    <div class="item"></div>
    <?php echo $desc; ?>
    <div class="date">
    <?php echo $date; ?></div>
    <?php
    $desc = $entry->description;
    $date = $entry->pubDate;
    $date = strftime("%A, %m/%d/%Y, %H:%M:%S", strtotime($date));
    // The third argument must be the string to search in, i.e. $desc
    $desc = str_replace("and more »", "x", $desc);
    echo $date;
    echo $desc;
}
?>

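I use $desc to display the link rather than $link, but the article URL with the Google redirect is also available in $link, in case you would rather str_replace or preg_match $link instead of $desc.

Link to a working Google News feed: https://news.google.com/news/feeds?q=KEYWORD&output=rss

If you know how to solve this, you are a hero. Thanks, Overflowers.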

Following up on my first comment, the answer is to use this regex:

<?php
date_default_timezone_set('America/New_York');
$feed = file_get_contents("https://news.google.com/news/feeds?q=KEYWORD&output=rss");
$xml = new SimpleXmlElement($feed);
foreach ($xml->channel->item as $entry) {
    $date = $entry->pubDate;
    $date = strftime("%m/%d/%y %I:%M:%S%P", strtotime($date));
    $desc = $entry->description;
    $desc = str_replace("and more&nbsp;&raquo;", "","$desc");
    $desc = str_replace("font-size:85%", "font-size:100%","$desc"); /*
    ?>
    <div class="item"></div>
    <?php // echo $desc; ?>
    <div class="date"><?php echo $date; ?></div>
    <?php
    */
    $desc = $entry->description;
    // $1 re-inserts the captured article URL after u=
    $desc = preg_replace('~href=".*?&amp;url=(.*?)"~', 'href="https://www.facebook.com/sharer/sharer.php?u=$1"', $desc);
    $date = $entry->pubDate; 
    $date = strftime("%A, %m/%d/%Y, %H:%M:%S", strtotime($date));
    //$desc = str_replace("and more »","x","and more »");
    echo $date . "\n" . $desc;
    die('1 pass');
}
?>

Output (formatting changed for display):

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;">
    <tr>
        <td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"></font></td>
        <td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br>
            <div style="padding-top:0.8em;"><img alt="" height="1" width="1"></div>
            <div class="lh"><a href="https://www.facebook.com/sharer/sharer.php?u=http://www.gamasutra.com/blogs/JonathanRaveh/20150506/242840/Death_of_the_app_keyword__whats_next.php"><b>Death of the app <b>keyword</b> – what&#39;s next?</b></a><br>
                <font size="-1"><b><font color="#6f6f6f">Gamasutra (blog)</font></b></font><br>
                <font size="-1">Yes, app <b>keywords</b> are dying. If you search the web you may find insightful stories about apps that gained massive recognition due to the clever use of <b>keywords</b>. Many companies and services (such as Sensor Tower) offer developers tools to help them&nbsp;...</font><br>
                <font size="-1" class="p"></font><br>
                <font class="p" size="-1"><a class="p" href="http://news.google.com/news/more?ncl=d4b6j-gMxFN1VKM&amp;authuser=0&amp;ned=us"><nobr><b>and more&nbsp;&raquo;</b></nobr></a></font></div>
            </font></td>
    </tr>
</table>
1 pass

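The regex href=".*?&amp;url=(.*?)" matches from the opening double quote of the href to its closing double quote and captures everything after &amp;url=. In the examples I saw, every link had the URL as the last parameter. If the URL is not the last parameter, this regex will not work as written; the capture would have to stop at either the closing double quote or an ampersand entity, that is ("|&amp;). That change, however, could truncate article URLs that carry GET parameters of their own. I have not seen any article URLs in these feeds that use GET parameters. Remove die('1 pass'); to process the whole feed, or keep the die if you want a small sample at first.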
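A minimal sketch of the variant that stops at the closing quote or the next ampersand entity, assuming the redirect link appears in the description with HTML-escaped ampersands as in the output above. The function name rewriteShareLinks and the sample string are made up for illustration. urlencode is an addition here, so that the article URL travels as a single u= value.

```php
<?php
// Hypothetical helper, not part of the original answer.
function rewriteShareLinks(string $html): string
{
    // Capture the value of url= up to the closing quote or the next "&amp;".
    $pattern = '~href="[^"]*?&amp;url=(.*?)(?:&amp;[^"]*)?"~';

    return preg_replace_callback($pattern, function (array $m): string {
        // Undo HTML escaping, then percent-encode the target for the u= parameter.
        $target = html_entity_decode($m[1]);
        return 'href="https://www.facebook.com/sharer/sharer.php?u=' . urlencode($target) . '"';
    }, $html);
}

// Made-up sample in the same shape as the feed's description markup.
$sample = '<a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;url=http://example.com/article&amp;cid=1">Title</a>';
echo rewriteShareLinks($sample), PHP_EOL;
```

As noted above, this form cuts off any GET parameters that belong to the article URL itself, so it only suits feeds where the article URLs carry none.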

You can use the built-in PHP functions parse_url (split a URL into components) and parse_str (get parameter values from a query string) for this:

$feed = file_get_contents(
    "https://news.google.com/news/feeds?q=KEYWORD&output=rss"
);
$xml = new SimpleXmlElement($feed);

foreach ($xml->channel->item as $entry){
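    // Get query part of link (cast the SimpleXMLElement to a string first)
    $query = parse_url((string) $entry->link, PHP_URL_QUERY);

    // Parse query parameters into $params array
    parse_str((string) $query, $params);

    // Get URL from parameters; skip entries without a url parameter
    if (!isset($params['url'])) {
        continue;
    }
    $url = $params['url'];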

    // Just output in this example
    echo "URL: $url", PHP_EOL;

    // ... Do some more stuff
}

Output:

URL: http://www.gamasutra.com/blogs/JonathanRaveh/20150506/242840/Death_of_the_app_keyword__whats_next.php
URL: http://www.business2community.com/online-marketing/8-keyword-optimization-tips-perfect-ppc-campaigns-01222200
URL: http://searchengineland.com/marry-keywords-compelling-content-218174
...
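
To turn an extracted URL into a share link, it can be passed through urlencode before being appended to the sharer address. A minimal sketch, assuming the link has the redirect shape shown in the question. The helper name facebookShareUrl and the fallback to the original link are illustrative choices, not part of the answer above.

```php
<?php
// Hypothetical helper: returns a Facebook Sharer URL for a Google News redirect link.
function facebookShareUrl(string $link): string
{
    // Pull the query string out of the link and parse it into an array.
    $query = (string) parse_url($link, PHP_URL_QUERY);
    parse_str($query, $params);

    // Assumption: fall back to the link itself when there is no url= parameter.
    $target = $params['url'] ?? $link;

    return 'https://www.facebook.com/sharer/sharer.php?u=' . urlencode($target);
}

// Made-up sample link in the same shape as the feed's <link> element.
$link = 'http://news.google.com/news/url?sa=t&fd=R&url=http://example.com/article';
echo facebookShareUrl($link), PHP_EOL;
```

Inside the foreach loop this would be called as facebookShareUrl((string) $entry->link).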