Trying to sanitize an HTML fragment in Rails to get only the image src so I can display it in an img tag

I'm using Feedjira to parse some RSS feeds, and the data I get back looks like this:

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNEnMLee_eB0lY7hCtIqJCf8Iy2StQ&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52778768548994&amp;ei=xaUHVaj4GcLBmQLyjIDIDw&amp;url=http://www.foxnews.com/weather/2015/03/15/cyclone-pam-vanuatu/?intcmp%3Dlatestnews"><img src="//t1.gstatic.com/images?q=tbn:ANd9GcTHyV7D2Zf-QfzLZ-7qJlk0mE3nU7qM3-mnENtJPURJTk8o9Kh-Iqc_focHCHAALYhnRuY1Nop6" alt="" border="1" width="80" height="80"><br><font size="-2">Fox News</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br><div style="padding-top:0.8em;"><img alt="" height="1" width="1"></div><div class="lh"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNHTAYRk1bcvBCJxvZ4M0OUUrXTXQg&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52778768548994&amp;ei=xaUHVaj4GcLBmQLyjIDIDw&amp;url=http://www.dailymail.co.uk/wires/reuters/article-2997951/Aid-agencies-begin-helicopter-flights-cyclone-stricken-Vanuatu.html"><b>Aid agencies begin...

I'm trying to strip out everything except the img src so that I can display it in an img tag next to the article text on my page. I'm using Ryan Grove's Sanitize gem as follows:

<img class="media-object" src="<%= Sanitize.fragment(entry.content, :elements => ['img'], :attributes => { 'img' => ['src']}) %>" alt="..." style="width:72px;height:72px">

However, this inserts the following into my HTML:

<img class="media-object" src="<img src="//t1.gstatic.com/images?q=tbn:ANd9GcTHyV7D2Zf-QfzLZ-7qJlk0mE3nU7qM3-mnENtJPURJTk8o9Kh-Iqc_focHCHAALYhnRuY1Nop6">Fox News <img>  Aid agencies begin flights to cyclone-stricken Vanuatu, official toll lowered Daily Mail TANNA, March 17 (Reuters) - International aid agencies began emergency flights on Tuesday to some of the remote outer islands of Vanuatu, which they fear have been devastated by a monster cyclone that tore through the South Pacific island nation. Relief, hardship as Cyclone Pam survivors battle onBangkok Post UN says 24 dead in Vanuatu after Cyclone Pam7Online WSVN-TV Fears for food supplies in Vanuatu as capital cleans upThe Star Online Xinhua -MSNBC -Bloomberg all 4,389 news articles » " alt="..." style="width:72px;height:72px">

Any ideas on how to get just the src link and nothing else?

Any help is appreciated!

For very simple HTML parsing like this, a regular expression is simple and reliable. For example,

feedjira_output =~ /src="([^"]+)"/

This puts the source URL into the regex's first capture group (accessible via the $1 variable).
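To make the regex approach concrete, here is a minimal sketch wrapped in a helper. The method name first_img_src and the shortened sample string are assumptions for illustration and do not come from the original post. The pattern is also anchored to an img tag, which is slightly stricter than the one-liner above.

```ruby
# Hypothetical helper (name is an assumption, not from the original post).
# Returns the src value of the first <img> tag that has one, or nil.
def first_img_src(html)
  # <img[^>]* stays inside a single tag; ([^"]+) captures everything
  # up to the closing double quote of the src attribute.
  match = html.to_s.match(/<img[^>]*\ssrc="([^"]+)"/)
  match && match[1]
end

# Shortened stand-in for the Feedjira entry content shown in the question.
sample = '<td><a href="http://example.com/story">' \
         '<img src="//t1.gstatic.com/images?q=tbn:abc" alt="" width="80">' \
         '<br>Fox News</a></td><div><img alt="" height="1" width="1"></div>'

puts first_img_src(sample)
```

In the view from the question, the helper's return value would go in the src attribute in place of the Sanitize.fragment call, for example src="<%= first_img_src(entry.content) %>". The sketch assumes the attribute is double-quoted, as in the feed shown above; it would miss single-quoted or unquoted src values.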