How to scrape images with Cheerio and paste to Google Sheets?

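This is my first attempt at scraping images from the web and pasting them into Google Sheets. I want to download the second image from https://ir.eia.gov/ngs/ngs.html and paste it into a Google Sheet. The page contains two images, and I want the second one, which appears under "Working Gas in Underground Storage Compared with Five-Year Range". I would like to learn how to refer to it in code by its img alt= or src="ngs.gif" rather than by its index, so that I can apply the same idea to other HTML pages. Can anyone help fix the following code so that I can learn? Thank you!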

function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  var $ = Cheerio.load(res);
  
  // I want to download the image, <img alt="Working Gas in Underground Storage Compared with Five-Year Range" src="ngs.gif" border="0">
  // What should be changed in the following code?
  var chart = $('img').attr('src').find('ngs.gif');
  SpreadsheetApp.getActiveSheet().insertImage(chart, 1, 1);
}

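I believe your goal is as follows.

  • You want to retrieve the second image among the img tags and put it into the Spreadsheet.

In this HTML, the image URL appears to be https://ir.eia.gov/ngs/ + filename. So I thought that the method insertImage(url, column, row) could be used. When this is reflected in your script, how about the following modified script?

Modified script: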

function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);
  const urls = [];
  $('img').each(function () {
    urls.push("https://ir.eia.gov/ngs/" + $(this).attr('src'));
  });
  if (urls.length > 1) {
    SpreadsheetApp.getActiveSheet().insertImage(urls[1], 1, 1); // 2nd image is retrieved.
  }
}
  • When this script is run, the URL https://ir.eia.gov/ngs/ngs.gif is retrieved and the image is put into the Spreadsheet.
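
The modified script hardcodes the base URL https://ir.eia.gov/ngs/, which works here because src is a bare filename. For other pages, src might be absolute or root-relative. The following is only a minimal sketch of how the base could be derived from the page URL instead. The function name resolveImageUrl is hypothetical (it is not part of Apps Script or Cheerio), and it assumes http/https URLs and does not normalize "../" segments.

```javascript
// Hypothetical helper (not part of the original answer): turn an <img> src
// into an absolute URL, using the URL of the page it was found on.
// Assumption: only http/https page URLs; "../" segments are not normalized.
function resolveImageUrl(pageUrl, src) {
  // Already absolute: use it unchanged.
  if (/^https?:\/\//i.test(src)) return src;

  const m = pageUrl.match(/^(https?:)\/\/[^\/?#]+/i);
  if (!m) throw new Error('Unsupported page URL: ' + pageUrl);
  const origin = m[0]; // e.g. https://ir.eia.gov

  // Protocol-relative, e.g. //cdn.example.com/a.png
  if (src.startsWith('//')) return m[1] + src;
  // Root-relative, e.g. /img/a.png
  if (src.startsWith('/')) return origin + src;

  // Document-relative: keep the page's directory, drop query, fragment, and file name.
  const path = pageUrl.slice(origin.length).split(/[?#]/)[0];
  const dir = path.slice(0, path.lastIndexOf('/') + 1) || '/';
  return origin + dir + src;
}
```

With this helper, the line inside each() could become urls.push(resolveImageUrl(url, $(this).attr('src'))), so the same script would not depend on a hardcoded base.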

Added:

About the new question in your comment,

Thanks a lot! So other than calling the index of the image, is there no method to call either alt="Working Gas in Underground Storage Compared with Five-Year Range" or src="ngs.gif" in the code? I'm just curious to learn a smart way for a potential scenario, for instance, if a web has 20 images and the locations of those images keep changing day by day, so the second image is not always in the second place. Thank you again for any guide!

In this case, how about the following sample script?

Sample script:

function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);

  const obj = [];
  $('img').each(function () {
    const t = $(this);
    const src = t.attr('src');
    obj.push({ alt: t.attr('alt'), src: src, url: "https://ir.eia.gov/ngs/" + src });
  });

  const searchAltValue = "Working Gas in Underground Storage Compared with Five-Year Range";
  const searchSrcValue = "ngs.gif";
  const ar = obj.filter(({alt, src}) => alt == searchAltValue && src == searchSrcValue);
  if (ar.length > 0) {
    SpreadsheetApp.getActiveSheet().insertImage(ar[0].url, 1, 1);
  }
}
  • In this sample script, when the values of alt and src are Working Gas in Underground Storage Compared with Five-Year Range and ngs.gif, respectively, the URL is retrieved and the image is put into the Spreadsheet.
  • If you want to select Working Gas in Underground Storage Compared with Five-Year Range OR ngs.gif, please modify alt == searchAltValue && src == searchSrcValue to alt == searchAltValue || src == searchSrcValue.
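
As an alternative to collecting every img and filtering afterwards, the match can also be expressed as a CSS attribute selector, which Cheerio supports (for example img[src="ngs.gif"]). The following is a sketch under that assumption. The names buildImgSelector and insertImageBySelector are hypothetical helpers introduced only for illustration, and the base URL is the same hardcoded one used above.

```javascript
// Hypothetical helper: build a selector such as img[src="ngs.gif"]
// from an object of attribute names and expected values.
function buildImgSelector(attrs) {
  return 'img' + Object.keys(attrs)
    .map(function (name) {
      // Escape backslashes and double quotes so the value stays inside the quotes.
      const value = String(attrs[name]).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
      return '[' + name + '="' + value + '"]';
    })
    .join('');
}

// Apps Script entry point (assumes the Cheerio library is added to the project).
function insertImageBySelector() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);

  // All listed attributes must match (AND).
  const selector = buildImgSelector({ src: 'ngs.gif' });
  const src = $(selector).first().attr('src');
  if (src) {
    SpreadsheetApp.getActiveSheet().insertImage('https://ir.eia.gov/ngs/' + src, 1, 1);
  }
}
```

Passing both attributes, as in buildImgSelector({ alt: '...', src: 'ngs.gif' }), requires both to match. For an OR match, two selectors can be joined with a comma, as in standard CSS selector lists.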