Google Sheets - Scrape table involved with pagination

Question

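I'm trying to solve this with Google Sheets. I'm pulling data from finviz.com to build a custom stock screener, but the only problem is that the site paginates its results, so each page shows only 20 rows. I've checked, and when I click the second page in the table's pagination section, only the URL changes: it gains a parameter pointing to the first row of the new table. So if my first results page has 20 rows, the URL of the second results page will contain a parameter like "r=21", indicating the first row of the second page of results. How would I go about making sure all of the data is pulled once the table's pagination kicks in? Also, inspecting the page source, these new parameters are stored in href attributes: if the results span several pages, the <table/> element contains the URLs of the other pages in its hrefs, for example: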

<table>
  <a href="screener.ashx?v=111&f=targetprice_a5&r=21"/>
  <a href="screener.ashx?v=111&f=targetprice_a5&r=41"/>
  <a href="screener.ashx?v=111&f=targetprice_a5&r=61"/>
</table>

Note that only one new parameter, "r=21", is added to the URL; the remaining parameters stay the same across the different results pages.
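Since the pages differ only in the `r` offset, the page URLs can be generated instead of copied by hand. A minimal Python sketch, assuming 20 rows per page, that page 1 carries no `r` parameter, and that the total number of rows is known in advance; `page_urls` is a hypothetical helper, not part of any finviz API.

```python
from urllib.parse import urlencode

BASE = "https://finviz.com/screener.ashx"


def page_urls(params: dict[str, str], total_rows: int, page_size: int = 20) -> list[str]:
    """Hypothetical helper: one URL per results page, offset by the r parameter.

    Assumes page_size rows per page and that the first page starts at row 1
    without an explicit r parameter (as in the question's URLs).
    """
    urls = []
    for first_row in range(1, total_rows + 1, page_size):
        query = dict(params)  # copy so the caller's dict is not modified
        if first_row > 1:
            query["r"] = str(first_row)  # e.g. r=21 for the second page
        # safe="," keeps the comma-separated filter list readable
        urls.append(f"{BASE}?{urlencode(query, safe=',')}")
    return urls


if __name__ == "__main__":
    for url in page_urls({"v": "111", "f": "targetprice_a5"}, total_rows=80):
        print(url)
```

With 80 rows this yields the first page plus the `r=21`, `r=41` and `r=61` pages, matching the hrefs in the page source; each URL would then go into its own IMPORTHTML call.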

Is this even possible with Google Sheets?

This is what I have so far. The goal is to build a stock market screener that refreshes every 3 minutes and can be integrated with and viewed from Notion.

=QUERY(IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&ar=180","Table","19"),"SELECT Col1,Col2,Col7,Col8,Col9,Col10,Col11")

Answer 1

Try:

=QUERY({
 IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&ar=180","Table","19");
 IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=21&ar=180","Table","19");
 IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=41&ar=180","Table","19");
 IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=61&ar=180","Table","19")},
 "select Col1,Col2,Col7,Col8,Col9,Col10,Col11 where Col1 matches '\d+'", 1)
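The formula stacks the four pages vertically with `{...; ...}`, the final argument `1` tells QUERY to treat the first row as the header, and `where Col1 matches '\d+'` drops the header rows that pages 2 to 4 bring along, because their first cell is not a row number. A rough Python model of that logic, useful for checking what the formula should return; `combine_pages` is a hypothetical name, and each page is assumed to be a list of rows of strings with its header as the first row.

```python
import re

# Columns kept by the QUERY select clause: Col1, Col2, Col7..Col11 (1-based).
KEEP = [1, 2, 7, 8, 9, 10, 11]


def combine_pages(pages: list[list[list[str]]]) -> list[list[str]]:
    """Hypothetical model of {page1; page2; ...} followed by the QUERY filter.

    Assumes every page starts with the same header row.
    """
    # QUERY(..., 1): the first row of the stacked data becomes the header.
    header = [pages[0][0][c - 1] for c in KEEP]
    # Vertical stacking, like ';' inside a Sheets array literal.
    rows = [row for page in pages for row in page]
    body = [
        [row[c - 1] for c in KEEP]
        for row in rows
        # 'matches' must match the whole cell, so fullmatch; header rows fail here.
        if re.fullmatch(r"\d+", row[0])
    ]
    return [header] + body
```

Filtering on a digits-only first column works because only data rows carry a row number there; if the layout of the table changed, this condition would need revisiting.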


Tags: import, xpath, google-sheets, google-query-language, google-sheets-formula