REGEXEXTRACT on IMPORTDATA from a website in Google Sheets
The goal is to extract the title and tags from a webpage. I am using IMPORTDATA, and I want all the results in one row, like this:
[webpage] [title] [1st tag] [2nd tag] [3 rd tag] [4th tag] ... [last tag]
I am stuck halfway through my process in Google Sheets.
First tab, Extracted
- I have already pulled the rows I need out of the large imported data:
=query({array_constrain(IMPORTDATA(A1),6375,10)},"WHERE (Col1 CONTAINS 'btn btn-secondary' AND Col1 CONTAINS 'href') or (Col1 CONTAINS 'meta property' AND Col1 CONTAINS 'og:title')")
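Second tab, with REGEXEXTRACT
- This extracts the text I need, but only for the first row (it extracts only the tags; the title is still missing, and the results are not yet spread across several columns...):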
=REGEXEXTRACT(query({array_constrain(IMPORTDATA(A1),6375,10)},"WHERE (Col1 CONTAINS 'btn btn-secondary' AND Col1 CONTAINS 'href')"),"\>(.+)\
I don't know how to go any further :( Any help is appreciated!
=ARRAYFORMULA({REGEXREPLACE(TEXTJOIN(", ",1,
QUERY(ARRAY_CONSTRAIN(SUBSTITUTE(IMPORTDATA(A2),"""",""),1000,15),
"where Col1 contains '<meta property=og:title content='")),
"<meta property=og:title content=| />",""),
TRANSPOSE(REGEXEXTRACT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
ARRAY_CONSTRAIN(SUBSTITUTE(IMPORTDATA(A2),"""",""),8000,3)),,50000)),
"where Col1 contains '<a class=btn btn-secondary'"),"\>(.*)+\<"))})