Grab all text (including new lines) with Regex

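I'm trying to figure out how to grab all the text that comes after [text](URL), but because of the line breaks (\n\n) I'm having trouble capturing everything that follows it. I'm currently trying variations of (?<=.\)\n\n)(.*\n+), but it only captures the next paragraph.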

The text looks like this:

---
layout: post
title: "13 - First Principles of AGI Safety with Richard Ngo"
date: 2022-03-30 22:15 -0700
categories: episode
---

[Google Podcasts link](https://podcasts.google.com/feed/aHR0cHM6Ly9heHJwb2RjYXN0LmxpYnN5bi5jb20vcnNz/episode/OTlmYzM1ZjEtMDFkMi00ZTExLWExYjEtNTYwOTg2ZWNhOWNi)

How should we think about artificial general intelligence (AGI), and the risks it might pose? What constraints exist on technical solutions to the problem of aligning superhuman AI systems with human intentions? In this episode, I talk to Richard Ngo about his report analyzing AGI safety from first principles, and recent conversations he had with Eliezer Yudkowsky about the difficulty of AI alignment.

Topics we discuss:
- [The nature of intelligence and AGI](#agi-intelligence-nature)
  - [The nature of intelligence](#nature-of-intelligence)
  - [AGI: what and how](#agi-what-how)
  - [Single vs collective AI minds](#single-collective-ai-minds)
- [AGI in practice](#agi-in-practice)
  - [Impact](#agi-impact)
  - [Timing](#agi-timing)
  - [Creation](#agi-creation)
  - [Risks and benefits](#agi-risks-benefits)
- [Making AGI safe](#making-agi-safe)
  - [Robustness of the agency abstraction](#agency-abstraction-robustness)
  - [Pivotal acts](#pivotal-acts)
- [AGI safety concepts](#agi-safety-concepts)
  - [Alignment](#ai-alignment)
  - [Transparency](#transparency)
  - [Cooperation](#cooperation)
- [Optima and selection pressures](#optima-selection-pressures)
- [The AI alignment research community](#ai-alignment-research-community)
  - [Updates from Yudkowsky conversation](#yudkonversation-updates)
  - [Corrections to the community](#community-corrections)
  - [Why others don't join](#why-others-dont-join)
- [Richard Ngo as a researcher](#ngo-as-researcher)
- [The world approaching AGI](#world-approaching-agi)
- [Following Richard's work](#following-richards-work)

**Daniel Filan:**
Hello, everybody. Today, I'll be speaking with Richard Ngo. Richard is a researcher at OpenAI, where he works on AI governance and forecasting. He also was a research engineer at DeepMind, and designed the course ["AGI Safety Fundamentals"](https://www.eacambridge.org/agi-safety-fundamentals). We'll be discussing his report, [AGI Safety from First Principles](https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ), as well as his [debate with Eliezer Yudkowsky](https://www.alignmentforum.org/s/n945eovrA3oDueqtq) about the difficulty of AI alignment. For links to what we're discussing, you can check the description of this episode, and you can read the transcripts at [axrp.net](https://axrp.net/). Well, Richard, welcome to the show.

**Richard Ngo:**
Thanks so much for having me.

Thanks for your help!

Assuming you can read the entire text into a string variable, you can use re.search here:

import re
s = re.search(r'\[.*?\]\(https?://.*?\)\s+(.*)', text, flags=re.S)
print(s.group(1))

This prints the text you appear to want:

How should we think about artificial general intelligence (AGI), and the risks it might pose? What constraints exist on technical solutions to the problem of aligning superhuman AI systems with human intentions? In this episode, I talk to Richard Ngo about his report analyzing AGI safety from first principles, and recent conversations he had with Eliezer Yudkowsky about the difficulty of AI alignment...

Note that we run this regex search in dot-all mode (re.S), so that .* also matches newlines.
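To see what dot-all mode changes, here is a minimal sketch that runs the same pattern with and without re.S. The `sample` string and its URL are made up for illustration; they are a shortened stand-in for the post in the question.

```python
import re

# Hypothetical miniature of the post: a link line followed by two paragraphs.
sample = "[Some link](https://example.com/feed)\n\nFirst paragraph.\n\nSecond paragraph.\n"

pattern = r'\[.*?\]\(https?://.*?\)\s+(.*)'

# Without re.S, '.' stops at the first newline, so only one line is captured.
without_dotall = re.search(pattern, sample).group(1)

# With re.S, '.' also matches newlines, so (.*) runs to the end of the string.
with_dotall = re.search(pattern, sample, flags=re.S).group(1)

print(repr(without_dotall))
print(repr(with_dotall))
```

The first call captures only `First paragraph.`, which is the same symptom as in the question; the second captures everything after the link.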

I decided to go with the following approach:

end = re.search(r'\[(Google Podcasts link)\]\((.+)\)\n\n', text).end()
text = text[end:]

So I just search for the text that marks the start of what I want, then use .end() to slice the string from that position onward.
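One caveat with this approach is that re.search returns None when the link line is missing, so calling .end() directly would raise an AttributeError. A minimal sketch of a guarded version follows; the helper name `text_after_link`, its `label` parameter, and the sample string are assumptions for illustration, not part of the original code.

```python
import re

def text_after_link(text, label="Google Podcasts link"):
    """Return everything after the first [label](url) line, or None if absent."""
    # re.escape protects against regex metacharacters in the label.
    # [^\s()]+ keeps the URL match from running past the closing parenthesis.
    m = re.search(r'\[' + re.escape(label) + r'\]\([^\s()]+\)\n+', text)
    return text[m.end():] if m else None

# Hypothetical shortened input with front matter, the link line, and a body.
sample = "---\ntitle: demo\n---\n\n[Google Podcasts link](https://example.com/x)\n\nIntro.\n\nMore.\n"

body = text_after_link(sample)
print(body)
```

Using \n+ instead of \n\n is a small design choice: it tolerates one or more newlines after the link rather than exactly two.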

Since there appears to be only one occurrence, you can also split on a pattern:

(?m)^\s*\[[^][]*]\(https?://[^\s()]*\)\s*

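Explanation

  • (?m) Enable multiline mode
  • ^ Start of a line (because multiline mode is on)
  • \s* Match optional whitespace chars
  • \[[^][]*] Match from [...]
  • \(https?://[^\s()]*\) Match a url between parentheses
  • \s* Match trailing whitespace chars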

See a regex101 demo for the matches.

Example

result = re.split(r"(?m)^\s*\[[^][]*]\(https?://[^\s()]*\)\s*", text)
print(result[1])

Or, more specifically:

result = re.split(r"(?m)^\s*\[Google Podcasts link]\(https?://[^\s()]*\)\s*", text)
print(result[1])

See a Python demo.
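If the input might contain more than one link-only line, or none at all, the split approach can be made a little more defensive. This is a sketch under those assumptions: the helper name `split_at_first_link` and the sample string are hypothetical, and maxsplit=1 is used so that only the first matching line acts as the separator.

```python
import re

# Same pattern as above, with the multiline flag passed explicitly.
LINK_LINE = re.compile(r"^\s*\[[^][]*]\(https?://[^\s()]*\)\s*", re.M)

def split_at_first_link(text):
    """Return the text after the first link-only line, or None if there is none."""
    parts = LINK_LINE.split(text, maxsplit=1)
    # With no match, split returns the whole text as a single element.
    if len(parts) < 2:
        return None
    return parts[1]

# Hypothetical shortened input: front matter, link line, then the body.
sample = (
    "---\nlayout: post\n---\n\n"
    "[Podcast link](https://example.com/ep)\n\n"
    "Topics:\n- [One](#one)\n\nBody text.\n"
)

rest = split_at_first_link(sample)
print(rest)
```

Note that the leading \s* can also consume the blank line before the link, so the first part ends right after the front matter.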