How to find a repeated pattern in multiline text with Python regex?
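I am new to Python's regex module. I am trying to extract the problem numbers and the names of the companies that asked those problems. My text looks like this:
Input text: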
text = """
# Daily Coding Problem
Solutions to problems sent by dailycodingproblem.com
---
#### Problem 1
Given a list of numbers, return whether any two sums to k.
For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
Bonus: Can you do this in one pass?
[Solution](solutions/problem_001.py)
---
#### Problem 2
This problem was asked by Uber.
Given an array of integers, return a new array such that each element at index i of the new array is the product of all the numbers in the original array except the one at i.
For example, if our input was [1, 2, 3, 4, 5], the expected output would be [120, 60, 40, 30, 24]. If our input was [3, 2, 1], the expected output would be [2, 3, 6].
Follow-up: what if you can't use division?
[Solution](solutions/problem_002.py)
---
#### Problem 3
This problem was asked by Google.
Given the root to a binary tree, implement serialize(root), which serializes the tree into a string, and deserialize(s), which deserializes the string back into the tree.
[Solution](solutions/problem_003.py)
---
"""
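import re

# With re.MULTILINE, `$` matches at the end of every line, so the header line
# and the "asked by" line right after it can be matched as a pair.
pat = r"Problem (\d+)$\n.*asked by (.*)\.$"
out = re.findall(pat, text, flags=re.MULTILINE)
print(out)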
My attempt:
import re
pat = r"^Problem (\d+).* asked by (\w+[\s]\w+)."
out = re.findall(pat, text, flags=re.MULTILINE|re.DOTALL)
print(out)
# [('1', 'Google company')]
But this gives the wrong output. How can I get the expected result:
problem_num = [2,3]
company = ["Uber", "Google"]
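I assume the line containing "asked by" always follows the problem number line. Under that assumption, this pattern works for me:
pat = r"### Problem (\d+)$\n*.*asked by ([a-zA-Z]+)\."
out = re.findall(pat, text, flags=re.MULTILINE)
$ matches at the end of each line because of the MULTILINE flag.
Note that this will get "Google company" rather than "Google".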
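To make the approach above concrete, here is a minimal sketch that runs the same idea end to end and splits the result into the two lists the question asks for. The `sample` string is a shortened, hypothetical stand-in for the question's `text`, and the capture group `[^.\n]+` is my own variation (not the pattern from the answer), chosen on the assumption that the company name runs up to the period that ends the line.

```python
import re

# Shortened, hypothetical sample in the same layout as the question's text.
sample = """\
#### Problem 1
Given a list of numbers, return whether any two sums to k.
---
#### Problem 2
This problem was asked by Uber.
Given an array of integers, return a new array.
---
#### Problem 3
This problem was asked by Google.
Given the root to a binary tree, implement serialize(root).
---
"""

# re.MULTILINE makes ^ and $ anchor at every line boundary.
# DOTALL is deliberately not used, so `.*` cannot run past the line
# that directly follows the "#### Problem N" header.
# Assumption: the company name is everything between "asked by " and
# the period that ends that line.
PATTERN = re.compile(
    r"^#### Problem (\d+)\n.*asked by ([^.\n]+)\.$",
    flags=re.MULTILINE,
)

matches = PATTERN.findall(sample)  # list of (number, company) tuples

problem_num = [int(num) for num, _ in matches]
company = [name for _, name in matches]

print(problem_num)  # [2, 3]
print(company)      # ['Uber', 'Google']
```

Problem 1 is skipped because the line after its header has no "asked by" text. Leaving out `re.DOTALL` is the key design choice here: with it, `.*` could span many lines and pair one problem number with a company name from a later problem.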