How to extract text starting from a specific word in a string?

Question

So I'm trying to extract only the address from this string, but I'm having trouble. This is what the string looks like:

1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com

I only want to extract the text before the word 'Phone', i.e. just the address.

I tried strip('Phone') and then taking the first element of the array, but it gave me the first letter of the string instead.

address = contacts.strip('Phone')
print(address[0])

Answer 1

Use split instead of strip.

address = contacts.split('Phone')
print(address[0])

This should work.
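A short sketch of why the original attempt fails and split works, assuming the string from the question is stored in a variable named contacts. str.strip() treats its argument as a set of characters to remove from both ends, not as a word, and indexing a string returns a single character.

```python
contacts = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com"""

# strip('Phone') removes any of the characters P, h, o, n, e from both ends;
# it does not search for the word "Phone".
stripped = contacts.strip('Phone')
print(stripped == contacts)    # True: the string starts with '1' and ends with 'm'
print(stripped[0])             # prints 1: indexing a str gives one character

print('hello'.strip('Phone'))  # prints ll: leading 'h', 'e' and trailing 'o' are removed

# split('Phone') cuts at the substring and returns a list of pieces.
parts = contacts.split('Phone')
print(parts[0])
```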

Answer 2

As @JonClements commented, the solution is:

contacts.partition('Phone')[0]
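str.partition() splits at the first occurrence of the separator and always returns a 3-tuple (before, separator, after), which is why [0] is safe even when 'Phone' is missing. A minimal sketch; the sample strings are made up for illustration.

```python
contacts = "1040 S. Vintage Ave. Building A Ontario, CA 91761 United States Phone: 9099725134"

# partition() returns (before, separator, after) for the first match.
head, sep, tail = contacts.partition('Phone')
print(repr(head))   # '1040 S. Vintage Ave. Building A Ontario, CA 91761 United States '
print(repr(sep))    # 'Phone'
print(repr(tail))   # ': 9099725134'

# If the separator is absent (the search is case-sensitive, so 'phone' does not
# count), the whole string is the first element and the other two are empty.
missing = "no phone here".partition('Phone')
print(missing)      # ('no phone here', '', '')
```

Checking whether the middle element is empty tells you whether 'Phone' was actually found.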

Answer 3

For this task you can use a so-called zero-width assertion (in this case a positive lookahead):

import re
text = '''1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401 

Web: http://www.aareninc.com'''
address = re.findall(r'.*(?=Phone)', text, re.DOTALL)[0]
print(address)

Output

1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States

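Note that if text does not contain the substring Phone, this raises an error: findall returns an empty list, so [0] raises an IndexError. Note also the re.DOTALL flag, which makes . match the newline character (\n) as well; without that flag the output would be United States.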
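A hedged variant that addresses both caveats in the note: re.search() returns None instead of raising when 'Phone' is absent, and the lazy .*? stops at the first 'Phone' (the greedy .* with re.DOTALL would run to the last one if the word appeared twice). The helper name before_phone is hypothetical.

```python
import re

text = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com"""


def before_phone(s):
    """Return the text before the first 'Phone', or None if 'Phone' is absent."""
    # (?=Phone) checks that 'Phone' follows without consuming it.
    # .*? is lazy, so the match ends at the first 'Phone'; re.DOTALL lets . cross newlines.
    m = re.search(r'.*?(?=Phone)', s, re.DOTALL)
    return m.group(0) if m else None


print(before_phone(text))
print(before_phone("no number here"))         # None
print(repr(before_phone("a Phone b Phone")))  # 'a '

# Without re.DOTALL, . stops at newlines, so only the last line's prefix matches.
no_dotall = re.findall(r'.*(?=Phone)', text)[0]
print(repr(no_dotall))                         # 'United States '
```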

Answer 4

I hope this works.

Tested on Python 2.7.

import re

text = "1040 S. Vintage Ave. Building A Ontario, CA 91761 United States Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com"

f = re.split(' (?=Phone:)', text)

print('String before Phone: ' + f[0])
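In that pattern only the space is matched; the lookahead (?=Phone:) requires "Phone:" to follow but leaves it in the text, so "Phone:" starts the second piece. A Python 3 sketch of the same idea, assuming the string contains "Phone:" at least once (otherwise there is only one piece and the unpacking raises ValueError); maxsplit=1 is an addition that stops after the first split.

```python
import re

text = ("1040 S. Vintage Ave. Building A Ontario, CA 91761 United States "
        "Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com")

# Split on the single space that is followed by 'Phone:'; keep at most two pieces.
address, rest = re.split(r' (?=Phone:)', text, maxsplit=1)
print(address)  # 1040 S. Vintage Ave. Building A Ontario, CA 91761 United States
print(rest)     # Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com
```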

Answer 5

Assuming you have something like this:

st = '1040 S. Vintage Ave.Building A Ontario, CA 91761 United States Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com'

v = st.split("Phone")
print(v[0])

This works in Python 3. If you are using Python 2, you can omit the parentheses in the print statement.

Answer 6

Using a regular expression:

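>>> import re
>>> strng = '1040 S. Vintage Ave. Building A Ontario, CA 91761 United States Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com'
>>> re.split('(Phone)', strng)
['1040 S. Vintage Ave. Building A Ontario, CA 91761 United States ',
'Phone',
': 9099725134 Fax: 9099065401 Web: http://www.aareninc.com']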
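The parentheses in '(Phone)' form a capturing group, and re.split() includes captured separators in its result; that is why 'Phone' appears as the middle element. A small sketch comparing both forms and taking the address from the first element:

```python
import re

strng = ("1040 S. Vintage Ave. Building A Ontario, CA 91761 United States "
         "Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com")

# With a capturing group, the matched separator is kept in the result.
with_sep = re.split(r'(Phone)', strng)
print(len(with_sep))     # 3

# Without the group, the separator is dropped, as with str.split().
without_sep = re.split(r'Phone', strng)
print(len(without_sep))  # 2

# Either way, the address is the first element (strip the trailing space).
address = with_sep[0].strip()
print(address)
```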

Answer 7

Assuming your string is defined as:

contacts = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com"""

contacts.split('Phone')[0] or contacts.partition('Phone')[0] should give you the same result.
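A quick sketch that checks this claim on the question's string plus two made-up edge cases (no 'Phone', and 'Phone' twice). split() without a limit cuts at every occurrence, while partition() and split('Phone', 1) stop at the first, but the first element is the same in all three cases.

```python
contacts = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com"""

samples = [
    contacts,
    "no number here",     # separator absent: whole string comes back
    "a Phone b Phone c",  # separator present twice: text before the first one
]

for s in samples:
    via_split = s.split('Phone')[0]
    via_split_once = s.split('Phone', 1)[0]  # stop after the first occurrence
    via_partition = s.partition('Phone')[0]
    assert via_split == via_split_once == via_partition
    print(repr(via_partition))
```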

Answer 8

You can first split to get a list of the strings on either side of "Phone". Then use strip to remove the leading and trailing whitespace.

contacts.split('Phone')[0].strip()

This works.
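The strip() step matters because split('Phone')[0] keeps the space that sat in front of "Phone". The sketch below wraps both steps in a small helper; the name text_before is hypothetical, and maxsplit=1 is an addition that stops splitting after the first match.

```python
contacts = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com"""


def text_before(text, word):
    """Return the text before the first occurrence of `word`, with leading and
    trailing whitespace removed. If `word` is absent, the whole text is returned
    (also stripped)."""
    return text.split(word, 1)[0].strip()


raw = contacts.split('Phone')[0]
print(repr(raw[-1]))                   # ' ' (the space before "Phone")
print(text_before(contacts, 'Phone'))
```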

Answer 9

You can use re.search():

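import re

s = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com"""

match = re.search(r'^(.+?)\sPhone', s, flags=re.MULTILINE | re.DOTALL)
print(match.group(1))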

# 1040 S. Vintage Ave.
# Building A Ontario, CA 91761
# United States
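re.search() returns None when the pattern does not match, so calling .group(1) directly would raise AttributeError for a string without "Phone". A hedged sketch with a guard, precompiling the same pattern; find_address and ADDRESS_RE are hypothetical names. In the pattern, (.+?) lazily captures group 1 and \sPhone consumes the whitespace in front of "Phone", so the captured address has no trailing space.

```python
import re

# Same pattern and flags as above, compiled once for reuse.
ADDRESS_RE = re.compile(r'^(.+?)\sPhone', flags=re.MULTILINE | re.DOTALL)


def find_address(s):
    """Return the text before the first whitespace + 'Phone', or None if absent."""
    m = ADDRESS_RE.search(s)
    return m.group(1) if m is not None else None


s = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401

Web: http://www.aareninc.com"""

print(find_address(s))
print(find_address("Fax only: 9099065401"))  # None
```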


Tags: python, regex, strip