Find a string pattern using regex and append the result to a list
I'm a newbie using the re library in Python. I'm doing some web scraping, and I want to match some string patterns and append the values to lists. For example:
parking = []
rooms = []
toilets = []
attribute = soup.find('ul',{'class':'specs-list'}).find_all('li')
for a in attribute:
    print(a.text)
Output of the loop over a for the listing at index 0:
Metters
50 m²
Rooms
2
Toilets
1
Output of the loop over a for the listing at index 1:
Metters
50 m²
parking
1
spends
340
I want to match on the heading name: if a heading is present in the value of a, I want to append the result to the corresponding list.
Pseudocode:
for a in attribute:
    if a contains "Rooms":
        rooms.append(a)
    if a contains "Parking":
        parking.append(a)
    if a contains "Toilets":
        toilets.append(a)
    if a contains none of the strings above:
        rooms.append(nan)
        parking.append(nan)
        toilets.append(nan)
I build the scraping result with BeautifulSoup, and the attribute values look like this.
Output of the attribute variable for index 0:
[<li class="specs-item">
<strong>Metters</strong>
<span>50 m²</span>
</li>,<li class="specs-item">
<strong>Rooms</strong>
<span>2</span>
</li>,<li class="specs-item">
<strong>Toilets</strong>
<span>1</span>
</li>,<li class="specs-item">
<strong>Spends</strong>
<span>340</span></li>]
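One attribute has a length of 5 values. The markup for each value is similar to the above, but the headings and values differ: some listings contain parking, rooms, and toilets, others only toilets and rooms, and so on.
This should help: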
from bs4 import BeautifulSoup
import requests
parking = []
rooms = []
toilets = []
html = requests.get('website url').text
soup = BeautifulSoup(html,'html.parser')
attribute = soup.find_all('li',{'class':'specs-item'})
for a in attribute:
    heading = a.strong.text
    span = a.span.text
    if heading == "Parking":
        parking.append(span)
    elif heading == "Rooms":
        rooms.append(span)
    elif heading == "Toilets":
        toilets.append(span)
print("Parking =", parking)
print("Rooms =", rooms)
print("Toilets =", toilets)
Output for the li values you provided:
Parking = []
Rooms = ['2']
Toilets = ['1']
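Since the question asks about re, and one of the sample listings has the heading in lowercase ("parking"), here is a minimal sketch of a case-insensitive regex match on the text of each item. The item_texts list is a hypothetical stand-in for the a.text values shown in the question, and the names SPEC_RE and found are made up for illustration.

```python
import re

# A wanted heading followed by its value, e.g. "Rooms\n2".
# re.IGNORECASE lets "parking" and "Parking" both match.
SPEC_RE = re.compile(r"\b(parking|rooms|toilets)\b\s*(\S+)", re.IGNORECASE)

# Hypothetical stand-ins for a.text of each <li> in one listing.
item_texts = ["\nMetters\n50 m²\n", "\nparking\n1\n", "\nRooms\n2\n"]

found = {"parking": [], "rooms": [], "toilets": []}
for text in item_texts:
    match = SPEC_RE.search(text)
    if match:
        # Normalise the heading so it can be used as a dictionary key.
        heading = match.group(1).lower()
        found[heading].append(match.group(2))

print(found)
```

For plain equality checks like the ones above, comparing heading.lower() is probably simpler; the regex mostly pays off if the heading and value arrive as one block of text.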
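Edit:
Although the approach with separate lists works, I feel that having so many lists isn't a good way to do it. Instead, you can use a dictionary. Here is how to get the same output with a dictionary: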
details_dict = {'Parking': [],
                'Rooms': [],
                'Toilets': []}
for a in attribute:
    heading = a.strong.text
    span = a.span.text
    # Only keep headings that have a list in the dictionary
    if heading in details_dict:
        details_dict[heading].append(span)
print(details_dict)
Output:
{'Parking': [], 'Rooms': ['2'], 'Toilets': ['1']}
I feel this is the better approach, but it's entirely up to you. Pick whichever suits your task best.
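The question also wants nan appended when a listing lacks a heading, so that every list ends up with one entry per listing. A rough sketch of that idea follows. It assumes each listing has already been reduced to (heading, value) pairs, for example with [(a.strong.text, a.span.text) for a in attribute]; the names WANTED and collect_specs and the sample data are hypothetical.

```python
import math

WANTED = ("Parking", "Rooms", "Toilets")

def collect_specs(listings):
    """Build one list per wanted heading, with one entry per listing."""
    columns = {name: [] for name in WANTED}
    for pairs in listings:
        # Normalise heading case so "parking" and "Parking" share a column.
        specs = {heading.strip().capitalize(): value.strip()
                 for heading, value in pairs}
        for name in WANTED:
            # A heading missing from this listing is recorded as NaN.
            columns[name].append(specs.get(name, math.nan))
    return columns

# Hypothetical data modelled on the two listings in the question.
listings = [
    [("Metters", "50 m²"), ("Rooms", "2"), ("Toilets", "1")],
    [("Metters", "50 m²"), ("parking", "1"), ("spends", "340")],
]
details = collect_specs(listings)
print(details)
```

Because every list has the same length, the result could be passed straight to something like a DataFrame constructor if that is where the data is headed.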