Can't get a script to keep a space at a certain position within some addresses

Question

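I am trying to scrape all the file names and their associated addresses from a static webpage. The script I have written gets them almost exactly right, except that it fails to keep a space at a certain position within some addresses. To be clearer, among other results, the script prints the following to the console: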

RZ000089 1207, 1211, 1215, 1217, 1219 & 1221Carlisle Avenue

whereas my expected output is (note the space before Carlisle Avenue):

RZ000089 1207, 1211, 1215, 1217, 1219 & 1221 Carlisle Avenue

Current approach:

import requests
from bs4 import BeautifulSoup

link = 'https://www.esquimalt.ca/business-development/development-tracker/rezoning-applications'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("table.table_two_columns > tbody"):
        file = item.select_one("tr > td:has(strong:-soup-contains('File:'))").get_text(strip=True).replace("File:","").replace("\xa0"," ").strip()
        addr_list = [i.text for i in item.select("tr:nth-of-type(1) > td:nth-of-type(1) > p")]
        for addr in addr_list:
            print(file,addr)

Output I'm getting (truncated):

RZ000095 A-904 Admirals Road
RZ000089 1207, 1211, 1215, 1217, 1219 & 1221Carlisle Avenue
RZ000089 512 & 522 Fraser Street
RZ000089 1212, 1216, 1220, 1222, 1224 & 1226Lyall Street
RZ000055 1072 Colville Road
RZ000056 1076 Colville Road

Output I wish to get (note the space before Carlisle Avenue and Lyall Street):

RZ000095 A-904 Admirals Road
RZ000089 1207, 1211, 1215, 1217, 1219 & 1221 Carlisle Avenue
RZ000089 512 & 522 Fraser Street
RZ000089 1212, 1216, 1220, 1222, 1224 & 1226 Lyall Street
RZ000055 1072 Colville Road
RZ000056 1076 Colville Road

Answer 1

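i.text concatenates the text nodes inside each <p> with nothing between them, so fragments split by a nested tag run together. Use i.get_text() with the separator= parameter instead: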

import requests
from bs4 import BeautifulSoup

link = "https://www.esquimalt.ca/business-development/development-tracker/rezoning-applications"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    for item in soup.select("table.table_two_columns > tbody"):
        file = (
            item.select_one("tr > td:has(strong:-soup-contains('File:'))")
            .get_text(strip=True)
            .replace("File:", "")
            .replace("\xa0", " ")  # normalise non-breaking spaces
            .strip()
        )
        addr_list = [
            # separator=" " puts a space between the text nodes of each <p>
            i.get_text(strip=True, separator=" ")
            for i in item.select("tr:nth-of-type(1) > td:nth-of-type(1) > p")
        ]
        for addr in addr_list:
            print(file, addr)

Prints:

RZ000095 A-904 Admirals Road
RZ000089 1207, 1211, 1215, 1217, 1219 & 1221 Carlisle Avenue
RZ000089 512 & 522 Fraser Street
RZ000089 1212, 1216, 1220, 1222, 1224 & 1226 Lyall Street
RZ000055 1072 Colville Road
RZ000056 1076 Colville Road
RZ000098 812 Craigflower Road
RZ000083 881 Craigflower Road
RZ000071 820 Dunsmuir Road

...
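
To see the difference in isolation, here is a minimal sketch that needs no network access. The HTML string is a hypothetical stand-in for one address cell: it assumes the street name follows a <br> inside the <p>, which may not match the real page's markup exactly. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real page):
# the street name is a separate text node after a <br>.
html = "<p>1207, 1211, 1215, 1217, 1219 &amp; 1221<br>Carlisle Avenue</p>"

p = BeautifulSoup(html, "html.parser").p

# .text joins the text nodes with an empty string.
joined = p.text

# get_text() strips each text node, then joins them with the separator.
spaced = p.get_text(strip=True, separator=" ")

print(joined)
print(spaced)
```

Note that the separator is placed between every pair of text nodes, so an address containing an inline tag in the middle of a word (for example <strong>) would also gain a space at that point.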

python

beautifulsoup

web-scraping

python-3.x

python-requests