Want to remove some text from the line

I only need the address, not the Tel, Fax, or Email. When I run my code it returns all of that data, but I only want the address from this page: https://all.accor.com/hotel/8392/index.de.shtml

from scrapy import Spider
from scrapy.http import Request


class AuthorSpider(Spider):
    name = 'pushpa'
    start_urls = ['https://all.accor.com/de/region/hotels-sachsen-dsn.shtml']
    page_number = 0
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'DOWNLOAD_DELAY': 1,
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
    }

    def parse(self, response):
        books = response.xpath("//a[@class='Teaser-link']//@href").extract()
        for book in books:
            url = response.urljoin(book)
            yield Request(url, callback=self.parse_book)

    def parse_book(self, response):
        title=response.xpath("//h3//text()").get()
        address = response.xpath("//div[@class='infos__content']//p//text()")[:-3].getall()
        address = [i.strip() for i in address]
        # remove empty strings:
        address = [i for i in address if i]

        yield {
            'name': title,
            'address': address,
        }

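Your address XPath selector is wrong. You need to restrict it to the text of the first child div inside the div with class infos__content. Use the code below for the parse_book method and it should work.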

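def parse_book(self, response):
    title = response.xpath("//h3//text()").get()
    # Take only the <p> in the first child div of infos__content;
    # normalize-space() trims it and collapses runs of whitespace.
    address = response.xpath("normalize-space(//div[@class='infos__content']/div[1]/p)").get()
    # normalize-space() does not treat the non-breaking space as whitespace,
    # so replace it explicitly.
    address = address.replace("\xa0", " ")
    yield {
        'name': title,
        'address': address,
    }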
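
If you would rather keep the extraction in XPath and do the cleanup in plain Python, the sketch below does roughly what normalize-space() plus the "\xa0" replacement do together. The helper name clean_address and the sample string are my own assumptions for illustration; they are not taken from the page.

```python
import re


def clean_address(raw: str) -> str:
    """Hypothetical helper: tidy a raw address string scraped from a page."""
    # Turn non-breaking spaces into ordinary spaces first, because
    # XPath's normalize-space() would leave them untouched.
    text = raw.replace("\xa0", " ")
    # Collapse runs of spaces, tabs and line breaks into a single space,
    # then trim both ends (this mirrors what normalize-space() does).
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


# Made-up sample input, not the real page content.
sample = "  Hauptstr.\xa01\n   01067 Dresden "
print(clean_address(sample))
```

You could call it on the value returned by .get() in parse_book instead of relying on normalize-space() in the selector.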