Why is scrapy printing \t\n\n where I expect there to be text?

Question

I'm a beginner with Scrapy and still learning. I've been parsing this page and trying to scrape the address from it.

I've been doing this in the scrapy shell, so I start with:

scrapy shell https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952

That works fine. Then I try to parse the address:

response.xpath('//li[@class="address"]/text()').extract()

But my output is as follows:

['\n\t\t', '\n\t\t\n\t\t']

Why don't I see the address that is shown on the page:

BELFAST ABBEY CENTRE, 1 Old Glenmount Road Newtonabbey, Newton Abbey, BT36 7DN

How would I get this address? Thanks to anyone who takes the time to reply.

Answer 1

There are a couple of mistakes in the way you are approaching this:

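When using scrapy shell, you have to surround the URL with double quotes (""), because the & characters inside the URL can cause the terminal to interpret the command as multiple processes: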
```
scrapy shell "https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952"
```
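Your XPath is incorrect: /text() returns only the text that sits directly inside the selected tag, and the li does not directly contain the information you want. The tag that contains that text is a child of the li, so you can use: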
```
response.xpath('//li[@class="address"]//text()').extract()
```
or
```
response.xpath('//li[@class="address"]/p/text()').extract()
```
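
To see why /text() returns only whitespace while //text() reaches the address, here is a minimal sketch using only the Python standard library. The markup in SNIPPET is an assumption (the real page's HTML is not shown in the question), chosen so that the direct text nodes match the output reported above. ElementTree is used as a stand-in for Scrapy's selectors, so this illustrates the idea rather than reproducing what Scrapy does internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page's HTML is not shown in the question.
SNIPPET = (
    '<li class="address">\n\t\t'
    '<p>BELFAST ABBEY CENTRE, 1 Old Glenmount Road Newtonabbey, '
    'Newton Abbey, BT36 7DN</p>'
    '\n\t\t\n\t\t</li>'
)

li = ET.fromstring(SNIPPET)

# Rough equivalent of li/text(): only text nodes that are direct children
# of <li> (the text before the first child, plus the text after each child).
direct_text = [t for t in [li.text] + [child.tail for child in li] if t]

# Rough equivalent of li//text(): text nodes of <li> and all its descendants.
all_text = list(li.itertext())

# Drop whitespace-only fragments and join what is left.
address = " ".join(part.strip() for part in all_text if part.strip())

print(direct_text)
print(all_text)
print(address)
```

Even with //text(), the extracted list still contains the whitespace-only fragments, so filtering and stripping the result (as in the last step) is probably what you want before storing the address.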
