Wget ends at ampersand (&) and skips everything after that

Question

Wget skips everything after the ampersand (&). I tried escaping the &, but it does not work.

Code:

import threading
import urllib.request
import os
import re
import time
import json
import sys

def take():
    a = ["https://itunes.apple.com/us/genre/ios-games-action/id7001?mt=8&letter=A","https://itunes.apple.com/us/genre/ios-games-action/id7001?mt=8&letter=B"]
    for url_file in a:
        url_file = re.sub(r'\&','\&',url_file)
        data = os.popen('wget -qO- %s'% url_file).read()
        if re.search(r'(?mis)paginate\-more\">next',data):
            print ("hi")

take()

This should print "hi".

But because Wget skips everything after the &, it produces blank output.

How can I make this work?

Answer 1

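The problem you are facing is that & has a special meaning in the shell (and popen invokes a shell): it runs the command to the left of the & as a background job. The shell therefore cuts the wget command off at the &, and the rest of the URL never reaches wget.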
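To see what the shell actually hands to the command, here is a minimal sketch that swaps wget for echo so no network access is needed. It assumes a POSIX shell (sh or bash); the example.com URL and the helper name run_in_shell are placeholders for illustration, not part of the original code.

```python
import subprocess


def run_in_shell(command):
    """Run command through the shell and return its standard output as text."""
    completed = subprocess.run(command, shell=True, stdout=subprocess.PIPE)
    return completed.stdout.decode("utf-8", errors="replace")


# Placeholder URL with the same "?a=1&b=2" shape as the one in the question.
url = "https://example.com/page?mt=8&letter=A"

# Unquoted: the shell splits the line at "&", so echo only receives the part
# before it, and "letter=A" is treated as a separate shell statement.
seen_by_echo = run_in_shell("echo %s" % url)
print(repr(seen_by_echo))
```

On a POSIX shell the printed text should stop at mt=8, which is the same truncation wget sees.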

To avoid this, you must either escape the special character or put quotes around the URL:

 data = os.popen('wget -qO- "%s"' % url_file).read()
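If you would rather not hand-write the quotes, two standard-library options may help. This is only a sketch: build_wget_command and fetch are hypothetical helper names, shlex.quote targets POSIX shells (it is not meant for cmd.exe on Windows), and fetch assumes wget is installed and on the PATH.

```python
import shlex
import subprocess


def build_wget_command(url):
    """Return a command string in which the URL is quoted for a POSIX shell."""
    return "wget -qO- %s" % shlex.quote(url)


def fetch(url):
    """Run wget without any shell by passing the arguments as a list.

    Because no shell parses the command, "&" has no special meaning here.
    """
    completed = subprocess.run(
        ["wget", "-qO-", url],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if wget exits with an error
    )
    return completed.stdout.decode("utf-8", errors="replace")


url = "https://itunes.apple.com/us/genre/ios-games-action/id7001?mt=8&letter=A"
command = build_wget_command(url)
print(command)
```

The string from build_wget_command can be passed to os.popen as before, while fetch(url) sidesteps the quoting question entirely because the URL is delivered to wget as a single argument.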

Answer 2

Your code works for me. I am using Python 2.6.x on Linux.

The output is

hi
hi

I see that you have already escaped the "&" in your source code.
