Parameter error in Python script & TOR proxy server

Question

I am a Python newbie. My boss told me to run this Python script while a TOR proxy server is running. He told me to pass the arguments this way: python DownloadYP.py /Users/myfolder/ japan http://www.jpyellow.com/company 1 222299

He set it up on a Mac. I am using Windows, so my arguments look like this: python DownloadYP.py C:\rrr japan http://www.jpyellow.com/company 1 222299

But I get this error message:

> Traceback (most recent call last):
File "C:\Users\USER\yp1\code\DownloadYP.py", line 92, in <module>
WebPage(path, country, url, lowerlimit,upperlimit)
File "C:\Users\USER\yp1\code\DownloadYP.py", line 23, in __init__
fout = open(self.dir+"/limit.txt",'wb')
IOError: [Errno 2] No such file or directory: 'C:\rrr/japan/limit.txt'

My code is:

DownloadYP.py

import os
import sys
import urllib2

class WebPage:
    def __init__(self, path, country, url, lower=0,upper=9999):
        self.dir = str(path)+"/"+ str(country)
        self.url = url
        try:
            fin = open(self.dir+"/limit.txt",'r')
            limit = fin.readline()
            limits = str(limit).split(",")
            lower = int(limits[0])
            upper = int(limits[1])
            fin.close()
        except:
            # This is line 23
            fout = open(self.dir+"/limit.txt",'wb')
            limits = str(lower)+","+str(upper)
            fout.write(limits)
            fout.close()
        self.process_instances(lower,upper)


    def process_instances(self,lower,upper):
        try:
            os.stat(self.dir)
        except:
            os.mkdir(self.dir)
        for count in range(lower,upper+1):
            if count == upper:
                print "all downloaded, quitting the app!!"
                break
            targetURL = self.url+"/"+str(count)
            print "Downloading :" + targetURL
            req = urllib2.Request(targetURL)
            try:
                response = urllib2.urlopen(req)
                the_page = response.read()
                if the_page.find("Your IP suspended")>=0:
                    print "The IP is suspended"
                    fout = open(self.dir+"/limit.txt",'wb')
                    limits = str(count)+","+str(upper)
                    fout.write(limits)
                    fout.close()
                    break
                if the_page.find("Too many requests")>=0:
                    print "Too many requests"
                    print "Renew IP...."
                    fout = open(self.dir+"/limit.txt",'wb')
                    limits = str(count)+","+str(upper)
                    fout.write(limits)
                    fout.close()
                    break
                #subprocess.Popen("sudo  /Users/myfolder/Documents/workspace/DataMining/ip_renew.py", shell=True)
                if the_page.find("404 error")>=0:
                    print "the page not exist"
                    continue
                self.saveHTML(count, the_page)
            except:
                print "The URL cannot be fetched"
                pass
                #raise

    def saveHTML(self,count, content):
        fout = open(self.dir+"/"+str(count)+".html",'wb')
        fout.write(content)
        fout.close()

if __name__ == '__main__':
    if len(sys.argv) !=6:
        print "cannot process!!! Five Parameters are required to run the process."
        print "Parameter 1 should be the path where to save the data, eg, /Users/myfolder/data/"
        print "Parameter 2 should be the name of the country for which data is collected, eg, japan"
        print "Parameter 3 should be the URL from which the data to collect, eg, http://www.jpyellow.com/company/"
        print "Parameter 4 should be the lower limit of the company id, eg, 11 "
        print "Parameter 5 should be the upper limit of the company id, eg, 1000 "
        print "The output will be saved as the HTML file for each company in the target folder's country"
        exit()
    else:
        path = str(sys.argv[1])
        country = str(sys.argv[2])
        url = str(sys.argv[3])
        lowerlimit = int(sys.argv[4])
        upperlimit = int(sys.argv[5])
        # This is line 92
        WebPage(path, country, url, lowerlimit,upperlimit)

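I have downloaded TorVPN to run the proxy server, and it is running while I run this script. So why does the error occur? The script is meant to download pages from a website.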

Answer 1

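The problem is in DownloadYP.py:

You do not have the file C:\rrr\japan\limit.txt, and at that point the directory C:\rrr\japan does not exist yet either: the script only creates it later, in process_instances. Opening a file in 'wb' mode creates the file but not a missing directory, which is why line 23 fails.

I suggest creating that directory with a dummy file of that name in it, then trying to run the script again.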
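To illustrate why the open() call on line 23 fails, here is a minimal sketch. It is written so that it also runs on Python 3, whereas the script in the question is Python 2. The helper name `write_limits` and the temporary directory standing in for C:\rrr are assumptions made for the example; they are not part of DownloadYP.py.

```python
import os
import tempfile

def write_limits(directory, lower, upper):
    """Hypothetical helper: write "lower,upper" to <directory>/limit.txt."""
    # Create the target directory (and any missing parents) first;
    # open() in write mode creates the file but never the directory.
    if not os.path.isdir(directory):
        os.makedirs(directory)
    limit_path = os.path.join(directory, "limit.txt")
    with open(limit_path, "w") as fout:
        fout.write("%d,%d" % (lower, upper))
    return limit_path

# A temporary directory stands in for the path argument (assumption).
base = tempfile.mkdtemp()
country_dir = os.path.join(base, "japan")

# The directory does not exist yet, so open() fails as in the traceback.
try:
    open(os.path.join(country_dir, "limit.txt"), "w")
    failed = False
except (IOError, OSError):
    failed = True

# Creating the directory first makes the same write succeed.
limit_path = write_limits(country_dir, 1, 222299)
```

In the original script, the equivalent change would probably be to create self.dir before the first open() in __init__, rather than only in process_instances.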

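Also, as a side note: you are mixing Unix path separators into paths used on Windows. Use the os.path.join() function instead, so that Python handles the path separator across platforms. The code would look like this: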

import os
self.dir = os.path.join(str(path),str(country))

Also, when opening files, use os.path.join instead of hard-coding the path separator:

fin = open(os.path.join(self.dir,"limit.txt"),'r')
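The difference can be checked without switching machines. The sketch below uses the standard-library modules ntpath and posixpath, which are what os.path resolves to on Windows and on macOS/Linux respectively. The paths are the ones from the question; the variable names are chosen for illustration only.

```python
import ntpath      # the implementation behind os.path on Windows
import posixpath   # the implementation behind os.path on macOS/Linux

# String concatenation, as in the question, yields the mixed path
# that appears in the traceback.
mixed = r"C:\rrr" + "/" + "japan" + "/limit.txt"

# What os.path.join returns when the script runs on Windows.
windows_path = ntpath.join(r"C:\rrr", "japan", "limit.txt")

# What os.path.join returns on the Mac; the trailing slash is not doubled.
mac_path = posixpath.join("/Users/myfolder/", "japan", "limit.txt")

print(mixed)         # C:\rrr/japan/limit.txt
print(windows_path)  # C:\rrr\japan\limit.txt
print(mac_path)      # /Users/myfolder/japan/limit.txt
```

Windows generally accepts forward slashes in paths, so the mixed separators are most likely not what raised the IOError; os.path.join mainly keeps the script portable and the paths consistent.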
