Parameter error in Python script & TOR proxy server
I am a Python newbie. My boss instructed me to run this Python script while a TOR proxy server is running. He told me to pass the arguments this way:
    python DownloadYP.py /Users/myfolder/ japan http://www.jpyellow.com/company 1 222299
He set it up on a Mac. I am using Windows, so my arguments look like this:
    python DownloadYP.py C:\rrrb japan http://www.jpyellow.com/company 1 222299
But I get this error message:
    Traceback (most recent call last):
      File "C:\Users\USER\yp1\code\DownloadYP.py", line 92, in <module>
        WebPage(path, country, url, lowerlimit,upperlimit)
      File "C:\Users\USER\yp1\code\DownloadYP.py", line 23, in __init__
        fout = open(self.dir+"/limit.txt",'wb')
    IOError: [Errno 2] No such file or directory: 'C:\rrr/japan/limit.txt'
My code is:
DownloadYP.py
    # The imports and the class header are inferred from how the code is used;
    # they are not shown in the original post.
    import os
    import sys
    import urllib2

    class WebPage:
        def __init__(self, path, country, url, lower=0,upper=9999):
            self.dir = str(path)+"/"+ str(country)
            self.url = url
            try:
                fin = open(self.dir+"/limit.txt",'r')
                limit = fin.readline()
                limits = str(limit).split(",")
                lower = int(limits[0])
                upper = int(limits[1])
                fin.close()
            except:
                fout = open(self.dir+"/limit.txt",'wb')  # <-- this is line 23
                limits = str(lower)+","+str(upper)
                fout.write(limits)
                fout.close()
            self.process_instances(lower,upper)

        def process_instances(self,lower,upper):
            try:
                os.stat(self.dir)
            except:
                os.mkdir(self.dir)
            for count in range(lower,upper+1):
                if count == upper:
                    print "all downloaded, quitting the app!!"
                    break
                targetURL = self.url+"/"+str(count)
                print "Downloading :" + targetURL
                req = urllib2.Request(targetURL)
                try:
                    response = urllib2.urlopen(req)
                    the_page = response.read()
                    if the_page.find("Your IP suspended")>=0:
                        print "The IP is suspended"
                        fout = open(self.dir+"/limit.txt",'wb')
                        limits = str(count)+","+str(upper)
                        fout.write(limits)
                        fout.close()
                        break
                    if the_page.find("Too many requests")>=0:
                        print "Too many requests"
                        print "Renew IP...."
                        fout = open(self.dir+"/limit.txt",'wb')
                        limits = str(count)+","+str(upper)
                        fout.write(limits)
                        fout.close()
                        break
                        #subprocess.Popen("sudo /Users/myfolder/Documents/workspace/DataMining/ip_renew.py", shell=True)
                    if the_page.find("404 error")>=0:
                        print "the page not exist"
                        continue
                    self.saveHTML(count, the_page)
                except:
                    print "The URL cannot be fetched"
                    pass
                    #raise

        def saveHTML(self,count, content):
            fout = open(self.dir+"/"+str(count)+".html",'wb')
            fout.write(content)
            fout.close()

    if __name__ == '__main__':
        if len(sys.argv) !=6:
            print "cannot process!!! Five Parameters are required to run the process."
            print "Parameter 1 should be the path where to save the data, eg, /Users/myfolder/data/"
            print "Parameter 2 should be the name of the country for which data is collected, eg, japan"
            print "Parameter 3 should be the URL from which the data to collect, eg, http://www.jpyellow.com/company/"
            print "Parameter 4 should be the lower limit of the company id, eg, 11 "
            print "Parameter 5 should be the upper limit of the company id, eg, 1000 "
            print "The output will be saved as the HTML file for each company in the target folder's country"
            exit()
        else:
            path = str(sys.argv[1])
            country = str(sys.argv[2])
            url = str(sys.argv[3])
            lowerlimit = int(sys.argv[4])
            upperlimit = int(sys.argv[5])
            WebPage(path, country, url, lowerlimit,upperlimit)  # <-- this is line 92
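I have downloaded TorVPN to run the proxy server, and it is running while I run this script. So why does the error occur? This is a script that downloads web pages.
The problem is in DownloadYP.py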
-
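The file C:\rrr\japan\limit.txt does not exist. The script opens it with 'wb', which would normally create the file, so the error most likely means that the directory C:\rrr\japan itself is missing (the script only calls os.mkdir later, in process_instances).
I suggest creating that directory, with a dummy file of that name in it, and then running the script again.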
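The suggestion above can be sketched as follows. This is a minimal illustration rather than part of the original script: `ensure_limit_file` is a hypothetical helper name, and a temporary directory stands in for C:\rrr so the snippet can run anywhere. It first reproduces the failure (opening a file for writing inside a directory that does not exist) and then creates the directory and an empty file.

```python
import errno
import os
import tempfile


def ensure_limit_file(directory, name="limit.txt"):
    """Hypothetical helper: create `directory` if missing, then an empty file in it."""
    if not os.path.isdir(directory):
        os.makedirs(directory)  # also creates missing parent directories
    file_path = os.path.join(directory, name)
    # Append mode creates the file when absent and leaves existing content alone.
    with open(file_path, "ab"):
        pass
    return file_path


# A throwaway temp directory stands in for the asker's base folder (assumption).
base = tempfile.mkdtemp()
target_dir = os.path.join(base, "japan")  # does not exist yet

try:
    open(os.path.join(target_dir, "limit.txt"), "wb")
    failed_errno = None
except (IOError, OSError) as exc:
    failed_errno = exc.errno  # ENOENT: the parent directory is missing

created = ensure_limit_file(target_dir)
```

With an empty dummy file in place, the script's `int(limits[0])` call fails on the empty string, so it falls into the `except` branch and rewrites limit.txt with the limits given on the command line. That write now succeeds because the directory exists.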
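As a side note, you are hardcoding the Unix path separator and using it on Windows. Use the os.path.join() function instead, so that Python handles the OS path separator across platforms. The code would look like this:
    import os
    self.dir = os.path.join(str(path),str(country))
Likewise, use os.path.join when opening files instead of hardcoding the path separator:
    fin = open(os.path.join(self.dir,"limit.txt"),'r')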
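Putting both points together, the limit-file handling could look roughly like the sketch below. The function names `load_limits` and `save_limits` are assumptions made for illustration and do not appear in the original script. The "lower,upper" file format and the fallback to default limits mirror what the `__init__` method in the question does.

```python
import os
import tempfile


def save_limits(directory, lower, upper, name="limit.txt"):
    """Write 'lower,upper' to directory/name, creating the directory if needed."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    with open(os.path.join(directory, name), "w") as fout:
        fout.write("%d,%d" % (lower, upper))


def load_limits(directory, lower=0, upper=9999, name="limit.txt"):
    """Return (lower, upper) read from directory/name, or the given defaults."""
    try:
        with open(os.path.join(directory, name), "r") as fin:
            parts = fin.readline().split(",")
        return int(parts[0]), int(parts[1])
    except (IOError, OSError, ValueError, IndexError):
        # Missing, empty, or malformed file: store and return the defaults.
        save_limits(directory, lower, upper, name)
        return lower, upper


# Usage with a temporary directory standing in for the real data folder.
demo_dir = os.path.join(tempfile.mkdtemp(), "japan")
first = load_limits(demo_dir, 1, 222299)   # no file yet, so defaults are stored
save_limits(demo_dir, 150, 222299)         # e.g. progress saved after an IP ban
second = load_limits(demo_dir)             # reads the stored progress back
```

Catching specific exceptions instead of a bare `except:` keeps unrelated errors visible, and creating the directory inside `save_limits` removes the dependency on `os.mkdir` being called elsewhere first.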