python whois 库 return 没有活动域名
python whois library return nothing for active domain names
我正在尝试使用 python whois 库来收集一些网站的 whois 记录。
问题是我没有得到某些网站的任何信息,例如 nih.gov 这是一个活跃的域名!
w = whois.whois("nih.gov")
print w
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None}
我不明白问题是什么,我应该使用哪个库或如何使用它来涵盖所有情况?
这里有一些代码可以完成这项工作。
import sys
import socket
from datetime import datetime as dt
import time
def whois(ip):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("whois.arin.net", 43))
s.send(('n ' + ip + '\r\n').encode())
response = b""
# setting time limit in secondsmd
startTime = time.mktime(dt.now().timetuple())
timeLimit = 3
while True:
elapsedTime = time.mktime(dt.now().timetuple()) - startTime
data = s.recv(4096)
response += data
if (not data) or (elapsedTime >= timeLimit):
break
s.close()
print(response.decode())
def main():
domain = sys.argv[1];
ip = socket.gethostbyname(domain);
whois(ip)
main()
例如:
c:\Temp>py test.py www.google.com
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=216.58.213.196?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#
NetRange: 216.58.192.0 - 216.58.223.255
CIDR: 216.58.192.0/19
NetName: GOOGLE
NetHandle: NET-216-58-192-0-1
Parent: NET216 (NET-216-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS15169
Organization: Google LLC (GOGL)
RegDate: 2012-01-27
Updated: 2012-01-27
Ref: https://whois.arin.net/rest/net/NET-216-58-192-0-1
OrgName: Google LLC
OrgId: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US
RegDate: 2000-03-30
Updated: 2017-12-21
Ref: https://whois.arin.net/rest/org/GOGL
OrgAbuseHandle: ABUSE5250-ARIN
OrgAbuseName: Abuse
OrgAbusePhone: +1-650-253-0000
OrgAbuseEmail: network-abuse@google.com
OrgAbuseRef: https://whois.arin.net/rest/poc/ABUSE5250-ARIN
OrgTechHandle: ZG39-ARIN
OrgTechName: Google LLC
OrgTechPhone: +1-650-253-0000
OrgTechEmail: arin-contact@google.com
OrgTechRef: https://whois.arin.net/rest/poc/ZG39-ARIN
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
特别是 www.nih.gov 我们得到:
c:\Temp>py test.py www.nih.gov
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#
NetRange: 23.20.0.0 - 23.23.255.255
CIDR: 23.20.0.0/14
NetName: AMAZON-EC2-USEAST-10
NetHandle: NET-23-20-0-0-1
Parent: NET23 (NET-23-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS16509
Organization: Amazon.com, Inc. (AMAZO-4)
RegDate: 2011-09-19
Updated: 2014-09-03
Comment: The activity you have detected originates from a dynamic hosting environment.
Comment: For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse
Comment: For more information regarding EC2 see:
Comment: http://ec2.amazonaws.com/
Comment: All reports MUST include:
Comment: * src IP
Comment: * dest IP (your IP)
Comment: * dest port
Comment: * Accurate date/timestamp and timezone of activity
Comment: * Intensity/frequency (short log extracts)
Comment: * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref: https://whois.arin.net/rest/net/NET-23-20-0-0-1
OrgName: Amazon.com, Inc.
OrgId: AMAZO-4
Address: Amazon Web Services, Inc.
Address: P.O. Box 81226
City: Seattle
StateProv: WA
PostalCode: 98108-1226
Country: US
RegDate: 2005-09-29
Updated: 2017-01-28
Comment: For details of this service please see
Comment: http://ec2.amazonaws.com/
Ref: https://whois.arin.net/rest/org/AMAZO-4
OrgAbuseHandle: AEA8-ARIN
OrgAbuseName: Amazon EC2 Abuse
OrgAbusePhone: +1-206-266-4064
OrgAbuseEmail: abuse@amazonaws.com
OrgAbuseRef: https://whois.arin.net/rest/poc/AEA8-ARIN
OrgTechHandle: ANO24-ARIN
OrgTechName: Amazon EC2 Network Operations
OrgTechPhone: +1-206-266-4064
OrgTechEmail: amzn-noc-contact@amazon.com
OrgTechRef: https://whois.arin.net/rest/poc/ANO24-ARIN
OrgNOCHandle: AANO1-ARIN
OrgNOCName: Amazon AWS Network Operations
OrgNOCPhone: +1-206-266-4064
OrgNOCEmail: amzn-noc-contact@amazon.com
OrgNOCRef: https://whois.arin.net/rest/poc/AANO1-ARIN
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
不同选项
还有一个选择。
这段代码在脚本的文件夹中创建了一个文件,其中包含来自不同服务的 whois 请求的 HTML。你可以修改它以满足你的需要,我只是写了基础知识。
import urllib.request
import tempfile
import io
from bs4 import BeautifulSoup
import sys
def writeFile(text):
with io.open('whoisData.txt', "w", encoding="utf-8") as f:
f.write(text)
f.close()
def readHTML(domain):
url = 'https://www.whois.com/whois/' + domain
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)
# kill all script and style elements
for script in soup(["script", "style"]):
script.extract() # rip it out
# get text
text = soup.get_text()
# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)
writeFile(text)
def main():
domain = sys.argv[1]
readHTML(domain)
main()
参考了 here(在解析 HTMLs 时)。
我正在尝试使用 python whois 库来收集一些网站的 whois 记录。
问题是我没有得到某些网站的任何信息,例如 nih.gov 这是一个活跃的域名!
w = whois.whois("nih.gov")
print w
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None}
我不明白问题是什么,我应该使用哪个库或如何使用它来涵盖所有情况?
这里有一些代码可以完成这项工作。
import sys
import socket
from datetime import datetime as dt
import time
def whois(ip):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("whois.arin.net", 43))
s.send(('n ' + ip + '\r\n').encode())
response = b""
# setting time limit in secondsmd
startTime = time.mktime(dt.now().timetuple())
timeLimit = 3
while True:
elapsedTime = time.mktime(dt.now().timetuple()) - startTime
data = s.recv(4096)
response += data
if (not data) or (elapsedTime >= timeLimit):
break
s.close()
print(response.decode())
def main():
domain = sys.argv[1];
ip = socket.gethostbyname(domain);
whois(ip)
main()
例如:
c:\Temp>py test.py www.google.com
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=216.58.213.196?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#
NetRange: 216.58.192.0 - 216.58.223.255
CIDR: 216.58.192.0/19
NetName: GOOGLE
NetHandle: NET-216-58-192-0-1
Parent: NET216 (NET-216-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS15169
Organization: Google LLC (GOGL)
RegDate: 2012-01-27
Updated: 2012-01-27
Ref: https://whois.arin.net/rest/net/NET-216-58-192-0-1
OrgName: Google LLC
OrgId: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US
RegDate: 2000-03-30
Updated: 2017-12-21
Ref: https://whois.arin.net/rest/org/GOGL
OrgAbuseHandle: ABUSE5250-ARIN
OrgAbuseName: Abuse
OrgAbusePhone: +1-650-253-0000
OrgAbuseEmail: network-abuse@google.com
OrgAbuseRef: https://whois.arin.net/rest/poc/ABUSE5250-ARIN
OrgTechHandle: ZG39-ARIN
OrgTechName: Google LLC
OrgTechPhone: +1-650-253-0000
OrgTechEmail: arin-contact@google.com
OrgTechRef: https://whois.arin.net/rest/poc/ZG39-ARIN
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
特别是 www.nih.gov 我们得到:
c:\Temp>py test.py www.nih.gov
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#
NetRange: 23.20.0.0 - 23.23.255.255
CIDR: 23.20.0.0/14
NetName: AMAZON-EC2-USEAST-10
NetHandle: NET-23-20-0-0-1
Parent: NET23 (NET-23-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS16509
Organization: Amazon.com, Inc. (AMAZO-4)
RegDate: 2011-09-19
Updated: 2014-09-03
Comment: The activity you have detected originates from a dynamic hosting environment.
Comment: For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse
Comment: For more information regarding EC2 see:
Comment: http://ec2.amazonaws.com/
Comment: All reports MUST include:
Comment: * src IP
Comment: * dest IP (your IP)
Comment: * dest port
Comment: * Accurate date/timestamp and timezone of activity
Comment: * Intensity/frequency (short log extracts)
Comment: * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref: https://whois.arin.net/rest/net/NET-23-20-0-0-1
OrgName: Amazon.com, Inc.
OrgId: AMAZO-4
Address: Amazon Web Services, Inc.
Address: P.O. Box 81226
City: Seattle
StateProv: WA
PostalCode: 98108-1226
Country: US
RegDate: 2005-09-29
Updated: 2017-01-28
Comment: For details of this service please see
Comment: http://ec2.amazonaws.com/
Ref: https://whois.arin.net/rest/org/AMAZO-4
OrgAbuseHandle: AEA8-ARIN
OrgAbuseName: Amazon EC2 Abuse
OrgAbusePhone: +1-206-266-4064
OrgAbuseEmail: abuse@amazonaws.com
OrgAbuseRef: https://whois.arin.net/rest/poc/AEA8-ARIN
OrgTechHandle: ANO24-ARIN
OrgTechName: Amazon EC2 Network Operations
OrgTechPhone: +1-206-266-4064
OrgTechEmail: amzn-noc-contact@amazon.com
OrgTechRef: https://whois.arin.net/rest/poc/ANO24-ARIN
OrgNOCHandle: AANO1-ARIN
OrgNOCName: Amazon AWS Network Operations
OrgNOCPhone: +1-206-266-4064
OrgNOCEmail: amzn-noc-contact@amazon.com
OrgNOCRef: https://whois.arin.net/rest/poc/AANO1-ARIN
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
不同选项
还有一个选择。
这段代码在脚本的文件夹中创建了一个文件,其中包含来自不同服务的 whois 请求的 HTML。你可以修改它以满足你的需要,我只是写了基础知识。
import urllib.request
import tempfile
import io
from bs4 import BeautifulSoup
import sys
def writeFile(text):
with io.open('whoisData.txt', "w", encoding="utf-8") as f:
f.write(text)
f.close()
def readHTML(domain):
url = 'https://www.whois.com/whois/' + domain
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)
# kill all script and style elements
for script in soup(["script", "style"]):
script.extract() # rip it out
# get text
text = soup.get_text()
# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)
writeFile(text)
def main():
domain = sys.argv[1]
readHTML(domain)
main()
参考了 here(在解析 HTMLs 时)。