python whois 库 return 没有活动域名

python whois library return nothing for active domain names

我正在尝试使用 python whois 库来收集一些网站的 whois 记录。

问题是我没有得到某些网站的任何信息,例如 nih.gov 这是一个活跃的域名!

w = whois.whois("nih.gov")
print w
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None}

我不明白问题是什么,我应该使用哪个库或如何使用它来涵盖所有情况?

这里有一些代码可以完成这项工作。

import sys
import socket
from datetime import datetime as dt
import time

def whois(ip):

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("whois.arin.net", 43))
    s.send(('n ' + ip + '\r\n').encode())

    response = b""

    # setting time limit in secondsmd
    startTime = time.mktime(dt.now().timetuple())
    timeLimit = 3
    while True:
        elapsedTime = time.mktime(dt.now().timetuple()) - startTime
        data = s.recv(4096)
        response += data
        if (not data) or (elapsedTime >= timeLimit):
            break
    s.close()

    print(response.decode())

def main():
    domain = sys.argv[1];
    ip = socket.gethostbyname(domain);
    whois(ip)

main()

例如:

c:\Temp>py test.py www.google.com

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#


#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=216.58.213.196?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#

NetRange:       216.58.192.0 - 216.58.223.255
CIDR:           216.58.192.0/19
NetName:        GOOGLE
NetHandle:      NET-216-58-192-0-1
Parent:         NET216 (NET-216-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       AS15169
Organization:   Google LLC (GOGL)
RegDate:        2012-01-27
Updated:        2012-01-27
Ref:            https://whois.arin.net/rest/net/NET-216-58-192-0-1


OrgName:        Google LLC
OrgId:          GOGL
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
PostalCode:     94043
Country:        US
RegDate:        2000-03-30
Updated:        2017-12-21
Ref:            https://whois.arin.net/rest/org/GOGL


OrgAbuseHandle: ABUSE5250-ARIN
OrgAbuseName:   Abuse
OrgAbusePhone:  +1-650-253-0000
OrgAbuseEmail:  network-abuse@google.com
OrgAbuseRef:    https://whois.arin.net/rest/poc/ABUSE5250-ARIN

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google LLC
OrgTechPhone:  +1-650-253-0000
OrgTechEmail:  arin-contact@google.com
OrgTechRef:    https://whois.arin.net/rest/poc/ZG39-ARIN


#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#

特别是 www.nih.gov 我们得到:

c:\Temp>py test.py www.nih.gov

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#


#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#

NetRange:       23.20.0.0 - 23.23.255.255
CIDR:           23.20.0.0/14
NetName:        AMAZON-EC2-USEAST-10
NetHandle:      NET-23-20-0-0-1
Parent:         NET23 (NET-23-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       AS16509
Organization:   Amazon.com, Inc. (AMAZO-4)
RegDate:        2011-09-19
Updated:        2014-09-03
Comment:        The activity you have detected originates from a dynamic hosting environment.
Comment:        For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse
Comment:        For more information regarding EC2 see:
Comment:        http://ec2.amazonaws.com/
Comment:        All reports MUST include:
Comment:        * src IP
Comment:        * dest IP (your IP)
Comment:        * dest port
Comment:        * Accurate date/timestamp and timezone of activity
Comment:        * Intensity/frequency (short log extracts)
Comment:        * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref:            https://whois.arin.net/rest/net/NET-23-20-0-0-1


OrgName:        Amazon.com, Inc.
OrgId:          AMAZO-4
Address:        Amazon Web Services, Inc.
Address:        P.O. Box 81226
City:           Seattle
StateProv:      WA
PostalCode:     98108-1226
Country:        US
RegDate:        2005-09-29
Updated:        2017-01-28
Comment:        For details of this service please see
Comment:        http://ec2.amazonaws.com/
Ref:            https://whois.arin.net/rest/org/AMAZO-4


OrgAbuseHandle: AEA8-ARIN
OrgAbuseName:   Amazon EC2 Abuse
OrgAbusePhone:  +1-206-266-4064
OrgAbuseEmail:  abuse@amazonaws.com
OrgAbuseRef:    https://whois.arin.net/rest/poc/AEA8-ARIN

OrgTechHandle: ANO24-ARIN
OrgTechName:   Amazon EC2 Network Operations
OrgTechPhone:  +1-206-266-4064
OrgTechEmail:  amzn-noc-contact@amazon.com
OrgTechRef:    https://whois.arin.net/rest/poc/ANO24-ARIN

OrgNOCHandle: AANO1-ARIN
OrgNOCName:   Amazon AWS Network Operations
OrgNOCPhone:  +1-206-266-4064
OrgNOCEmail:  amzn-noc-contact@amazon.com
OrgNOCRef:    https://whois.arin.net/rest/poc/AANO1-ARIN


#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#

不同选项

还有一个选择。

这段代码在脚本的文件夹中创建了一个文件,其中包含来自不同服务的 whois 请求的 HTML。你可以修改它以满足你的需要,我只是写了基础知识。

import urllib.request
import tempfile
import io
from bs4 import BeautifulSoup
import sys

def writeFile(text):
    with io.open('whoisData.txt', "w", encoding="utf-8") as f:
        f.write(text)
    f.close()

def readHTML(domain):
    url = 'https://www.whois.com/whois/' + domain
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html)

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()    # rip it out

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    writeFile(text)

def main():
    domain = sys.argv[1]
    readHTML(domain)

main()

参考了 here(在解析 HTMLs 时)。