Unable to get country names from a set of IP addresses in a pandas DataFrame

I have a pandas DataFrame, df_test, that contains IP addresses like this:

     |  cs-username |   c-ip      |
     +--------------+-------------+
     |-             | 70.80.84.76 |           
     |-             | 70.80.84.76 |
     |-             | 70.80.84.76 |
     |-             | 70.80.84.76 |

My goal is to get the country name for each IP address. I used DbIpCity from ip2geotools, so I wrote the following code.

from ip2geotools.databases.noncommercial import DbIpCity

#Your code
df_test['Country'] = df_test.apply(lambda row: DbIpCity.get(row['c-ip'],api_key='free').country, axis=1)

However, this results in the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-3772268ef132> in <module>()
      2 
      3 #Your code
----> 4 df_test['Country'] = df_test.apply(lambda row: DbIpCity.get(row['c-ip'],api_key='free').country, axis=1)

5 frames
/usr/local/lib/python3.7/dist-packages/ip2geotools/databases/noncommercial.py in get(ip_address, api_key, db_path, username, password)
     65         # format data
     66         ip_location.country = content['countryCode']
---> 67         ip_location.region = content['stateProv']
     68         ip_location.city = content['city']
     69 

KeyError: 'stateProv'

For reference, the code is in the last cell of this Colab notebook: https://colab.research.google.com/drive/1zz1LZ2uOAp1YsX0x0CJfvcM21XGkeCO5?usp=sharing

So how can I fix this error?

Thanks

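The program raises a KeyError when the response for an IP address is missing data the library expects (here the 'stateProv' field). To keep the script from stopping, you can catch the exception. However, because the ip2geotools library has a request limit, I decided to use geolocation-db instead (and a for loop rather than a lambda):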

import pandas as pd
import numpy as np
import urllib.request
import json

df = pd.read_csv('temp.csv')
countries = []
ips = []

# Get country info from https://geolocation-db.com
def getCountry(ip):
    with urllib.request.urlopen("https://geolocation-db.com/jsonp/" + ip) as url:
        data = url.read().decode()
        # The endpoint returns JSONP, callback({...}); keep only the JSON payload
        data = data.split("(")[1].strip(")")
        return json.loads(data)['country_name']

for index, row in df.iterrows():
    # Get IP data
    data = row['c-ip']
    if data not in ips:
        print(data)
        ips.append(data)
        #response = DbIpCity.get(row['c-ip'], api_key='free')
        response = getCountry(row['c-ip'])
        if response is not None:
            print(response)

            # Add to country list
            countries.append(response)

        # If country is None, add np.nan instead of None
        else:
            print(np.nan)
            countries.append(np.nan)

# Insert all data into a new df
ips = {'ip': ips,
       'country': countries, 
       }

df_ips = pd.DataFrame(ips, columns = ['ip', 'country'])    
print(df_ips)

Also, because your CSV file is very large, filter out duplicate IPs so that each address is only looked up once.
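The loop above builds a separate table of distinct IPs. If the country should end up as a column on the original rows, one possible sketch is to look up each distinct IP once and map the results back. The helper name add_country_column, its parameters, and the stand-in fake_lookup (with its placeholder country value) are assumptions for illustration; in practice the lookup argument would be something like getCountry.

```python
import numpy as np
import pandas as pd

def add_country_column(df, lookup, ip_col="c-ip", out_col="Country"):
    """Look up each distinct IP once, then map the results onto every row."""
    cache = {}
    for ip in df[ip_col].dropna().unique():
        country = lookup(ip)
        # Store NaN rather than None so pandas treats it as missing data
        cache[ip] = country if country is not None else np.nan
    result = df.copy()
    result[out_col] = result[ip_col].map(cache)
    return result

# Stand-in lookup so the sketch runs offline (placeholder data, not a real service)
calls = []
def fake_lookup(ip):
    calls.append(ip)
    return {"70.80.84.76": "Country A"}.get(ip)

df_demo = pd.DataFrame({
    "cs-username": ["-"] * 4,
    "c-ip": ["70.80.84.76"] * 3 + ["10.0.0.1"],
})
df_out = add_country_column(df_demo, fake_lookup)
print(df_out)
print("lookups made:", len(calls))
```

With four rows but only two distinct addresses, the stand-in lookup is called twice, which is the point of the filter when the real service has a request limit.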

I found these errors in your log:

ERROR: geoip2 4.1.0 has requirement requests<3.0.0,>=2.24.0, but you'll have requests 2.23.0 which is incompatible.
ERROR: geoip2 4.1.0 has requirement urllib3<2.0.0,>=1.25.2, but you'll have urllib3 1.24.3 which is incompatible.

Try pip install --upgrade requests urllib3. Both installed versions are older than geoip2 4.1.0 requires.
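To confirm whether the upgrade took effect, a small check like the following might help. It is only a sketch: the minimum versions are copied from the log lines above, the helper names are made up here, upper bounds such as <3.0.0 are not checked, and importlib.metadata needs Python 3.8 or newer (the traceback shows Python 3.7, where this module is not in the standard library).

```python
import re
from importlib import metadata

# Lower bounds taken from the geoip2 4.1.0 messages in the log
MINIMUMS = {"requests": "2.24.0", "urllib3": "1.25.2"}

def version_tuple(version):
    """Turn '2.24.0' into (2, 24, 0), ignoring anything after three numbers."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])

def check(package, minimum):
    """Describe whether the installed package meets the minimum version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    ok = version_tuple(installed) >= version_tuple(minimum)
    return f"{package} {installed}: {'ok' if ok else 'older than ' + minimum}"

for name, minimum in MINIMUMS.items():
    print(check(name, minimum))
```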

To avoid KeyError: 'stateProv', edit noncommercial.py:

.../ip2geotools/databases/noncommercial.py

Comment out line 67 and insert a new line in its place: ip_location.region = ''

This sets ip_location.region to an empty string.

67      #ip_location.region = content['stateProv']

        ip_location.region = ''
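If editing the installed library is not an option, the KeyError can instead be caught at the call site. The sketch below assumes a helper named safe_country (not part of ip2geotools) and uses a stand-in lookup with a placeholder country code so that it runs offline. Note the trade-off: a failed lookup yields NaN for that row, whereas the patch above keeps the country that was already read before the failing line.

```python
import numpy as np
from types import SimpleNamespace

def safe_country(ip, lookup):
    """Return lookup(ip).country, or NaN if the lookup raises KeyError."""
    try:
        return lookup(ip).country
    except KeyError:
        return np.nan

# Stand-in for the real lookup (hypothetical data); it fails the same way
# the traceback does for one address
def fake_lookup(ip):
    if ip == "192.0.2.1":
        raise KeyError("stateProv")
    return SimpleNamespace(country="XX")

print(safe_country("70.80.84.76", fake_lookup))
print(safe_country("192.0.2.1", fake_lookup))

# With the real library, the call from the question would become roughly:
# df_test['Country'] = df_test['c-ip'].apply(
#     lambda ip: safe_country(ip, lambda x: DbIpCity.get(x, api_key='free')))
```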

Here is an example using the bash shell:

$ whois 70.80.84.76|grep Country:|uniq|cut -d ':' -f 2
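The same pipeline could be approximated from Python, which makes it easier to feed the result into a DataFrame. This is a rough sketch: the function names are made up here, it assumes the whois command is installed, the sample text is invented rather than real whois output, and unlike the grep above it matches the field name case-insensitively and only at the start of a line.

```python
import subprocess

def countries_from_whois(text):
    """Collect the distinct values of 'Country:' lines, in order of appearance."""
    seen = []
    for line in text.splitlines():
        if line.strip().lower().startswith("country:"):
            value = line.split(":", 1)[1].strip()
            if value and value not in seen:
                seen.append(value)
    return seen

def whois_country(ip):
    """Run the system whois command (assumed to be installed) and parse its output."""
    result = subprocess.run(["whois", ip], capture_output=True, text=True, check=True)
    return countries_from_whois(result.stdout)

# Invented sample text, used so the parsing can be tried without network access
sample = "NetName:        EXAMPLE-NET\nCountry:        CA\nCountry:        CA\n"
print(countries_from_whois(sample))
```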