Unable to get country names from a bunch of IP addresses in a pandas DataFrame
I have a pandas DataFrame df_test that consists of IP addresses like the following:
| cs-username | c-ip        |
+-------------+-------------+
| -           | 70.80.84.76 |
| -           | 70.80.84.76 |
| -           | 70.80.84.76 |
| -           | 70.80.84.76 |
My goal is to get the country name for each IP address, so I used DbIpCity from ip2geotools and wrote the following code:
```python
from ip2geotools.databases.noncommercial import DbIpCity

#Your code
df_test['Country'] = df_test.apply(lambda row: DbIpCity.get(row['c-ip'], api_key='free').country, axis=1)
```
However, this results in the following error:
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-3772268ef132> in <module>()
      2
      3 #Your code
----> 4 df_test['Country'] = df_test.apply(lambda row: DbIpCity.get(row['c-ip'],api_key='free').country, axis=1)

5 frames
/usr/local/lib/python3.7/dist-packages/ip2geotools/databases/noncommercial.py in get(ip_address, api_key, db_path, username, password)
     65 # format data
     66 ip_location.country = content['countryCode']
---> 67 ip_location.region = content['stateProv']
     68 ip_location.city = content['city']
     69

KeyError: 'stateProv'
```
For reference, the code is in the Colab notebook linked below (last cell):
https://colab.research.google.com/drive/1zz1LZ2uOAp1YsX0x0CJfvcM21XGkeCO5?usp=sharing
So how can I fix this error?
Thanks
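The program throws a KeyError when it can't get any data about an IP address. To keep the script from stopping, you can catch the exception. However, since the ip2geotools library has a request limit, I decided to use geolocation-db instead (with a for loop rather than a lambda):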
```python
import json
import urllib.request

import numpy as np
import pandas as pd

df = pd.read_csv('temp.csv')

countries = []
ips = []

# Get country info from https://geolocation-db.com
def getCountry(ip):
    with urllib.request.urlopen("https://geolocation-db.com/jsonp/" + ip) as url:
        data = url.read().decode()
        # The JSONP response is "callback({...})"; keep only the JSON object
        data = data.split("(", 1)[1].strip(")")
        return json.loads(data)['country_name']

for index, row in df.iterrows():
    # Get IP data
    data = row['c-ip']
    # Only look up IPs that haven't been seen yet
    if data not in ips:
        print(data)
        ips.append(data)
        #response = DbIpCity.get(row['c-ip'], api_key='free')
        response = getCountry(data)
        if response is not None:
            print(response)
            # Add to country list
            countries.append(response)
        # If country is None, add np.nan instead of None
        else:
            print(np.nan)
            countries.append(np.nan)

# Insert all data into a new df
ips = {'ip': ips,
       'country': countries,
       }
df_ips = pd.DataFrame(ips, columns=['ip', 'country'])
print(df_ips)
```
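As mentioned at the start of this answer, the KeyError can also just be caught so that one IP with missing data doesn't stop the whole run. Below is a minimal sketch of that idea. `safe_country` and `broken_lookup` are hypothetical names, and the lookup is passed in as a parameter (for example `lambda ip: DbIpCity.get(ip, api_key='free').country`) so the error handling can be shown without a network call.

```python
import numpy as np


def safe_country(ip, lookup):
    """Return lookup(ip), or np.nan if the lookup raises KeyError.

    `lookup` is any callable mapping an IP string to a country name,
    e.g. lambda ip: DbIpCity.get(ip, api_key='free').country
    """
    try:
        return lookup(ip)
    except KeyError:
        # e.g. ip2geotools failing on a response without 'stateProv'
        return np.nan


def broken_lookup(ip):
    # Hypothetical stand-in for a lookup whose response lacks a field
    raise KeyError('stateProv')


print(safe_country('70.80.84.76', lambda ip: 'XX'))  # XX
print(safe_country('70.80.84.76', broken_lookup))    # nan
```

With the question's DataFrame this would be used as `df_test['Country'] = df_test['c-ip'].apply(lambda ip: safe_country(ip, lambda x: DbIpCity.get(x, api_key='free').country))`. Network failures raise other exception types (for urllib, `URLError`), which you may want to add to the `except` clause.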
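And since your CSV file is large, use a filter to avoid processing duplicate IPs (the `if data not in ips` check in the loop does this).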
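The same duplicate filter can also be expressed with pandas alone: look up each distinct IP once with `Series.unique()`, then broadcast the results back to every row with `Series.map`. This is a sketch assuming `lookup` is a per-IP function such as `getCountry` above; `countries_for`, `fake_lookup`, and the sample rows are made up for illustration.

```python
import pandas as pd


def countries_for(ip_series, lookup):
    """Look up each distinct IP once and map the results back to all rows."""
    # unique() returns each address once, so lookup runs once per distinct IP
    cache = {ip: lookup(ip) for ip in ip_series.unique()}
    return ip_series.map(cache)


calls = []


def fake_lookup(ip):
    # Hypothetical stand-in for a network lookup; records each call
    calls.append(ip)
    return 'country-of-' + ip


df_test = pd.DataFrame({
    'cs-username': ['-', '-', '-', '-'],
    'c-ip': ['70.80.84.76'] * 3 + ['8.8.8.8'],
})
df_test['Country'] = countries_for(df_test['c-ip'], fake_lookup)
print(df_test)
print(len(calls))  # 2: one call per distinct IP
```

Swapping `fake_lookup` for `getCountry` (or a try/except wrapper around it) gives one request per distinct address instead of one per row.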
I found these errors in your log:
```
ERROR: geoip2 4.1.0 has requirement requests<3.0.0,>=2.24.0, but you'll have requests 2.23.0 which is incompatible.
ERROR: geoip2 4.1.0 has requirement urllib3<2.0.0,>=1.25.2, but you'll have urllib3 1.24.3 which is incompatible.
```
Try the following; you may need to upgrade both packages:
```shell
pip install --upgrade requests urllib3
```
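The two ERROR lines say that the installed requests 2.23.0 and urllib3 1.24.3 fall outside the ranges geoip2 4.1.0 declares. A small sketch of that comparison, assuming plain numeric `X.Y.Z` version strings (full PEP 440 handling needs something like `packaging.version`); `parse_version` and `in_range` are hypothetical helpers:

```python
def parse_version(v):
    """Turn a plain numeric version like '2.23.0' into the tuple (2, 23, 0)."""
    return tuple(int(part) for part in v.split('.'))


def in_range(installed, minimum, below):
    """True if minimum <= installed < below, the '>=min,<max' form in the log."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)


# Requirements quoted in the pip log for geoip2 4.1.0
print(in_range('2.23.0', '2.24.0', '3.0.0'))  # False: requests too old
print(in_range('1.24.3', '1.25.2', '2.0.0'))  # False: urllib3 too old
```

Note the upper bounds too: a plain `--upgrade` may now install urllib3 2.x, which is outside `<2.0.0`. Quoting the range explicitly, e.g. `pip install 'urllib3>=1.25.2,<2.0.0'`, keeps it compatible.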
To avoid KeyError: 'stateProv', edit noncommercial.py:

.../ip2geotools/databases/noncommercial.py

Comment out line 67 and insert a line that sets ip_location.region to an empty string instead (keep the original indentation):
```python
#ip_location.region = content['stateProv']
ip_location.region = ''
```
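A gentler variant of the same patch reads the key with `dict.get`, so the region is still filled in when the response does contain `stateProv` and falls back to `''` when it doesn't. The function below is a hypothetical stand-alone illustration of lines 66 to 68, not the library's actual code; it assumes `content` is the dict parsed from the JSON response, and the sample input is made up.

```python
from types import SimpleNamespace


def format_location(content):
    """Mimic lines 66-68 of noncommercial.py, tolerating missing keys."""
    # Hypothetical stand-in for the library's location object
    ip_location = SimpleNamespace()
    ip_location.country = content.get('countryCode', '')
    ip_location.region = content.get('stateProv', '')  # '' instead of KeyError
    ip_location.city = content.get('city', '')
    return ip_location


# A made-up response without 'stateProv', as in the question's traceback
loc = format_location({'countryCode': 'XX', 'city': 'Example City'})
print(repr(loc.country), repr(loc.region), repr(loc.city))  # 'XX' '' 'Example City'
```

In the library file itself this amounts to changing line 67 to `ip_location.region = content.get('stateProv', '')`. Edits to an installed package are lost on reinstall, so catching the exception in your own code is the more durable option.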
Here is an example from a bash shell exercise I did in class:
```shell
$ whois 70.80.84.76 | grep Country: | uniq | cut -d ':' -f 2
```
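The same pipeline can be mirrored in Python, for example to run it over the c-ip column. This sketch assumes the `whois` command is installed and that the registry's output uses a `Country:` field (other registries may use lowercase `country:`); `country_from_whois` and `whois_country` are hypothetical names. Unlike `uniq`, which only collapses adjacent duplicate lines, it drops every repeat.

```python
import subprocess


def country_from_whois(text):
    """Emulate `grep Country: | uniq | cut -d ':' -f 2` on whois output."""
    values = []
    for line in text.splitlines():
        if 'Country:' in line:  # grep Country:
            # cut -d ':' -f 2, then strip the padding spaces
            values.append(line.split(':')[1].strip())
    # Drop repeats while keeping order (stricter than uniq)
    return list(dict.fromkeys(values))


def whois_country(ip):
    """Run the whois CLI (must be installed) and parse its Country lines."""
    result = subprocess.run(['whois', ip], capture_output=True, text=True)
    return country_from_whois(result.stdout)


# Made-up sample output for an offline check
sample = "NetRange: ...\nCountry:        XX\nOrgName: Example\nCountry:        XX\n"
print(country_from_whois(sample))  # ['XX']
```

Since each `whois_country` call is a network query, it pairs well with the look-up-each-distinct-IP-once approach above.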