Null values in df in python?

I'm running into a problem with the following code:

for i in np.arange(37, finaldf.shape[0]):

    # We choose to search by category with a 500m radius.
    radius = 500
    LIMIT = 100
    category_id = '4bf58dd8d48988d102951735'  # ID for Accessory stores

    latitude = finaldf['Latitude'][i]
    longitude = finaldf['Longitude'][i]

    # Define the corresponding URL
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, category_id, radius, LIMIT)

    # Send the GET Request
    results = requests.get(url).json()

    # Get relevant part of JSON and transform it into a pandas dataframe
    # assign relevant part of JSON to venues
    venues = results['response']['venues']

    # transform venues into a dataframe
    dataframe = json_normalize(venues)
    dataframe.head()

    # keep only columns that include venue name, and anything that is associated with location
    filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
    dataframe_filtered = dataframe.loc[:, filtered_columns]

    # function that extracts the category of the venue
    def get_category_type(row):
        try:
            categories_list = row['categories']
        except:
            categories_list = row['venue.categories']

        if len(categories_list) == 0:
            return None
        else:
            return categories_list[0]['name']

    # filter the category for each row
    dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean column names by keeping only last term
    dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

    print(str(i) + ') The number of shops in '
          + finaldf['Neighbourhood'][i] + ' is ' + str(dataframe_filtered.shape[0]) + '\n')
    N_shop.append(dataframe_filtered.shape[0])

This loop is meant to count the number of shops for each neighbourhood, but when I run it I get the following error:


KeyError                                  Traceback (most recent call last)
<ipython-input-109-94d4817fe1e7> in <module>
      6     category_id = '4bf58dd8d48988d102951735' #ID for Accessory stores
      7 
----> 8     latitude = finaldf['Latitude'][i]
      9     longitude = finaldf['Longitude'][i]
     10 

/opt/conda/envs/Python36/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    866         key = com.apply_if_callable(key, self)
    867         try:
--> 868             result = self.index.get_value(self, key)
    869 
    870             if not is_scalar(result):

/opt/conda/envs/Python36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4372         try:
   4373             return self._engine.get_value(s, k,
-> 4374                                           tz=getattr(series.dtype, 'tz', None))
   4375         except KeyError as e1:
   4376             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 38

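finaldf has 5 columns and 39 rows, containing the postal code, borough, neighbourhood, longitude and latitude, which I will use to place the neighbourhoods on a map. I have looked for null values and for values in a different format, but I did not find any. What is going wrong? As far as I can tell, one row (row 38) is causing the error. Thanks for your help.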

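It's hard to answer without knowing what your DataFrame looks like, but my guess is that your index contains integers but not the specific value 38, possibly as a result of earlier filtering. When the index holds integers, finaldf['Latitude'][i] treats 38 as a label rather than as a position, so a missing label raises KeyError even though the frame has 39 rows. The Int64HashTable frames in your traceback are consistent with a label lookup on an integer index.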
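To make the guess concrete, here is a minimal sketch that reproduces the same kind of KeyError. The frame `df` and its values are made up for illustration and only stand in for your finaldf; the point is that filtering keeps the original labels, so the last position no longer exists as a label.

```python
import pandas as pd

# Hypothetical stand-in for finaldf: five rows with the default 0..4 index.
df = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D", "E"],
    "Latitude": [43.65, 43.66, 43.67, 43.68, 43.69],
    "Longitude": [-79.38, -79.39, -79.40, -79.41, -79.42],
})

# Filtering drops the row labelled 3 but keeps the remaining labels unchanged.
filtered = df[df["Neighbourhood"] != "D"]
index_labels = list(filtered.index)  # labels 0, 1, 2, 4

# The last *position* is 3, but there is no *label* 3 any more.
last_position = filtered.shape[0] - 1

try:
    filtered["Latitude"][last_position]  # label-based lookup on an integer index
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup still works.
value_by_position = filtered["Latitude"].iloc[last_position]
```

Printing `finaldf.index` on your real data should show quickly whether 38 is actually missing from the labels.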

From the pandas indexing documentation:

.ix offers a lot of magic on the inference of what the user wants to do. To wit, .ix can decide to index positionally OR via labels depending on the data type of the index. This has caused quite a bit of user confusion over the years.

Your for loop suggests that you want to iterate over rows by position, so you can change it to use .iloc:

for i in np.arange(37, finaldf.shape[0]):
    latitude = finaldf['Latitude'].iloc[i]    # Use .iloc[i] 
    longitude = finaldf['Longitude'].iloc[i]
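
As an alternative that is not part of the fix above, you could renumber the index so that labels and positions agree again, which lets the original label-based lookups work unchanged. A rough sketch, assuming the old labels are not needed; the frame below is a hypothetical example, not your data:

```python
import pandas as pd

# Hypothetical frame whose integer labels have a gap (label 3 is missing).
df = pd.DataFrame({"Latitude": [43.65, 43.66, 43.67, 43.68]}, index=[0, 1, 2, 4])

# drop=True discards the old labels instead of keeping them as a new column.
renumbered = df.reset_index(drop=True)
new_labels = list(renumbered.index)

# Label-based lookup now succeeds for every i in range(shape[0]).
values = [renumbered["Latitude"][i] for i in range(renumbered.shape[0])]
```

Whether this is preferable to .iloc depends on whether anything else in your notebook relies on the original labels.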

If you want to rewrite it more neatly, you could try:

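for _, (lat, long) in finaldf[['Latitude', 'Longitude']].iloc[37:].iterrows():
    # Use lat, long
    ...

This relies on Python's automatic unpacking: iterrows() yields (index, row) pairs, and each row is a Series whose two values unpack into lat and long.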
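
A small sketch of what that iteration produces, using a made-up frame `coords` with non-contiguous labels (an assumption for illustration only). Since iterrows() yields (index, row) pairs, the loop unpacks the label first and then the two values of the row Series:

```python
import pandas as pd

# Hypothetical coordinates with gaps in the integer labels.
coords = pd.DataFrame(
    {"Latitude": [43.65, 43.66, 43.67], "Longitude": [-79.38, -79.39, -79.40]},
    index=[10, 20, 40],
)

collected = []
# .iloc[1:] skips the first row by position, whatever its label is.
for label, (lat, long) in coords[["Latitude", "Longitude"]].iloc[1:].iterrows():
    collected.append((label, lat, long))
```

The slice is positional, so the loop starts at the second row even though its label is 20, and the original labels are still available if you need them for printing.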