Trouble creating a dictionary because of duplicated years - Python/Hurricane project

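I am working on Codecademy's Hurricane project.

See the variables and values for the exercise below. It is a sample of 34 hurricanes. Note that some years have more than one hurricane. For example, in 1932 both the 'Bahamas' and 'Cuba II' hurricanes occurred.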

Names of the hurricanes

names = ['Cuba I', 'San Felipe II Okeechobee', 'Bahamas', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Labor Day', 'New England', 'Carol', 'Janet', 'Carla', 'Hattie', 'Beulah', 'Camille', 'Edith', 'Anita', 'David', 'Allen', 'Gilbert', 'Hugo', 'Andrew', 'Mitch', 'Isabel', 'Ivan', 'Emily', 'Katrina', 'Rita', 'Wilma', 'Dean', 'Felix', 'Matthew', 'Irma', 'Maria', 'Michael']

Months of the hurricanes

`months = ['October', 'September', 'September', 'November', 'August', 'September', 'September', 'September', 'September', 'September', 'September', 'October', 'September', 'August', 'September', 'September', 'August', 'August', 'September', 'September', 'August', 'October', 'September', 'September', 'July', 'August', 'September', 'October', 'August', 'September', 'October', 'September', 'September', 'October']`

Years of the hurricanes

`years = [1924, 1928, 1932, 1932, 1933, 1933, 1935, 1938, 1953, 1955, 1961, 1961, 1967, 1969, 1971, 1977, 1979, 1980, 1988, 1989, 1992, 1998, 2003, 2004, 2005, 2005, 2005, 2005, 2007, 2007, 2016, 2017, 2017, 2018]`

Maximum sustained winds (mph) of the hurricanes

max_sustained_winds = [165, 160, 160, 175, 160, 160, 185, 160, 160, 175, 175, 160, 160, 175, 160, 175, 175, 190, 185, 160, 175, 180, 165, 165, 160, 175, 180, 185, 175, 175, 165, 180, 175, 160]

Areas affected by each hurricane

areas_affected = [['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], ['The Bahamas', 'Northeastern United States'], ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], ['The Bahamas', 'Cuba', 'Florida', 'Texas', 'Tamaulipas'], ['Jamaica', 'Yucatn Peninsula'], ['The Bahamas', 'Florida', 'Georgia', 'The Carolinas', 'Virginia'], ['Southeastern United States', 'Northeastern United States', 'Southwestern Quebec'], ['Bermuda', 'New England', 'Atlantic Canada'], ['Lesser Antilles', 'Central America'], ['Texas', 'Louisiana', 'Midwestern United States'], ['Central America'], ['The Caribbean', 'Mexico', 'Texas'], ['Cuba', 'United States Gulf Coast'], ['The Caribbean', 'Central America', 'Mexico', 'United States Gulf Coast'], ['Mexico'], ['The Caribbean', 'United States East coast'], ['The Caribbean', 'Yucatn Peninsula', 'Mexico', 'South Texas'], ['Jamaica', 'Venezuela', 'Central America', 'Hispaniola', 'Mexico'], ['The Caribbean', 'United States East Coast'], ['The Bahamas', 'Florida', 'United States Gulf Coast'], ['Central America', 'Yucatn Peninsula', 'South Florida'], ['Greater Antilles', 'Bahamas', 'Eastern United States', 'Ontario'], ['The Caribbean', 'Venezuela', 'United States Gulf Coast'], ['Windward Islands', 'Jamaica', 'Mexico', 'Texas'], ['Bahamas', 'United States Gulf Coast'], ['Cuba', 'United States Gulf Coast'], ['Greater Antilles', 'Central America', 'Florida'], ['The Caribbean', 'Central America'], ['Nicaragua', 'Honduras'], ['Antilles', 'Venezuela', 'Colombia', 'United States East Coast', 'Atlantic Canada'], ['Cape Verde', 'The Caribbean', 'British Virgin Islands', 'U.S. Virgin Islands', 'Cuba', 'Florida'], ['Lesser Antilles', 'Virgin Islands', 'Puerto Rico', 'Dominican Republic', 'Turks and Caicos Islands'], ['Central America', 'United States Gulf Coast (especially Florida Panhandle)']]

Damages (USD($)) caused by the hurricanes

damages = ['Damages not recorded', '100M', 'Damages not recorded', '40M', '27.9M', '5M', 'Damages not recorded', '306M', '2M', '65.8M', '326M', '60.3M', '208M', '1.42B', '25.4M', 'Damages not recorded', '1.54B', '1.24B', '7.1B', '10B', '26.5B', '6.2B', '5.37B', '23.3B', '1.01B', '125B', '12B', '29.4B', '1.76B', '720M', '15.1B', '64.8B', '91.6B', '25.1B']

Deaths caused by each hurricane

deaths = [90,4000,16,3103,179,184,408,682,5,1023,43,319,688,259,37,11,2068,269,318,107,65,19325,51,124,17,1836,125,87,45,133,603,138,3057,74]

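The first task is to write a function that builds a dictionary of hurricanes with the name as the key:

I created the following function, which works well.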

def hurricane_dict(names, month, year, sustained_winds, areas_affected, damage, death):
    hurricane = {}
    for i in range(len(names)):
        hurricane[names[i]] = {"Name": names[i],
                               "Month": month[i],
                               "Year": year[i],
                               "Max Sustained Wind": sustained_winds[i],
                               "Areas Affected": areas_affected[i],
                               "Damage": damage[i],
                               "Deaths": death[i]}
    return hurricane


hurricane = hurricane_dict(names, months, years, max_sustained_winds, areas_affected, damages, deaths)

hurricane['Cuba I']
Output: {'Name': 'Cuba I',
 'Month': 'October',
 'Year': 1924,
 'Max Sustained Wind': 165,
 'Areas Affected': ['Central America',
  'Mexico',
  'Cuba',
  'Florida',
  'The Bahamas'],
 'Damage': 'Damages not recorded',
 'Deaths': 90}

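The second task is to write another hurricane dictionary function, this time using the year as the key:

I could have built the dictionary with the same logic as before, but I am trying to use the existing dictionary (hurricane) as the argument for building the new one. See the code below: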

def hurricane_by_year(dictionary):
    for name in names:
        for year in years:
            if year == hurricane[name]['Year']:
                hurricanes_by_year_v2[year] = hurricane[name]
    return  hurricanes_by_year_v2

hurricanes_by_year_v2[1924]
Output: {'Name': 'Cuba I',
 'Month': 'October',
 'Year': 1924,
 'Max Sustained Wind': 165,
 'Areas Affected': ['Central America',
  'Mexico',
  'Cuba',
  'Florida',
  'The Bahamas'],
 'Damage': 'Damages not recorded',
 'Deaths': 90}

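At first glance the function and the dictionary look fine, but they do not record the whole data sample. Only one hurricane is recorded for each year; if another hurricane occurred in the same year, it does not appear. The full sample has 34 hurricanes, but the dictionary created has only 26 values.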

print(range(len(hurricanes_by_year_v2)))
range(0, 26)

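I would appreciate it if someone could help me write a correct function that takes the previous dictionary as its argument and builds a complete dictionary with the years as keys.

Thanks in advance, 米贾尔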

A year can have many values, so you should use a list to hold all the values for that year.

def hurricane_by_year(hurricanes):
    results = {}
    
    for name, data in hurricanes.items():
        year = data['Year']
        
        if year not in results:
            results[year] = []      # create list for all values
            
        results[year].append(data)  # add to list
        
    return results

hurricanes_by_year_v2 = hurricane_by_year(hurricane)
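The same grouping can also be written with `collections.defaultdict`, which creates the empty list automatically the first time a year is seen. This is only a minimal sketch, not the exercise's reference solution: the helper name `group_by_year` is hypothetical, and the records are trimmed to three fields taken from the sample data.

```python
from collections import defaultdict

# Toy records taken from the sample data (only three fields kept).
hurricanes = {
    'Cuba I':  {'Name': 'Cuba I',  'Year': 1924, 'Deaths': 90},
    'Bahamas': {'Name': 'Bahamas', 'Year': 1932, 'Deaths': 16},
    'Cuba II': {'Name': 'Cuba II', 'Year': 1932, 'Deaths': 3103},
}

def group_by_year(hurricanes):
    """Hypothetical helper: map each year to a list of hurricane records."""
    results = defaultdict(list)           # a missing year starts as an empty list
    for data in hurricanes.values():      # the name key is not needed here
        results[data['Year']].append(data)
    return dict(results)                  # convert back to a plain dict

by_year = group_by_year(hurricanes)
print([h['Name'] for h in by_year[1932]])  # ['Bahamas', 'Cuba II']
print(len(by_year))                        # 2
```

Returning a plain `dict` means that looking up a year with no hurricanes raises `KeyError` instead of silently adding an empty list.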

Full working code:

names = ['Cuba I', 'San Felipe II Okeechobee', 'Bahamas', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Labor Day', 'New England', 'Carol', 'Janet', 'Carla', 'Hattie', 'Beulah', 'Camille', 'Edith', 'Anita', 'David', 'Allen', 'Gilbert', 'Hugo', 'Andrew', 'Mitch', 'Isabel', 'Ivan', 'Emily', 'Katrina', 'Rita', 'Wilma', 'Dean', 'Felix', 'Matthew', 'Irma', 'Maria', 'Michael']
months = ['October', 'September', 'September', 'November', 'August', 'September', 'September', 'September', 'September', 'September', 'September', 'October', 'September', 'August', 'September', 'September', 'August', 'August', 'September', 'September', 'August', 'October', 'September', 'September', 'July', 'August', 'September', 'October', 'August', 'September', 'October', 'September', 'September', 'October']
years = [1924, 1928, 1932, 1932, 1933, 1933, 1935, 1938, 1953, 1955, 1961, 1961, 1967, 1969, 1971, 1977, 1979, 1980, 1988, 1989, 1992, 1998, 2003, 2004, 2005, 2005, 2005, 2005, 2007, 2007, 2016, 2017, 2017, 2018]
max_sustained_winds = [165, 160, 160, 175, 160, 160, 185, 160, 160, 175, 175, 160, 160, 175, 160, 175, 175, 190, 185, 160, 175, 180, 165, 165, 160, 175, 180, 185, 175, 175, 165, 180, 175, 160]
areas_affected = [['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], ['The Bahamas', 'Northeastern United States'], ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], ['The Bahamas', 'Cuba', 'Florida', 'Texas', 'Tamaulipas'], ['Jamaica', 'Yucatn Peninsula'], ['The Bahamas', 'Florida', 'Georgia', 'The Carolinas', 'Virginia'], ['Southeastern United States', 'Northeastern United States', 'Southwestern Quebec'], ['Bermuda', 'New England', 'Atlantic Canada'], ['Lesser Antilles', 'Central America'], ['Texas', 'Louisiana', 'Midwestern United States'], ['Central America'], ['The Caribbean', 'Mexico', 'Texas'], ['Cuba', 'United States Gulf Coast'], ['The Caribbean', 'Central America', 'Mexico', 'United States Gulf Coast'], ['Mexico'], ['The Caribbean', 'United States East coast'], ['The Caribbean', 'Yucatn Peninsula', 'Mexico', 'South Texas'], ['Jamaica', 'Venezuela', 'Central America', 'Hispaniola', 'Mexico'], ['The Caribbean', 'United States East Coast'], ['The Bahamas', 'Florida', 'United States Gulf Coast'], ['Central America', 'Yucatn Peninsula', 'South Florida'], ['Greater Antilles', 'Bahamas', 'Eastern United States', 'Ontario'], ['The Caribbean', 'Venezuela', 'United States Gulf Coast'], ['Windward Islands', 'Jamaica', 'Mexico', 'Texas'], ['Bahamas', 'United States Gulf Coast'], ['Cuba', 'United States Gulf Coast'], ['Greater Antilles', 'Central America', 'Florida'], ['The Caribbean', 'Central America'], ['Nicaragua', 'Honduras'], ['Antilles', 'Venezuela', 'Colombia', 'United States East Coast', 'Atlantic Canada'], ['Cape Verde', 'The Caribbean', 'British Virgin Islands', 'U.S. Virgin Islands', 'Cuba', 'Florida'], ['Lesser Antilles', 'Virgin Islands', 'Puerto Rico', 'Dominican Republic', 'Turks and Caicos Islands'], ['Central America', 'United States Gulf Coast (especially Florida Panhandle)']]
damages = ['Damages not recorded', '100M', 'Damages not recorded', '40M', '27.9M', '5M', 'Damages not recorded', '306M', '2M', '65.8M', '326M', '60.3M', '208M', '1.42B', '25.4M', 'Damages not recorded', '1.54B', '1.24B', '7.1B', '10B', '26.5B', '6.2B', '5.37B', '23.3B', '1.01B', '125B', '12B', '29.4B', '1.76B', '720M', '15.1B', '64.8B', '91.6B', '25.1B']
deaths = [90,4000,16,3103,179,184,408,682,5,1023,43,319,688,259,37,11,2068,269,318,107,65,19325,51,124,17,1836,125,87,45,133,603,138,3057,74]

# ----------------------------------------

def hurricane_dict(names, month, year, sustained_winds, areas_affected, damage, death):
    results = {}
    
    for data in zip(names, month, year, sustained_winds, areas_affected, damage, death):
        results[data[0]] = {
            "Name" : data[0], 
            "Month": data[1],
            "Year" : data[2],
            "Max Sustained Wind": data[3],
            "Areas Affected"    : data[4],
            "Damage": data[5],
            "Deaths": data[6]
        }

    return results

hurricane = hurricane_dict(names, months, years, max_sustained_winds, areas_affected, damages, deaths)

#print(hurricane['Cuba II'])

def hurricane_by_year(hurricanes):
    results = {}
    
    for name, data in hurricanes.items():  # use the parameter, not the global `hurricane`
        year = data['Year']
        
        if year not in results:
            results[year] = []
            
        results[year].append(data)
        
    return results

hurricanes_by_year_v2 = hurricane_by_year(hurricane)

print('\n--- year 1932 ---\n')

for item in hurricanes_by_year_v2[1932]:
    print(item)
    print('---')

Result:

--- year 1932 ---

{'Name': 'Bahamas', 'Month': 'September', 'Year': 1932, 'Max Sustained Wind': 160, 'Areas Affected': ['The Bahamas', 'Northeastern United States'], 'Damage': 'Damages not recorded', 'Deaths': 16}
---
{'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': '40M', 'Deaths': 3103}
---    
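As a quick sanity check, the sketch below contrasts plain assignment with appending to a list. Plain assignment keeps only one entry per year, because a dictionary key holds a single value. The sketch uses only the first six names and years from the sample data, and the variable names `overwritten` and `grouped` are illustrative, not part of the exercise.

```python
# First six entries of the sample data.
names = ['Cuba I', 'San Felipe II Okeechobee', 'Bahamas', 'Cuba II', 'CubaBrownsville', 'Tampico']
years = [1924, 1928, 1932, 1932, 1933, 1933]

overwritten = {}
grouped = {}
for name, year in zip(names, years):
    overwritten[year] = name                    # same key again: replaces the old value
    grouped.setdefault(year, []).append(name)   # same key again: adds to the list

print(len(overwritten))      # 4 (one entry per distinct year)
print(overwritten[1932])     # Cuba II

# Every hurricane should be accounted for in the grouped version.
total = sum(len(items) for items in grouped.values())
print(total == len(names))   # True
```

Comparing the total of the list lengths with `len(names)` is a more direct check than `len()` of the dictionary, which only counts distinct years.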

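EDIT:

I think it would be simpler to use a DataFrame.

  • It can simply select by year, month or name.
  • It can filter by range using `<` and `>`.
  • It can do calculations such as sum and average/mean.
  • It can plot the data.

import pandas as pd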

df = pd.DataFrame({
    'Name': ['Cuba I', 'San Felipe II Okeechobee', 'Bahamas', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Labor Day', 'New England', 'Carol', 'Janet', 'Carla', 'Hattie', 'Beulah', 'Camille', 'Edith', 'Anita', 'David', 'Allen', 'Gilbert', 'Hugo', 'Andrew', 'Mitch', 'Isabel', 'Ivan', 'Emily', 'Katrina', 'Rita', 'Wilma', 'Dean', 'Felix', 'Matthew', 'Irma', 'Maria', 'Michael'],
    'Month': ['October', 'September', 'September', 'November', 'August', 'September', 'September', 'September', 'September', 'September', 'September', 'October', 'September', 'August', 'September', 'September', 'August', 'August', 'September', 'September', 'August', 'October', 'September', 'September', 'July', 'August', 'September', 'October', 'August', 'September', 'October', 'September', 'September', 'October'],
    'Year': [1924, 1928, 1932, 1932, 1933, 1933, 1935, 1938, 1953, 1955, 1961, 1961, 1967, 1969, 1971, 1977, 1979, 1980, 1988, 1989, 1992, 1998, 2003, 2004, 2005, 2005, 2005, 2005, 2007, 2007, 2016, 2017, 2017, 2018],
    'Max sustained winds': [165, 160, 160, 175, 160, 160, 185, 160, 160, 175, 175, 160, 160, 175, 160, 175, 175, 190, 185, 160, 175, 180, 165, 165, 160, 175, 180, 185, 175, 175, 165, 180, 175, 160],
    'Areas affected': [['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], ['The Bahamas', 'Northeastern United States'], ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], ['The Bahamas', 'Cuba', 'Florida', 'Texas', 'Tamaulipas'], ['Jamaica', 'Yucatn Peninsula'], ['The Bahamas', 'Florida', 'Georgia', 'The Carolinas', 'Virginia'], ['Southeastern United States', 'Northeastern United States', 'Southwestern Quebec'], ['Bermuda', 'New England', 'Atlantic Canada'], ['Lesser Antilles', 'Central America'], ['Texas', 'Louisiana', 'Midwestern United States'], ['Central America'], ['The Caribbean', 'Mexico', 'Texas'], ['Cuba', 'United States Gulf Coast'], ['The Caribbean', 'Central America', 'Mexico', 'United States Gulf Coast'], ['Mexico'], ['The Caribbean', 'United States East coast'], ['The Caribbean', 'Yucatn Peninsula', 'Mexico', 'South Texas'], ['Jamaica', 'Venezuela', 'Central America', 'Hispaniola', 'Mexico'], ['The Caribbean', 'United States East Coast'], ['The Bahamas', 'Florida', 'United States Gulf Coast'], ['Central America', 'Yucatn Peninsula', 'South Florida'], ['Greater Antilles', 'Bahamas', 'Eastern United States', 'Ontario'], ['The Caribbean', 'Venezuela', 'United States Gulf Coast'], ['Windward Islands', 'Jamaica', 'Mexico', 'Texas'], ['Bahamas', 'United States Gulf Coast'], ['Cuba', 'United States Gulf Coast'], ['Greater Antilles', 'Central America', 'Florida'], ['The Caribbean', 'Central America'], ['Nicaragua', 'Honduras'], ['Antilles', 'Venezuela', 'Colombia', 'United States East Coast', 'Atlantic Canada'], ['Cape Verde', 'The Caribbean', 'British Virgin Islands', 'U.S. Virgin Islands', 'Cuba', 'Florida'], ['Lesser Antilles', 'Virgin Islands', 'Puerto Rico', 'Dominican Republic', 'Turks and Caicos Islands'], ['Central America', 'United States Gulf Coast (especially Florida Panhandle)']],
    'Damages': ['Damages not recorded', '100M', 'Damages not recorded', '40M', '27.9M', '5M', 'Damages not recorded', '306M', '2M', '65.8M', '326M', '60.3M', '208M', '1.42B', '25.4M', 'Damages not recorded', '1.54B', '1.24B', '7.1B', '10B', '26.5B', '6.2B', '5.37B', '23.3B', '1.01B', '125B', '12B', '29.4B', '1.76B', '720M', '15.1B', '64.8B', '91.6B', '25.1B'],
    'Deaths': [90,4000,16,3103,179,184,408,682,5,1023,43,319,688,259,37,11,2068,269,318,107,65,19325,51,124,17,1836,125,87,45,133,603,138,3057,74],
})

# groups = df.groupby('Year')

print('\n--- Year 1932 ---\n')

selected = df[ df['Year'] == 1932 ]
print( selected )

print('\n--- Name contains Cuba ---\n')

selected = df[ df['Name'].str.contains('Cuba') ]
print( selected )

print('\n--- Month August ---\n')

selected = df[ df['Month'] == 'August' ]
print( selected )

print('\n--- Deaths < 20 ---\n')

selected = df[ df['Deaths'] < 20 ].sort_values('Deaths')
print( selected[ ['Deaths', 'Year'] ] )

print('\n--- sum Deaths ---\n')

result = df['Deaths'].sum()
print( result )
print( f'{result:_}' )  # display with `_` to make it more readable

print('\n--- Area Mexico ---\n')

selected = df[ df['Areas affected'].apply(lambda item: 'Mexico' in item)  ]
print( selected[ ['Year', 'Areas affected'] ].to_string() )  # `to_string()` to display without `...`

# ---

import matplotlib.pyplot as plt

df.plot(x='Year', y='Deaths')
plt.show()

Result:

--- Year 1932 ---

      Name      Month  ...               Damages  Deaths
2  Bahamas  September  ...  Damages not recorded      16
3  Cuba II   November  ...                   40M    3103

[2 rows x 7 columns]

--- Name contains Cuba ---

              Name     Month  ...               Damages  Deaths
0           Cuba I   October  ...  Damages not recorded      90
3          Cuba II  November  ...                   40M    3103
4  CubaBrownsville    August  ...                 27.9M     179

[3 rows x 7 columns]

--- Month August ---

               Name   Month  ...  Damages  Deaths
4   CubaBrownsville  August  ...    27.9M     179
13          Camille  August  ...    1.42B     259
16            David  August  ...    1.54B    2068
17            Allen  August  ...    1.24B     269
20           Andrew  August  ...    26.5B      65
25          Katrina  August  ...     125B    1836
28             Dean  August  ...    1.76B      45

[7 rows x 7 columns]

--- Deaths < 20 ---

    Deaths  Year
8        5  1953
15      11  1977
2       16  1932
24      17  2005

--- sum Deaths ---

39489
39_489

--- Area Mexico ---

    Year                                                      Areas affected
0   1924               [Central America, Mexico, Cuba, Florida, The Bahamas]
12  1967                                      [The Caribbean, Mexico, Texas]
14  1971  [The Caribbean, Central America, Mexico, United States Gulf Coast]
15  1977                                                            [Mexico]
17  1980              [The Caribbean, Yucatn Peninsula, Mexico, South Texas]
18  1988           [Jamaica, Venezuela, Central America, Hispaniola, Mexico]
24  2005                          [Windward Islands, Jamaica, Mexico, Texas]
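If the year-keyed dictionary from the original question is still wanted, `groupby` can produce it from the DataFrame. This is a minimal sketch on a four-row subset of the data, assuming pandas is installed; the name `by_year` is illustrative and only three columns are kept.

```python
import pandas as pd

# Four rows taken from the sample data.
df = pd.DataFrame({
    'Name':   ['Cuba I', 'Bahamas', 'Cuba II', 'Tampico'],
    'Year':   [1924, 1932, 1932, 1933],
    'Deaths': [90, 16, 3103, 184],
})

# One dictionary entry per year; each value is a list of row dictionaries.
by_year = {
    year: group.to_dict(orient='records')
    for year, group in df.groupby('Year')
}

for record in by_year[1932]:
    print(record['Name'], record['Deaths'])
```

Each group keeps the original row order, so the hurricanes of a year appear in the same order as in the input lists.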