Python - 在可迭代对象中搜索项目并按特定顺序创建列表
Python - search for items in an iterable and create a list in a specific order
使用此数据集:
df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/total_cases.csv')
我可以像这样提取所有国家:
countries = list(df)
结束于:
countries = ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia', 'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Congo', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Faeroe Islands', 'Falkland Islands', 'Fiji', 'Finland', 'France', 'French Polynesia', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'International', 'Iran', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macedonia', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Vincent and the Grenadines', 'San Marino', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Somalia', 'South Africa', 'South Korea', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Tanzania', 'Thailand', 'Timor', 'Togo', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turks and Caicos Islands', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Virgin Islands', 'Uruguay', 'Uzbekistan', 'Vatican', 'Venezuela', 'Vietnam', 'Zambia', 'Zimbabwe']
每个国家的最新病例数:
cases=[]
for item in df:
if item in countries:
# most recent is the last
n = df[item].iloc[-1]
cases.append(n)
结束于:
cases = [367.0, 383.0, 1468.0, 545.0, 17.0, 3.0, 15.0, 1715.0, 853.0, 74.0, 5956, 12640, 717.0, 36.0, 811.0, 164.0, 63.0, 861.0, 22194, 7.0, 26.0, 39.0, 5.0, 210.0, 2.0, 781.0, 7.0, 13717.0, 3.0, 135.0, 577.0, 384.0, 3.0, 117.0, 685.0, 17883, 7.0, 45.0, 9.0, 9.0, 5116.0, 82784, 1780.0, 45.0, 483.0, 349.0, 1282.0, 396.0, 13.0, 494.0, 5017, 180.0, 5071, 121.0, 15.0, 1956.0, 3995.0, 1322.0, 78.0, 16.0, 31.0, 1149.0, 52.0, 184.0, 5.0, 15.0, 2308.0, 78167, 47.0, 30.0, 4.0, 196.0, 103228, 287.0, 113.0, 1832.0, 11.0, 12.0, 121.0, 80.0, 166.0, 144.0, 33.0, 33.0, 25.0, 312.0, 895.0, 1586, 5194.0, 2738.0, nan, 62589, 1031.0, 5709.0, 150.0, 9248.0, 135586, 63.0, 3906, 170.0, 349.0, 704.0, 172.0, 184.0, 743.0, 270.0, 12.0, 548.0, 548.0, 14.0, 19.0, 78.0, 880.0, 2970.0, 599.0, 85.0, 8.0, 3963.0, 19.0, 56.0, 293.0, 6.0, 268.0, 2785.0, 1056.0, 79.0, 15.0, 241.0, 6.0, 1184.0, 10.0, 22.0, 16.0, 9.0, 19580, 18.0, 969.0, 6.0, 278.0, 254.0, 8.0, 5863, 419.0, 4072.0, 260.0, 2249.0, 2.0, 115.0, 2954.0, 3764.0, 4848.0, 12442.0, 573.0, 2057.0, 4417.0, 7497.0, 105.0, 11.0, 14.0, 8.0, 279.0, 2795.0, 237.0, 2447.0, 11.0, 6.0, 1481, 40.0, 581.0, 1055.0, 8.0, 1749.0, 10384, 1.0, 140510, 185.0, 14.0, 10.0, 10.0, 7693, 22164, 19.0, 376.0, 24.0, 2369.0, 1.0, 65.0, 107.0, 596.0, 34109.0, 8.0, 52.0, 1462.0, 2359.0, 55242, 398809, 45.0, 424.0, 504.0, 7.0, 166.0, 251.0, 39.0, 10.0]
现在我需要将所有这些绘制在地图上,为此我需要每个国家/地区的 ISO3
(alpha_3) 代码。项目的顺序在这里很重要。
现在,我找到了这个为每个国家/地区提供此信息的软件包:
import pycountry
如果我 print (list(pycountry.countries))
,我会得到一个像这样的可迭代对象:
[Country(alpha_2='AW', alpha_3='ABW', name='Aruba', numeric='533'), Country(alpha_2='AF', alpha_3='AFG', name='Afghanistan', numeric='004', official_name='Islamic Republic of Afghanistan'), Country(alpha_2='AO', alpha_3='AGO', name='Angola', numeric='024', official_name='Republic of Angola'), Country(alpha_2='AI', alpha_3='AIA', name='Anguilla', numeric='660'), Country(alpha_2='AX', alpha_3='ALA', name='Åland Islands', numeric='248'), Country(alpha_2='AL', alpha_3='ALB', name='Albania', numeric='008', official_name='Republic of Albania'), ...]
问题
我如何搜索最后一个可迭代对象并以 alpha_3
代码的有序列表结束,我上面的国家/地区列表中的每个国家/地区都有一个代码(并且顺序相同),如下所示:
alpha_codes = ['AFG', 'ALB', ...]
Ps:不在列表中的国家'countries'应该从迭代中丢弃,三个最终列表的长度必须相同。
您可以创建字典:
d = {i.name: i.alpha_3 for i in list(pycountry.countries)}
然后用get
映射代码,第二个值表示如果不匹配return原始值:
countries = [d.get(c, c) for c in countries]
不幸的是很多国家没有mach,所以可以使用search_fuzzy
:
def look(x):
try:
return pycountry.countries.search_fuzzy(x)[0].alpha_3
except:
return x
countries = [look(c) for c in countries]
print (countries)
使用此数据集:
df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/total_cases.csv')
我可以像这样提取所有国家:
countries = list(df)
结束于:
countries = ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia', 'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Congo', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Faeroe Islands', 'Falkland Islands', 'Fiji', 'Finland', 'France', 'French Polynesia', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'International', 'Iran', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macedonia', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Vincent and the Grenadines', 'San Marino', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Somalia', 'South Africa', 'South Korea', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Tanzania', 'Thailand', 'Timor', 'Togo', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turks and Caicos Islands', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Virgin Islands', 'Uruguay', 'Uzbekistan', 'Vatican', 'Venezuela', 'Vietnam', 'Zambia', 'Zimbabwe']
每个国家的最新病例数:
cases=[]
for item in df:
if item in countries:
# most recent is the last
n = df[item].iloc[-1]
cases.append(n)
结束于:
cases = [367.0, 383.0, 1468.0, 545.0, 17.0, 3.0, 15.0, 1715.0, 853.0, 74.0, 5956, 12640, 717.0, 36.0, 811.0, 164.0, 63.0, 861.0, 22194, 7.0, 26.0, 39.0, 5.0, 210.0, 2.0, 781.0, 7.0, 13717.0, 3.0, 135.0, 577.0, 384.0, 3.0, 117.0, 685.0, 17883, 7.0, 45.0, 9.0, 9.0, 5116.0, 82784, 1780.0, 45.0, 483.0, 349.0, 1282.0, 396.0, 13.0, 494.0, 5017, 180.0, 5071, 121.0, 15.0, 1956.0, 3995.0, 1322.0, 78.0, 16.0, 31.0, 1149.0, 52.0, 184.0, 5.0, 15.0, 2308.0, 78167, 47.0, 30.0, 4.0, 196.0, 103228, 287.0, 113.0, 1832.0, 11.0, 12.0, 121.0, 80.0, 166.0, 144.0, 33.0, 33.0, 25.0, 312.0, 895.0, 1586, 5194.0, 2738.0, nan, 62589, 1031.0, 5709.0, 150.0, 9248.0, 135586, 63.0, 3906, 170.0, 349.0, 704.0, 172.0, 184.0, 743.0, 270.0, 12.0, 548.0, 548.0, 14.0, 19.0, 78.0, 880.0, 2970.0, 599.0, 85.0, 8.0, 3963.0, 19.0, 56.0, 293.0, 6.0, 268.0, 2785.0, 1056.0, 79.0, 15.0, 241.0, 6.0, 1184.0, 10.0, 22.0, 16.0, 9.0, 19580, 18.0, 969.0, 6.0, 278.0, 254.0, 8.0, 5863, 419.0, 4072.0, 260.0, 2249.0, 2.0, 115.0, 2954.0, 3764.0, 4848.0, 12442.0, 573.0, 2057.0, 4417.0, 7497.0, 105.0, 11.0, 14.0, 8.0, 279.0, 2795.0, 237.0, 2447.0, 11.0, 6.0, 1481, 40.0, 581.0, 1055.0, 8.0, 1749.0, 10384, 1.0, 140510, 185.0, 14.0, 10.0, 10.0, 7693, 22164, 19.0, 376.0, 24.0, 2369.0, 1.0, 65.0, 107.0, 596.0, 34109.0, 8.0, 52.0, 1462.0, 2359.0, 55242, 398809, 45.0, 424.0, 504.0, 7.0, 166.0, 251.0, 39.0, 10.0]
现在我需要将所有这些绘制在地图上,为此我需要每个国家/地区的 ISO3
(alpha_3) 代码。项目的顺序在这里很重要。
现在,我找到了这个为每个国家/地区提供此信息的软件包:
import pycountry
如果我 print (list(pycountry.countries))
,我会得到一个像这样的可迭代对象:
[Country(alpha_2='AW', alpha_3='ABW', name='Aruba', numeric='533'), Country(alpha_2='AF', alpha_3='AFG', name='Afghanistan', numeric='004', official_name='Islamic Republic of Afghanistan'), Country(alpha_2='AO', alpha_3='AGO', name='Angola', numeric='024', official_name='Republic of Angola'), Country(alpha_2='AI', alpha_3='AIA', name='Anguilla', numeric='660'), Country(alpha_2='AX', alpha_3='ALA', name='Åland Islands', numeric='248'), Country(alpha_2='AL', alpha_3='ALB', name='Albania', numeric='008', official_name='Republic of Albania'), ...]
问题
我如何搜索最后一个可迭代对象并以 alpha_3
代码的有序列表结束,我上面的国家/地区列表中的每个国家/地区都有一个代码(并且顺序相同),如下所示:
alpha_codes = ['AFG', 'ALB', ...]
Ps:不在列表中的国家'countries'应该从迭代中丢弃,三个最终列表的长度必须相同。
您可以创建字典:
d = {i.name: i.alpha_3 for i in list(pycountry.countries)}
然后用get
映射代码,第二个值表示如果不匹配return原始值:
countries = [d.get(c, c) for c in countries]
不幸的是很多国家没有mach,所以可以使用search_fuzzy
:
def look(x):
try:
return pycountry.countries.search_fuzzy(x)[0].alpha_3
except:
return x
countries = [look(c) for c in countries]
print (countries)