How do you convert one element in a list into many elements?
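I web-scraped the daily COVID-19 death counts and appended them to a list as a string. However, all of the deaths end up in a single element, as one string. How can I split that one element (a string) into multiple elements so that each data value has its own element?
Below I show the output I get and what I want to achieve. The code raises no errors.
Code: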
import requests
from bs4 import BeautifulSoup
# web scrapes data (this section is all fine, no trouble here)
URL = ("https://www.worldometers.info/coronavirus/country/us/")
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("div", {"class": "col-md-12"})
data = results[4]
script = data.find('script')
string = script.string
# gets the dates
remove = (911 + 292 + 8026)
# gets the deaths and irrelevant/extra information in the script
deathsTEMP = string.strip()[remove:]
i = 0
y = []
count = 0
# this loop gets rid of the irrelevant/extra information in the script
while count < 1:
    if deathsTEMP[i] == "]":
        count = count + 1
    else:
        y.append(deathsTEMP[i])
    i = i + 1
yaxislength = len(y)
# gets all the necessary deaths in the list
deaths = [''.join(y[0:yaxislength])]
print(deaths)
Actual output:
['null,null,null,null,null,0,0,0,0,0,0,0,0,0,1,0,5,3,2,1,3,4,3,4,4,8,2,7,10,20,22,27,50,70,74,63,138,180,272,311,361,505,660,508,823,1103,1280,1202,1304,1592,1451,1510,2289,2218,2162,2294,2084,1777,1739,2627,2713,2271,2628,1922,1618,2001,2763,2434,2423,2010,2124,1205,1430,2548,2476,2299,1978,1759,1186,1367,2415,2585,2189,1747,1476,1245,1100,1910,1880,1802,1393,1218,883,1065,1600,1442,1500,1340,1046,644,676,789,1527,1252,1236,1036,676,689,1173,1119,1042,1004,735,412,582,1100,1012,914,766,755,351,441,848,838,767,748,583,279,378,884,822,682,670,534,316,376,718,690,669,636,307,271,425,994,911,1012,865,748,424,481,950,1023,1014,973,826,467,557,1202,1236,1167,1229,927,509,608,1332,1461,1866,1408,1060,457,627,1412,1337,1246,1364,1040,562,578,1481,1455,1299,1181,1103,565,603,1383,1289,1081,1167,1032,490,499,1252,1315,1135,1110,961,408,554,1170,1104,1116,1035,708,474,310,485,1195,1075,1118,715,440,482,1179,1011,895,933,682,333,374,971,1119,938,902,771,317,367,966,944,905,909,736,383,452,799,938,948,949,742,371,363,809,957,897,975,709,549,491,957,1218,970,888,863,489,583,1026,1051,1056,998,1017,539,645,1237,1368,1255,1342,1171,749,836,1539,1541,1376,1504,1406,840,965,1688,2149,2123,2136,1678,1021,1315,2244,2437,1564,1579,1388,994,1547,2760,2968,2866,2804,2418,1337,1769,2910,3242,3130,3040,2544,1590,1898,3180,3686,3448,3154,2709,1658,2249,3440,3434,2952,1615,1638,1406,2247,3754,4079,3601,2401,2157,1474,1946,3815,4079,4274,4001,3313,1998,2239,4464,4188,4216,3938,3410,1967,1592,2916,4340,4430,3948,3499,1949,2026,4151,4373,4024,3734,2778,1781,2355,3705,4042,3608,3628,2914,1429,1615,3274,3356,3278,3050,2367,1142,1004,1830,2472,2710,2695,1874,1304,1377,2440,2527,2442,2271,1617,1352,1514,2086,2394,2020,1886,1665,836,924,1481,1676,1576,1503,1078,707,867,1142,1360,1384,1247,846,505,681,1024,1416,1240,1313,866,566,714,911,1121,975,990,747,315,538,872,892,1021,915,722,339,508,833,960,907,892,771,322,505,874,919,927,800,729,312,483,878,982,891,765,724,348,462,848,765,864,758,636,274,396,
712,823,775,753,535,340,409,715,680,698,623,549,295,371,680,650,676,617,390,203,158,309,520,612,570,450,240,364,433,502,489,407,368,170,263,346,439,379,393,243,148,249,434,391,377,374,203,135,173,308,304,318,327,157,132,130,249,283,356,331,199,127,217,344,395,360,309,203,116,142,277,408,399,331,236,138,177,360,493,413,435,369,165,213']
Desired output:
[null,null,null,null,null,0,0,0,0,0,0,0,0,0,1,0,5,3,2,1,3,4,3,4,4,8,2,7,10,20,22,27,50,70,74,63,138,180,272,311,361,505,660,508,823,1103,1280,1202,1304,1592,1451,1510,2289,2218,2162,2294,2084,1777,1739,2627,2713,2271,2628,1922,1618,2001,2763,2434,2423,2010,2124,1205,1430,2548,2476,2299,1978,1759,1186,1367,2415,2585,2189,1747,1476,1245,1100,1910,1880,1802,1393,1218,883,1065,1600,1442,1500,1340,1046,644,676,789,1527,1252,1236,1036,676,689,1173,1119,1042,1004,735,412,582,1100,1012,914,766,755,351,441,848,838,767,748,583,279,378,884,822,682,670,534,316,376,718,690,669,636,307,271,425,994,911,1012,865,748,424,481,950,1023,1014,973,826,467,557,1202,1236,1167,1229,927,509,608,1332,1461,1866,1408,1060,457,627,1412,1337,1246,1364,1040,562,578,1481,1455,1299,1181,1103,565,603,1383,1289,1081,1167,1032,490,499,1252,1315,1135,1110,961,408,554,1170,1104,1116,1035,708,474,310,485,1195,1075,1118,715,440,482,1179,1011,895,933,682,333,374,971,1119,938,902,771,317,367,966,944,905,909,736,383,452,799,938,948,949,742,371,363,809,957,897,975,709,549,491,957,1218,970,888,863,489,583,1026,1051,1056,998,1017,539,645,1237,1368,1255,1342,1171,749,836,1539,1541,1376,1504,1406,840,965,1688,2149,2123,2136,1678,1021,1315,2244,2437,1564,1579,1388,994,1547,2760,2968,2866,2804,2418,1337,1769,2910,3242,3130,3040,2544,1590,1898,3180,3686,3448,3154,2709,1658,2249,3440,3434,2952,1615,1638,1406,2247,3754,4079,3601,2401,2157,1474,1946,3815,4079,4274,4001,3313,1998,2239,4464,4188,4216,3938,3410,1967,1592,2916,4340,4430,3948,3499,1949,2026,4151,4373,4024,3734,2778,1781,2355,3705,4042,3608,3628,2914,1429,1615,3274,3356,3278,3050,2367,1142,1004,1830,2472,2710,2695,1874,1304,1377,2440,2527,2442,2271,1617,1352,1514,2086,2394,2020,1886,1665,836,924,1481,1676,1576,1503,1078,707,867,1142,1360,1384,1247,846,505,681,1024,1416,1240,1313,866,566,714,911,1121,975,990,747,315,538,872,892,1021,915,722,339,508,833,960,907,892,771,322,505,874,919,927,800,729,312,483,878,982,891,765,724,348,462,848,765,864,758,636,274,396,7
12,823,775,753,535,340,409,715,680,698,623,549,295,371,680,650,676,617,390,203,158,309,520,612,570,450,240,364,433,502,489,407,368,170,263,346,439,379,393,243,148,249,434,391,377,374,203,135,173,308,304,318,327,157,132,130,249,283,356,331,199,127,217,344,395,360,309,203,116,142,277,408,399,331,236,138,177,360,493,413,435,369,165,213]
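If you look closely at the 'actual' output, there is only one element, a comma-separated string, whereas I am looking for an output in which each number is its own element.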
sep = deaths[0].split(',')
dic = {'null': float('NaN')}  # to replace null with float NaNs
sep = [dic.get(n, n) for n in sep]  # swap each 'null' for NaN, keep every other value
deaths_new = list(map(float, sep))  # new correct list with floats
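Because the scraped text is a comma-separated run of JSON literals (numbers and null), another option is to wrap it in brackets and parse it with the standard json module. This is only a sketch on a shortened, made-up sample rather than the full scraped string; raw and values are illustrative names, and null comes back as None instead of NaN.

```python
import json

# Shortened stand-in for deaths[0]; the real string is much longer.
raw = 'null,null,0,0,1,0,5,3'

# Wrapping the text in brackets turns it into a valid JSON array.
values = json.loads('[' + raw + ']')

print(values)
```

With this approach the counts are parsed as int and each missing day as None, so no separate null-replacement step is needed.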
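To format the output correctly, you can do this:
SEP = ','  # separator
data_sample = ['null,null,null,null,null,0,0,0,0,0,0,0,0']
data_final = map(lambda d: d.split(SEP), data_sample)
print(data_final)
Output (Python 2):
[['null', 'null', 'null', 'null', 'null', '0', '0', '0', '0', '0', '0', '0', '0']]
In Python 3, map returns a lazy iterator rather than a list, so print shows a map object instead of the values. If we want a list, either use a list comprehension or convert it directly: data_final = list(data_final). The result holds one inner list per input string, so data_final[0] is the list of split values.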
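If the list may hold more than one such string, the per-string pieces can be flattened into a single list. A small sketch, assuming a hypothetical two-string sample and using itertools.chain.from_iterable from the standard library:

```python
from itertools import chain

SEP = ','  # separator
# Hypothetical sample holding two comma-separated strings
data_sample = ['null,null,0', '1,5,3']

# map yields one list of pieces per string; chain joins them into one flat list
data_final = list(chain.from_iterable(map(lambda d: d.split(SEP), data_sample)))

print(data_final)
```

The values are still strings at this point; converting them to numbers is a separate step.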
If the separator is not always clean, as in your case, you can use regular expressions from the standard library: import re.
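As a rough illustration of the regular-expression route, re.split can treat a comma together with any surrounding whitespace or line breaks as the separator. The sample string below is invented to mimic a value list wrapped across lines, and the pattern is an assumption about what an untidy separator looks like here.

```python
import re

# Invented sample: a line break and stray spaces around some commas
raw = 'null,null, 0,0,\n712 ,823'

# Split on a comma plus any whitespace on either side of it
parts = re.split(r'\s*,\s*', raw.strip())

print(parts)
```

Each piece comes back without the stray whitespace, so it can be passed to int or float directly once the 'null' entries are handled.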
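Note that, since you said you want a list of strings, the 'desired output' you provided is not quite correct: a list of strings would show each value in quotes.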