Data with spaces required, strip function issue

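I need help keeping the spaces (as given in the HTML source code) between the data extracted from a table, so that my Python extraction output looks like "195640 421 *****". At present my output has no separators or spaces at all. Any help would be appreciated: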

The URL is given in the code below.

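The formatted code is given below. I need help getting the output data with the same spacing between numbers as shown on the web page and in the HTML source. I think the problem is caused by using the strip function, which is why I get no spaces in my scraped output: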

import requests
from bs4 import BeautifulSoup

url = "http://ec.europa.eu/environment/ets/ohaDetails.do?returnURL=&languageCode=en&accountID=&registryCode=&buttonAction=all&action=&account.registryCode=&accountType=&identifierInReg=&accountHolder=&primaryAuthRep=&installationIdentifier=&installationName=&accountStatus=&permitIdentifier=&complianceStatus=&mainActivityType=-1&searchType=oha&backList=%3C%C2%A0Back&resultList.currentPageNumber=1589&selectedPeriods="

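r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
# Walk the rows of the element that follows tblInstallationContacts, skipping the last five rows
for items in soup.find(id="tblInstallationContacts").find_next_sibling().find_all("tr")[:-5]:
    # strip=True trims each text fragment and joins the fragments with no separator
    data = [item.get_text(strip=True) for item in items.find_all("td")]
    print(data)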

Use str.split in combination with str.join.

For example:

for items in soup.find(id="tblInstallationContacts").find_next_sibling().find_all("tr")[:-5]:
    data = [' '.join(item.text.split()) for item in items.find_all("td")]
    print(data)

Output:

[u'Compliance Information']
[u'EU ETS Phase', u'Year', u'Allowances in Allocation', u'Verified Emissions', u'Units Surrendered', u'Cumulative Surrendered Units**', u'Cumulative Verified Emissions***', u'Compliance Code', u'Options']
[u'2005-2007', u'2005', '', '', '', u'0', u'0', u'A', u'History', u'Details on Surrendered Units']
[u'2005-2007', u'2006', '', '', '', u'0', u'0', u'A', u'History']
[u'2005-2007', u'2007', '', '', '', u'0', u'0', u'A', u'History']
[u'2008-2012', u'2008', u'272063', u'219592', u'219592', u'219592', u'219592', u'A', u'History', u'Details on Surrendered Units']
[u'2008-2012', u'2009', u'272063', u'188608', u'188608', u'408200', u'408200', u'A', u'History']
[u'2008-2012', u'2010', u'272063', u'246152', u'246152', u'654352', u'654352', u'A', u'History']
[u'2008-2012', u'2011', u'272063', u'214697', u'214697', u'869049', u'869049', u'A', u'History']
[u'2008-2012', u'2012', u'272063', u'219409', u'219409', u'1088458', u'1088458', u'A', u'History']
[u'2013-2020', u'2013', u'199349', u'235869', u'235869', u'235869', u'235869', u'A', u'History', u'Details on Surrendered Units']
[u'2013-2020', u'2014', u'195640 421 *****', u'244203', u'244203', u'480072', u'480072', u'A', u'History']
[u'2013-2020', u'2015', u'191900 416 *****', u'248367', u'248367', u'728439', u'728439', u'A', u'History']
[u'2013-2020', u'2016', u'188132 364 *****', u'279441', u'279441', u'1007880', u'1007880', u'A', u'History']
[u'2013-2020', u'2017', u'184336 314 *****', u'259952', u'259952', u'1267832', u'1267832', u'A', u'History']
[u'2013-2020', u'2018', u'180513 265 *****', '', '', '', '', '', u'History']
[u'2013-2020', u'2019', u'176655 218 *****', '', '', '', '', '', u'History']
[u'2013-2020', u'2020', u'172794 174 *****', '', '', '', '', '', u'History']
[u'* Verified Emissions entered/updated after deadline of EU ETS Phase Year']
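
To see why the split/join combination keeps the spacing, here is a minimal sketch that uses only the standard library. The `normalize_whitespace` helper and the `raw_cell` sample string are made up for illustration; the real cell markup on the page may differ.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() with no argument splits on any run of whitespace and
    # discards empty strings, so leading and trailing whitespace vanish too.
    return " ".join(text.split())


# Hypothetical cell text modeled on the 2014 row (an assumption, not the real markup).
raw_cell = "\n\t195640\n\t\t421   *****\n"

# Plain strip() only trims the ends; the inner newlines and tabs stay.
print(repr(raw_cell.strip()))

# split() + join() leaves exactly one space between the tokens.
print(repr(normalize_whitespace(raw_cell)))  # '195640 421 *****'
```

If the pieces of a cell sit in separate text nodes, passing a separator to BeautifulSoup, as in `item.get_text(" ", strip=True)`, may be enough as well, since `get_text` joins the stripped fragments with that separator instead of the default empty string.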