List ends up with only one object after appending in a loop. How do I fix this?
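I'm fairly new to this. The idea is to scrape data. Part of my code has a problem. The code runs without errors, but the result of print("The output is:", output), shown in the code below, is not what I expect. I want a list of objects, but what I get is a list containing only one object.
For example, after running the code I expect the output to be [resp, resp, ...], but it only shows the last object processed in the loop: [resp].
A sample of my output: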
The output is: [{'Company Name': ' ELEVEN', 'Department': '7', 'Ad Posted To': 'MATUN SIGNNATURE', 'Publication References': 'AGOSTI, annonce n°1125\n\t\t\t\t\t\t\t'}]
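Am I missing something in my code?
Thank you.
In the code below I have left out the url and some of the header information. I hope the code is enough to explain the situation.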
import requests
from lxml import html
import csv

data = []
for i in range(1, 51, 1):
    print('page: %s' % i)
    headers = {
        # header info (omitted)
    }
    response = requests.get(url)  # url omitted in the question
    _txt = response.text
    data.append(_txt)
print(len(data))
# print(data)

# parsing the html
for text in data:
    doc = html.fromstring(text)
    container = doc.xpath("//tr[contains(@class, 'pair')]")[1]
    company_name = container.xpath('.//dt[contains(text(), "Dénomination sociale :")]/following-sibling::dd[1]/text()')
    # publication_date = container.xpath(".//td[@class='colonnel']/text()")
    #ad_category = container.xpath('.//dt[contains(text(), "Catégorie d'annonce :")]/following-sibling::dd[1]/text()')
    department = container.xpath('.//dt[contains(text(), "Département :")]/following-sibling::dd[1]/text()')
    ad_posted_to = container.xpath('.//dt[contains(text(), "Annonce déposée au :")]/following-sibling::dd[1]/text()')
    publication_references = container.xpath('.//dt[contains(text(), "Références de publication :")]/following-sibling::dd[1]/text()')
    # json output
    resp = {}
    output = []
    for info in zip(company_name, department, ad_posted_to, publication_references):
        resp['Company Name'] = info[0]
        resp['Department'] = info[1]
        resp['Ad Posted To'] = info[2]
        resp['Publication References'] = info[3]
        print(resp)
        output.append(resp)
print("The output is:",output)
You are only keeping the last one. Try re-initialising the resp dict inside the for loop and appending it to output there:
for info in zip(company_name, department, ad_posted_to, publication_references):
    resp = {}
    resp['Company Name'] = info[0]
    resp['Department'] = info[1]
    resp['Ad Posted To'] = info[2]
    resp['Publication References'] = info[3]
    print(resp)
    output.append(resp)
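To illustrate why the dict has to be re-created on each pass, here is a minimal sketch with made-up stand-in rows (the names rows, shared, aliased and separate are only for this example, not from the question). Appending the same dict object repeatedly stores several references to one object, so every entry ends up showing the values written last.

```python
# Stand-in rows; in the question these come from zip(company_name, department, ...).
rows = [("ACME", "75"), ("GLOBEX", "69")]

# Reusing one dict: every list entry refers to the same object.
shared = {}
aliased = []
for name, dept in rows:
    shared["Company Name"] = name
    shared["Department"] = dept
    aliased.append(shared)

# Creating a new dict per iteration: each entry is independent.
separate = []
for name, dept in rows:
    resp = {"Company Name": name, "Department": dept}
    separate.append(resp)

print(aliased)   # both entries show the values of the last row
print(separate)  # one distinct entry per row
```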
The problem is that you overwrite output on every iteration. Try moving output = [] outside the for text in data loop.
For example:
output = []
for text in data:
    doc = html.fromstring(text)
    container = doc.xpath("//tr[contains(@class, 'pair')]")[1]
    company_name = container.xpath('.//dt[contains(text(), "Dénomination sociale :")]/following-sibling::dd[1]/text()')
    # publication_date = container.xpath(".//td[@class='colonnel']/text()")
    #ad_category = container.xpath('.//dt[contains(text(), "Catégorie d'annonce :")]/following-sibling::dd[1]/text()')
    department = container.xpath('.//dt[contains(text(), "Département :")]/following-sibling::dd[1]/text()')
    ad_posted_to = container.xpath('.//dt[contains(text(), "Annonce déposée au :")]/following-sibling::dd[1]/text()')
    publication_references = container.xpath('.//dt[contains(text(), "Références de publication :")]/following-sibling::dd[1]/text()')
    # json output
    resp = {}
    for info in zip(company_name, department, ad_posted_to, publication_references):
        resp['Company Name'] = info[0]
        resp['Department'] = info[1]
        resp['Ad Posted To'] = info[2]
        resp['Publication References'] = info[3]
        print(resp)
        output.append(resp)
print("The output is:",output)
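Both suggestions can be combined. The sketch below is only an illustration under some assumptions: build_records and FIELDS are hypothetical names, each page is represented by a tuple of four lists standing in for the xpath results (so it runs without any network access or lxml), the second page is made-up placeholder data, and the strip() call is an optional extra that trims the trailing whitespace visible in the sample output.

```python
FIELDS = ("Company Name", "Department", "Ad Posted To", "Publication References")


def build_records(pages):
    """Collect one dict per record across all pages.

    pages: iterable of 4-tuples of lists, a stand-in for the per-page
    xpath results (company_name, department, ad_posted_to,
    publication_references).
    """
    output = []  # created once, before the outer loop
    for company_name, department, ad_posted_to, publication_references in pages:
        for info in zip(company_name, department, ad_posted_to, publication_references):
            # A new dict for every record, so entries never share state.
            resp = dict(zip(FIELDS, (value.strip() for value in info)))
            output.append(resp)
    return output


pages = [
    # First page: values taken from the sample output in the question.
    ([" ELEVEN"], ["7"], ["MATUN SIGNNATURE"], ["AGOSTI, annonce n°1125\n\t\t\t"]),
    # Second page: made-up placeholder values.
    (["EXAMPLE SARL"], ["75"], ["PARIS"], ["placeholder reference"]),
]

records = build_records(pages)
print("The output is:", records)
```

Because output is created before the loop over pages and resp is created inside the inner loop, the list keeps one independent dict per record from every page.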