Attempt to populate a list with multiple entries in a for loop only returns a single entry
I have a CSV file containing URLs that I want to extract data from, but my script currently only manages to append the last entry. This is the script:
import os
import glob
import time
from urllib.request import urlopen
import pandas as pd
import xml.etree.ElementTree as ET

count=0
files=glob.glob('./extract/isbnlist/Reihe*_isbn-dnb2.csv',recursive=True) #searches all files in folder
print(files)
for file in files:
    if count==0:
        csvfile = pd.read_csv(file, sep='\t', encoding='utf-8')
        for row in csvfile['URL']:
            print('row: ' + row)
            with urlopen(str(row)) as response:
                doc = ET.parse(response)
                root = doc.getroot()
                namespaces = {  # Manually extracted from the XML file, but there could be code written to automatically do that.
                    "zs": "http://www.loc.gov/zing/srw/",
                    "": "http://www.loc.gov/MARC21/slim",
                }
                datafield_nodes_path = "./zs:records/zs:record/zs:recordData/record/datafield"  # XPath
                datafield_attribute_filters = [ #which fields to extract
                    {
                        "tag": "100", #author
                        "ind1": "1",
                        "ind2": " ",
                    }]
                #datafield_attribute_filters = []  # Decomment this line to clear filters (and process each datafield node)
                aut = []
                for datafield_node in root.iterfind(datafield_nodes_path, namespaces=namespaces):
                    if datafield_attribute_filters:
                        skip_node = True
                        for attr_dict in datafield_attribute_filters:
                            for k, v in attr_dict.items():
                                if datafield_node.get(k) != v:
                                    break
                            else:
                                skip_node = False
                                break
                        if skip_node:
                            continue
                    for subfield_node in datafield_node.iterfind("./subfield[@code='a']", namespaces=namespaces):
                        aut.append(subfield_node.text) #this gets the author name and title
        print(aut)
    count+=1
This is the CSV file:
URL
0 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783960382850&recordSchema=MARC21-xml
1 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783963622106&recordSchema=MARC21-xml
2 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D-&recordSchema=MARC21-xml
3 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783806241280&recordSchema=MARC21-xml
4 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783890296005&recordSchema=MARC21-xml
5 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783110699111&recordSchema=MARC21-xml
6 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783110698930&recordSchema=MARC21-xml
7 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783110699104&recordSchema=MARC21-xml
8 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783963621093&recordSchema=MARC21-xml
9 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9783451716034&recordSchema=MARC21-xml
10 http://services.dnb.de/sru/dnb?version=1.1&operation=searchRetrieve&query=ISBN%3D9788791953514&recordSchema=MARC21-xml
When I run the script, the output is:
['Schmidt, Horst']
But I need the other results as well. How can I achieve this?
Any help is appreciated.
EDIT: link to the full CSV file on Pastebin; the file name is: Reihe-21A51.csv_extract.csv_isbn-dnb2.csv
As @Tranbi pointed out, I had to move aut = [] out of the loop over the rows, so that the list is no longer reset for every URL. The code is now
for file in files:
    if count==0: #to only go through the first file, instead of all files in the folder
        csvfile = pd.read_csv(file, sep='\t', encoding='utf-8')
        aut = []
instead of
                aut = []
                for datafield_node in root.iterfind(datafield_nodes_path, namespaces=namespaces):
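
The fix can be tried without network access. The following is only a minimal sketch, not the original script: it assumes Python 3.8 or later (for the empty-prefix default namespace), and the helpers extract_authors and make_response, as well as the sample XML and the first two author names, are hypothetical stand-ins for the real DNB responses. The point it illustrates is that aut is created once, before the loop, and only extended inside it.

```python
import io
import xml.etree.ElementTree as ET

# Same namespaces and path as in the question.
NAMESPACES = {
    "zs": "http://www.loc.gov/zing/srw/",
    "": "http://www.loc.gov/MARC21/slim",  # default namespace, needs Python 3.8+
}
DATAFIELD_PATH = "./zs:records/zs:record/zs:recordData/record/datafield"


def extract_authors(xml_source):
    """Return the subfield 'a' texts of all matching tag-100 datafields.

    Hypothetical helper: it wraps the filtering logic from the question
    for a single response and returns a fresh list each time.
    """
    root = ET.parse(xml_source).getroot()
    authors = []
    for datafield in root.iterfind(DATAFIELD_PATH, namespaces=NAMESPACES):
        wanted = {"tag": "100", "ind1": "1", "ind2": " "}
        if any(datafield.get(k) != v for k, v in wanted.items()):
            continue  # not an author field, skip it
        for subfield in datafield.iterfind("./subfield[@code='a']", namespaces=NAMESPACES):
            authors.append(subfield.text)
    return authors


def make_response(author):
    """Build a tiny in-memory stand-in for one SRU response (made-up sample data)."""
    xml = f"""<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:records><zs:record><zs:recordData>
    <record xmlns="http://www.loc.gov/MARC21/slim">
      <datafield tag="100" ind1="1" ind2=" ">
        <subfield code="a">{author}</subfield>
      </datafield>
    </record>
  </zs:recordData></zs:record></zs:records>
</zs:searchRetrieveResponse>"""
    return io.BytesIO(xml.encode("utf-8"))


# Stand-ins for the responses that urlopen() would return for each URL row.
responses = [make_response(name) for name in ("Author, One", "Author, Two", "Schmidt, Horst")]

aut = []  # created once, before the loop, so earlier results are kept
for response in responses:
    aut.extend(extract_authors(response))

print(aut)
```

In the real script, each item of responses would be replaced by the object returned by urlopen(str(row)). Returning a per-response list from a helper and extending one outer list makes it harder to reset the accumulator by accident.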