Python not appending to list
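I have a script I have used before that takes a list of keywords and queries a master file containing multiple columns and entries. The script should read the master file line by line and, whenever it encounters a keyword, write the entire line to a new file.
The keyword file looks like this: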
A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL
The master file looks like this:
8:27379821,8,27379821,[A/T],NM_001979,NM_001256482,NM_001256483,NM_001256484,A2M,A2M,A2M,A2M,,Silent,Silent,Silent,Silent
GSA-rs72475893,8,27380763,[A/G],NM_001979,NM_001256482,NM_001256483,NM_001256484,AM,AM,AM,AM,EXON,Missense_R1407W,Missense_R1307W,Missense_R1257W,Missense_R1407W
8:27381207,8,27381207,[A/C],NM_001979,NM_001256482,NM_001256483,NM_001256484,ADA2,ADA2,ADA2,ADA2,,Silent,Silent,Silent,Silent
GSA-rs117056676,6,72385948,[T/C],,,,,AADACL2-AS1,AADAC,EXON,Silent,Silent,Missense_X400Q
The desired output would be:
8:27379821,8,27379821,[A/T],NM_001979,NM_001256482,NM_001256483,NM_001256484,A2M,A2M,A2M,A2M,,Silent,Silent,Silent,Silent
8:27381207,8,27381207,[A/C],NM_001979,NM_001256482,NM_001256483,NM_001256484,ADA2,ADA2,ADA2,ADA2,,Silent,Silent,Silent,Silent
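The code I am using is below. The problem is that the "matches" list variable appears to be empty; nothing is appended to it. Why is that? Is it not finding any matches, or is it finding them and failing to append them to the list?
I have tried using both the master file and the keyword file as .csv and as .txt, but neither works.
Thanks for your help!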
#open the list of words to search for
list_file = open(r'file.csv','r')
search_words = []
#loop through the words in the search list
for word in list_file:
    #save each word in an array and strip whitespace
    search_words.append(word.strip())
list_file.close()
#this is where the matching lines will be stored
matches = []
#open the master file
master_file = open(r'file2.csv','r')
#loop through each line in the master file
for line in master_file:
    #split the current line into array, this allows for us to use the "in" operator to search for exact strings
    current_line = line.split()
    #loop through each search word
    for search_word in search_words:
        #check if the search word is in the current line
        if search_word in current_line:
            #if found then save the line as we found it in the file
            matches.append(line)
            #once found then stop searching the current line
            break
master_file.close()
#create the new file
new_file = open(r'file3.txt', 'w')
#loop through all of the matched lines
for line in matches:
    #write the current matched line to the new file
    new_file.write(line)
new_file.close()
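I added two print statements to see what was happening behind the scenes, and found that the first file you read is not being split into individual words.
search_words is stored like this:
search_words = ['hello this is a line']
rather than like this:
search_words = ['hello', 'this', 'is', 'a', 'line']
I changed line 10
to this: search_words += (word.strip()).split()
instead of this: search_words.append(word.strip())
The modified code is below: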
#open the list of words to search for
list_file = open(r'file.csv','r')
search_words = []
#loop through the words in the search list
for word in list_file:
    #save each word in an array and strip whitespace
    search_words += (word.strip()).split()
list_file.close()
#print (search_words)
#this is where the matching lines will be stored
matches = []
#open the master file
master_file = open(r'file2.csv','r')
#loop through each line in the master file
for line in master_file:
    #split the current line into array, this allows for us to use the "in" operator to search for exact strings
    current_line = line.split()
    #print (current_line)
    #loop through each search word
    for search_word in search_words:
        #check if the search word is in the current line
        if search_word in current_line:
            #if found then save the line as we found it in the file
            matches.append(line)
            #once found then stop searching the current line
            break
master_file.close()
#create the new file
new_file = open(r'file3.txt', 'w')
#loop through all of the matched lines
for line in matches:
    #write the current matched line to the new file
    new_file.write(line)
new_file.close()
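A caveat worth checking with the sample data from the question: the change above is enough when the keywords are separated by whitespace, but the sample keyword file is comma-separated. The following minimal sketch (variable names are illustrative, not from the original script) applies both forms of split to that keyword line so the difference is visible:

```python
# Keyword line copied from the question's sample keyword file.
keyword_line = "A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL\n"

# split() with no argument splits on runs of whitespace only,
# so a comma-separated line stays in one piece.
by_whitespace = keyword_line.strip().split()

# split(",") splits on every comma and yields the individual keywords.
by_comma = keyword_line.strip().split(",")

print(by_whitespace)  # ['A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL']
print(by_comma)       # ['A2M', 'ABCC9', 'ACADVL', 'ACTC1', 'ACTN2', 'ADA2', 'AGL']
```

The same applies to the lines of the master file, since they are comma-separated as well.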
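At a glance there are two problems:
First, iterating over a file gives you lines; it does not split on the , character. You need to call .split() before using .strip() and appending to your search list. I have removed the iteration over each line because your sample input has only one line, but you can easily add it back if you expect multiple lines.
Second, .split() splits on whitespace by default, not on , so you need to pass "," as the argument to .split().
With these fixes (and using context managers to open the files), the fixed code is: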
search_words = []
with open(r'file.csv','r') as list_file:
    for word in list_file.read().split(","): # Fix 1
        search_words.append(word.strip())
matches = []
with open(r'file2.csv','r') as master_file:
    for line in master_file:
        # Not strictly necessary, we can search in the string using in
        current_line = line.split(",") # Fix 2
        for search_word in search_words:
            if search_word in current_line:
                matches.append(line)
                break
with open(r'file3.txt', 'w') as new_file:
    for line in matches:
        new_file.write(line)
        print(line)
Result:
8:27379821,8,27379821,[A/T],NM_001979,NM_001256482,NM_001256483,NM_001256484,A2M,A2M,A2M,A2M,,Silent,Silent,Silent,Silent
8:27381207,8,27381207,[A/C],NM_001979,NM_001256482,NM_001256483,NM_001256484,ADA2,ADA2,ADA2,ADA2,,Silent,Silent,Silent,Silent
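(Note that the console output has an extra blank line between each line, because each line already ends with a newline and print adds another.)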
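As a possible variation, here is a minimal sketch that uses the standard library csv module to do the comma splitting and a set for the keyword lookup. The helper names load_keywords and matching_lines are hypothetical, and the master rows are shortened versions of the sample rows from the question. It assumes a line should match only when a whole field equals a keyword, which is why the AM row is skipped.

```python
import csv


def load_keywords(lines):
    """Collect every non-empty, stripped cell from comma-separated lines into a set."""
    return {cell.strip() for row in csv.reader(lines) for cell in row if cell.strip()}


def matching_lines(master_lines, keywords):
    """Yield each master line that has at least one field equal to a keyword."""
    for line in master_lines:
        # Parse this single line into fields; the trailing newline is not part of any field.
        fields = next(csv.reader([line]), [])
        if any(field.strip() in keywords for field in fields):
            yield line


# Sample data: the keyword line from the question and shortened master rows.
keywords = load_keywords(["A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL\n"])
master = [
    "8:27379821,8,27379821,[A/T],NM_001979,A2M,A2M,,Silent\n",
    "GSA-rs72475893,8,27380763,[A/G],NM_001979,AM,AM,EXON,Missense_R1407W\n",
    "8:27381207,8,27381207,[A/C],NM_001979,ADA2,ADA2,,Silent\n",
]

matched = list(matching_lines(master, keywords))
for line in matched:
    # end="" avoids doubling the newline that each line already carries.
    print(line, end="")
```

Both helpers accept any iterable of lines, so open file objects can be passed in place of the lists. Parsing with csv also keeps the trailing newline out of the last field, so a keyword in the final column can still match.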