How to Alter the Structure of a File into a Tabular Format?
I have a file containing the following data:
Input:
Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
Query= A3 cat
Hit= D1 dog
Score= 7.0 2.0 1.0
I want to write a program that restructures this data into tabular (.csv) format, something like the following:
Output:
Query = A1 bird, Hit= B1 owl, Score= 1.0 4.0 2.5 #The first query, hit, score
Query = A1 bird, Hit= B2 bluejay, Score= 10.0 6.0 7.0 #The second hit and score associated with the first query
Query = A2 shark, Hit= C1 catshark, Score= 10.0 7.0 2.0 #The second query, hit, score
Query = A3 cat, Hit= D1 dog, Score= 7.0 2.0 1.0 #The third query, hit, score
I tried the following suggested solution:
import csv

with open('g.txt', 'r') as f, open('result.csv', 'w') as csvfile:
    fieldnames = ['Query', 'Hit', 'Score']
    csvwriter = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL,
                               fieldnames=fieldnames)
    csvwriter.writeheader()
    data = {}
    for line in f:
        key, value = line.split('=')
        data[key.strip()] = value.strip()
        if len(data.keys()) == 3:
            csvwriter.writerow(data)
            data = {}
Question:
How can I make the program recognize the hits and scores associated with each query, so that I can print them on one line? If a query has more than one hit and score under it, the query should be printed again with its second hit and second score, exactly like the output below:
"A1 bird","B1 owl","1.0 4.0 2.5" #1st Query, its 1st Hit, its 1st Score
"A1 bird","B2 bluejay","10.0 6.0 7.0" #1st Query, its 2nd Hit, its 2nd Score
"A2 shark","C1 catshark","10.0 7.0 2.0" #2nd Query, 1st and only Hit, 1st and only Score
"A3 cat","D1 dog","7.0 2.0 1.0" #3rd Query, 1st and only Hit, 1st and only Score
Any ideas?
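Change the last line from
print line.rstrip("\n\r"), #print of the first score
to
print line.rstrip("\n\r") #print of the first score
(drop the trailing comma). In a Python 2 print statement the trailing comma suppresses the newline, so removing it makes each Score line end the output row.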
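For readers on Python 3, where print is a function, the trailing comma no longer has this effect; the end argument plays that role instead. A minimal sketch, assuming a non-empty list of fields (the helper name one_line is made up for illustration):

```python
import io

def one_line(fields, out):
    """Write all fields on a single line, separated by spaces."""
    for field in fields[:-1]:
        # Python 3 counterpart of `print field,` (stay on the same line)
        print(field, end=" ", file=out)
    # Counterpart of `print field` without the comma: ends the line
    print(fields[-1], file=out)

buf = io.StringIO()
one_line(["Query= A1 bird", "Hit= B1 owl", "Score= 1.0 4.0 2.5"], buf)
print(buf.getvalue(), end="")
```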
If you also want to repeat the previous query for additional hits, you need a few extra variables:
# Python 2 (print statement); `file` is an already opened input file
query = None
prev_query = None
for line in file:
    if line.startswith("Query="):
        query = line.rstrip("\n\r")
        print query,  # print the query line
    elif line.startswith("Hit="):
        if not query:
            print prev_query,  # repeat the query for an additional hit
        print line.rstrip("\n\r"),  # print the hit
    elif line.startswith("Score="):
        print line.rstrip("\n\r")  # print the score and end the row
        if query:
            prev_query = query  # keep the last real query, so a third hit does not print None
        query = None
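The same idea of remembering the current query can be written for Python 3 as a generator that yields one finished row per Hit/Score pair. This is only a sketch under the assumption that every Score line follows its Hit line; the name flatten_hits and the SAMPLE string are illustrative and not part of the original code.

```python
SAMPLE = """Query= A1 bird
Hit= B1 owl
Score= 1.0 4.0 2.5
Hit= B2 bluejay
Score= 10.0 6.0 7.0
Query= A2 shark
Hit= C1 catshark
Score= 10.0 7.0 2.0
"""

def flatten_hits(lines):
    """Yield 'Query..., Hit..., Score...' once for every Hit/Score pair."""
    query = hit = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Query="):
            query = line              # remembered until the next Query line
        elif line.startswith("Hit="):
            hit = line
        elif line.startswith("Score="):
            if query is not None and hit is not None:
                yield ", ".join([query, hit, line])
            hit = None                # a Score closes the current hit

rows = list(flatten_hits(SAMPLE.splitlines()))
for row in rows:
    print(row)
```

Because the query is only overwritten when a new Query line arrives, any number of hits under one query gets the query repeated.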
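I would use the DictWriter class from the csv package to write the parsed data to CSV. There is no error handling: the program assumes that all three required items (Query, Hit, Score) appear for every record, although they need not be given in the same order each time.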
import csv

with open('g.txt', 'r') as f, open('result.csv', 'w') as csvfile:
    fieldnames = ['Query', 'Hit', 'Score']
    csvwriter = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL,
                               fieldnames=fieldnames)
    csvwriter.writeheader()
    data = {}
    for line in f:
        key, value = line.split('=')
        data[key.strip()] = value.strip()
        if len(data.keys()) == 3:
            csvwriter.writerow(data)
            data = {}
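The DictWriter snippet above starts a fresh row after every three keys, so with the sample input a second Hit/Score pair ends up combined with the next Query line instead of the query it belongs to. One possible way around that, sketched here for Python 3, is to keep the last query and emit a row on every Score line. The function names records and convert are illustrative, and in-memory buffers stand in for g.txt and result.csv.

```python
import csv
import io

def records(lines):
    """Yield (query, hit, score) tuples, repeating the query for each hit."""
    query = hit = None
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue                      # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "Query":
            query = value                 # kept until the next Query line
        elif key == "Hit":
            hit = value
        elif key == "Score" and query is not None and hit is not None:
            yield query, hit, value
            hit = None

def convert(src, dst):
    """Read Query/Hit/Score lines from src and write quoted CSV to dst."""
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    writer.writerow(["Query", "Hit", "Score"])
    writer.writerows(records(src))

# Demo on in-memory text (assumed sample, taken from the question)
src = io.StringIO(
    "Query= A1 bird\nHit= B1 owl\nScore= 1.0 4.0 2.5\n"
    "Hit= B2 bluejay\nScore= 10.0 6.0 7.0\n"
    "Query= A2 shark\nHit= C1 catshark\nScore= 10.0 7.0 2.0\n"
)
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end="")
```

With real files, convert(f, csvfile) can be called inside the same with block; in Python 3 the csv documentation recommends opening the output file with newline='' to avoid extra blank lines on some platforms.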