Need to search a file for specific terms and output the second term into a csv with repetition - Python
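I am trying to read a text file that uses ":" as a delimiter, find specific search terms in the first column, and output the second column to a .csv file.
The file I am pulling from has multiple sections like the following (showing 2 of many):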
Object : Info1
Type : Info2
LastChange : INFO3
DeviceId : INFO4
EndObject
Object : Info5
Type : Info6
LastChange : INFO7
DeviceId : INFO8
EndObject
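This repeats with the same first column (Object, Type, etc.) but a different Info# each time.
I want to search for the 'Info#' values and pull them into a csv file that reads:
Info1, Info2, Info3, Info4, by searching on the first column (Object, Type, LastChange, DeviceId).
So far I have gotten it to output Object and Type, but my for loop only does one iteration. My code so far: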
import csv
import string
import pandas as pd

filename1 = 'test.txt' #EDIT THIS TO MATCH EXACTLY THE .DMP FILE YOU WISH TO READ!!
infile = open(filename1, 'r', errors = 'ignore') #this names the read file variable, !!DO NOT TOUCH!!
lines = infile.readlines()
filename2 = 'test.csv'
outfile = open(filename2,'w')
headerList ="Type:Device:Name:Change\n".split(':')
headerString = ','.join(headerList)
outfile.write(headerString)
for line in lines[1:]:
    sline = line.split(":")
    if 'Type' in sline[0]:
        dataList = sline[1:]
        dataString = ','.join(dataList)
        typestring1 = ','.join([x.strip() for x in dataString.split(",")])
    if ' Object' in sline[0]:
        objectList = sline[1:]
        objectstring = ','.join(objectList)
        namestring1 = ','.join([x.strip()for x in objectstring.split(",")])
writeString = (typestring1 + "," + namestring1+ ","+ "\n")
outfile.write(writeString)
outfile.close()
infile.close()
I am new to Python and would greatly appreciate any help.
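I looked for an existing parser but could not find one for the input format you showed. I would spend some time making sure there isn't already a "Python parser for microcontroller memory DMP files". You know the context better than I do, so perhaps your search will be more fruitful.
In the meantime, based on your example, input.txt: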
Object : Info1
Type : Info2
LastChange : INFO3
DeviceId : INFO4
EndObject
Object : Info5
Type : Info6
LastChange : INFO7
DeviceId : INFO8
EndObject
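Here is an end-to-end solution that reads that sample and converts each "block" of object data into a CSV row.
The point I want to stress is to break this kind of problem down into as many discrete steps as possible, like so:
- Filter the DMP file so that the lines to be parsed into values have at least one colon (:) (or, more specifically, keep only the Type : lines)
- Parse the filtered lines and prove that you have found all the blocks
- Convert the lines in each block into a row (which can be passed to the csv module's writer class)
- Write the rows as CSV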
import csv
import pprint

filtered_lines = []
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('Object') or line == 'EndObject':
            filtered_lines.append(line)
            continue
        # Keep only Type
        if line.startswith('Type :'):
            filtered_lines.append(line)
            continue
        # or, keep any line with a colon
        # if ':' in line:
        #     filtered_lines.append(line)
        #     continue
        # at this point, no predicate has been satisfied, drop line
        pass  # redundant, but poignant and satisfying :)

all_blocks = []
this_block = None
in_block = False
for line in filtered_lines:
    # Find the start of a "block" of data
    if line.startswith('Object'):
        in_block = True
        this_block = []
    # Find the end of block...
    if line == 'EndObject':
        # save it
        all_blocks.append(this_block)
        # reset for next block
        this_block = None
        in_block = False
    if in_block:
        this_block.append(line)
print('Blocks:')
pprint.pprint(all_blocks)

# Convert a list of blocks to a list of rows
all_rows = []
for block in all_blocks:
    row = []
    # Convert a list of lines (key : value) to a "row", a list of single-value strings
    for line in block:
        # Split on the first colon only, so a value that contains ':' stays intact
        _, value = line.split(':', 1)
        row.append(value.strip())
    all_rows.append(row)
print('Rows:')
pprint.pprint(all_rows)

# Finally, save as CSV
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(all_rows)
When I run that against the sample input, I get:
Blocks:
[['Object : Info1', 'Type : Info2'], ['Object : Info5', 'Type : Info6']]
Rows:
[['Info1', 'Info2'], ['Info5', 'Info6']]
And finally, output.csv:
Info1,Info2
Info5,Info6
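That output only carries Object and Type. If you want all four values per block plus a header row, as the question describes, here is a minimal sketch of one way to extend the same idea: collect each block into a dict and hand the dicts to csv.DictWriter. The names parse_blocks, write_csv and FIELDS are my own (hypothetical, not from the code above), the column order is an assumption taken from the sample input, and the sample text is inlined so the snippet runs on its own.

```python
import csv
import io

# Assumed column order, taken from the keys in the sample input.
FIELDS = ['Object', 'Type', 'LastChange', 'DeviceId']

SAMPLE = """\
Object : Info1
Type : Info2
LastChange : INFO3
DeviceId : INFO4
EndObject
Object : Info5
Type : Info6
LastChange : INFO7
DeviceId : INFO8
EndObject
"""


def parse_blocks(lines):
    """Return one dict per Object ... EndObject block (hypothetical helper)."""
    blocks = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line == 'EndObject':
            if current is not None:
                blocks.append(current)
            current = None
            continue
        if ':' not in line:
            continue  # not a "key : value" line
        # Split on the first colon only, so values may contain colons.
        key, value = (part.strip() for part in line.split(':', 1))
        if key == 'Object':
            current = {}  # a new block starts here
        if current is not None and key in FIELDS:
            current[key] = value
    return blocks


def write_csv(blocks, f):
    """Write the blocks to an open text file object, with a header row."""
    writer = csv.DictWriter(f, fieldnames=FIELDS, restval='',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(blocks)


blocks = parse_blocks(SAMPLE.splitlines())
buffer = io.StringIO()
write_csv(blocks, buffer)
print(buffer.getvalue())
```

For the sample this prints a header line followed by Info1,Info2,INFO3,INFO4 and Info5,Info6,INFO7,INFO8. To write a real file instead of the in-memory buffer, open it with open('output.csv', 'w', newline='') and pass that file object to write_csv. A block that is missing one of the keys gets an empty cell for it (restval=''), which may or may not be what you want for a real DMP file.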