Need to search a file for specific terms and output the second term into a csv with repetition - Python

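I'm trying to read a text file that uses ":" as a delimiter, look for specific search terms in the first column, and write the second column out to a .csv file.

The file I'm pulling from has multiple sections like the following (showing 2 of many):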

 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject

Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject

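This repeats with the same first column (Object, Type, etc.) but different Info# values.

I want to find the 'Info#' values and pull them into a csv file that reads: Info1, Info2, Info3, Info4, by searching on the first column (Object, Type, LastChange, DeviceId).

So far I've gotten it to output Object and Type, but my for loop only does one iteration. My code so far: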

import csv
import string
import pandas as pd


        

filename1 = 'test.txt'                      #EDIT THIS TO MATCH EXACTLY THE .DMP FILE YOU WISH TO READ!!

infile = open(filename1, 'r', errors = 'ignore')                    #this names the read file variable, !!DO NOT TOUCH!!             
lines = infile.readlines()

        
filename2 = 'test.csv'                    
outfile = open(filename2,'w')
headerList ="Type:Device:Name:Change\n".split(':')     
headerString = ','.join(headerList)
outfile.write(headerString)
for line in lines[1:]:
       sline = line.split(":")                    

       if  'Type' in sline[0]:
        dataList = sline[1:]                                  
        dataString = ','.join(dataList) 
        typestring1 = ','.join([x.strip() for x in dataString.split(",")])   

       if ' Object' in sline[0]:
        objectList = sline[1:]
        objectstring = ','.join(objectList)
        namestring1 = ','.join([x.strip()for x in objectstring.split(",")])
                   
writeString = (typestring1 + "," + namestring1+ ","+ "\n")
outfile.write(writeString)



outfile.close()
infile.close()

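I'm new to Python, and any help is much appreciated.

I looked for a format parser but couldn't find one for the input format you showed. I'd spend some time making sure there isn't already a "Python parser for microcontroller memory DMP files". You know the context better than I do, so maybe your search will be more fruitful.

In the meantime, based on your sample, input.txt: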


 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject




Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject

Here's an end-to-end solution that reads that sample and converts each "block" of object data into a CSV row.

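The point I want to stress is breaking these kinds of problems down into as many discrete steps as possible, like this:

  1. Filter the DMP file so that only the lines you want to parse into values remain: lines with at least one colon (:) (or, more specifically, only the "Type :" lines)
  2. Parse the filtered lines and prove you've found all the blocks
  3. Convert the lines in each block into a row (which can be passed to the csv module's writer class)
  4. Write the rows as CSV
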
import csv
import pprint

filtered_lines = []
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('Object') or line == 'EndObject':
            filtered_lines.append(line)
            continue
    
        # Keep only Type
        if line.startswith('Type :'):
            filtered_lines.append(line)
            continue
    
        # or, keep any line with a colon
        # if ':' in line:
        #     filtered_lines.append(line)
        #     continue

        # at this point, no predicate has been satisfied, drop line
        pass  # redundant, but poignant and satisfying :)


all_blocks = []
this_block = None
in_block = False
for line in filtered_lines:
    # Find the start of a "block" of data
    if line.startswith('Object'):
        in_block = True
        this_block = []

    # Find the end of block... 
    if line == 'EndObject':
        # save it
        all_blocks.append(this_block)

        # reset for next block
        this_block = None
        in_block = False

    if in_block:
        this_block.append(line)

print('Blocks:')
pprint.pprint(all_blocks)

# Convert a list of blocks to a list of rows
all_rows = []
for block in all_blocks:
    row = []

    # Convert a list of lines (key : value) to a "row", a list of single-value strings
    for line in block:
        # Split on the first colon only, so a value that itself contains ':' stays intact
        _, value = line.split(':', 1)
        row.append(value.strip())
    
    all_rows.append(row)

print('Rows:')
pprint.pprint(all_rows)

# Finally, save as CSV
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(all_rows)

When I run that against the input, I get:

Blocks:
[['Object : Info1', 'Type : Info2'], ['Object : Info5', 'Type : Info6']]
Rows:
[['Info1', 'Info2'], ['Info5', 'Info6']]

Finally, output.csv:

Info1,Info2
Info5,Info6
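
If you want all four fields per block rather than just Object and Type, the same filter/parse/convert/write steps can be folded into one pass that builds a dict per block. This is only a rough sketch under a few assumptions: the FIELDS list and its column order are my guess at what you want in the CSV, parse_blocks and blocks_to_csv are names I made up for illustration, and the sample is inlined as a string so the snippet runs on its own (in practice you would pass the open file object instead).

```python
import csv
import io

# Assumed column order; adjust to whatever the CSV should contain.
FIELDS = ['Object', 'Type', 'LastChange', 'DeviceId']


def parse_blocks(lines):
    """Yield one dict per Object ... EndObject block."""
    block = None
    for raw in lines:
        line = raw.strip()
        if line == 'EndObject':
            if block is not None:
                yield block
            block = None
            continue
        if ':' not in line:
            continue  # blank lines and anything that is not "key : value"
        # Split on the first colon only, so values may contain ':' themselves
        key, value = (part.strip() for part in line.split(':', 1))
        if key == 'Object':
            block = {}  # a new block starts here
        if block is not None and key in FIELDS:
            block[key] = value


def blocks_to_csv(lines):
    """Return the CSV text (header plus one row per block)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval='',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(parse_blocks(lines))
    return out.getvalue()


sample = """\
 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject

Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject
"""

print(blocks_to_csv(sample.splitlines()))
```

Using csv.DictWriter with restval='' means a block that is missing one of the fields still produces a row, with an empty cell in that column, instead of shifting the remaining values to the left.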