Convert a Single Row to Multiple Columns Based on Categorization in Python

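I have a txt file like the one below. The dataset follows this template, and I want to convert it into 6 columns in Python, with the headers Id, Cause, Code, Event Time, Severity, and Severity Code: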

  Id                = 0005      Cause          = ERROR      
  Code     = 307      Event Time              = 2020-11-09 10:16:48      
  Severity      = WARNING      
  Severity Code = 5      Id                = 0006      Cause          = FAILURE      
  Code     = 517      Event Time              = 2020-11-09 10:19:47      
  Severity      = MINOR      Severity Code = 4    

Is it possible to convert the dataset above into the following form?

Id          Cause       Code     Event Time             Severity        Severity Code
0005        ERROR       307     2020-11-09 10:16:48     WARNING         5
0006        FAILURE     517     2020-11-09 10:19:47     MINOR           4

Try this:

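import re

import pandas as pd

# A "key = value" pair ends at a run of 2+ whitespace characters or at end of
# line, so values containing a single space (the timestamp) stay intact.
pattern = re.compile(r"(.+?)=(.+?)(?:\s{2,}|$)")
data = []
item = {}

with open("data.txt") as fp:
    for line in fp:
        for m in pattern.finditer(line):
            key, value = (m.group(i).strip() for i in (1, 2))

            # Every record starts with "Id": store the previous one and start anew.
            if key == "Id":
                if item:
                    data.append(item)
                item = {"Id": value}
            else:
                item[key] = value

# Store the last record, which has no following "Id" to trigger the append.
if item:
    data.append(item)

df = pd.DataFrame(data)
print(df)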
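
To see why this kind of pattern works, it can help to run it on single lines from the question. The sketch below uses a variant of the answer's pattern that also accepts end of line as a terminator (an assumption added here so that a final field without trailing spaces is still captured); `pairs` is a helper name made up for this illustration. Both groups are lazy, so the key stops at the first `=` and the value stops at the first run of two or more whitespace characters.

```python
import re

# Key, "=", value; the value ends at 2+ whitespace characters or at end of line.
pattern = re.compile(r"(.+?)=(.+?)(?:\s{2,}|$)")


def pairs(line):
    """Return the (key, value) pairs found on one line, with padding stripped."""
    return [(m.group(1).strip(), m.group(2).strip()) for m in pattern.finditer(line)]


# The single space inside the timestamp does not end the value.
print(pairs("  Code     = 307      Event Time              = 2020-11-09 10:16:48      \n"))
# [('Code', '307'), ('Event Time', '2020-11-09 10:16:48')]

# The last field has no trailing spaces and is matched via the end-of-line branch.
print(pairs("  Severity      = MINOR      Severity Code = 4"))
# [('Severity', 'MINOR'), ('Severity Code', '4')]
```

Note that this relies on fields being separated by at least two spaces, as they are in the sample data.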

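That is one way to do the conversion; hope it helps! Another approach is to pull out each field with its own regular expression: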

import re
import pandas as pd

x =   """Id                = 0005      Cause          = ERROR      
  Code     = 307      Event Time              = 2020-11-09 10:16:48      
  Severity      = WARNING      
  Severity Code = 5      Id                = 0006      Cause          = FAILURE      
  Code     = 517      Event Time              = 2020-11-09 10:19:47      
  Severity      = MINOR      Severity Code = 4"""

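# Collapse all runs of whitespace (including newlines) into single spaces.
formatted_text = " ".join(x.split())

# One findall per column; this assumes every record contains every field.
ids = re.findall(r"Id = (\S+)", formatted_text)
cause = re.findall(r"Cause = (\S+)", formatted_text)
# The lookbehind keeps "Severity Code = 5" from being counted as a "Code".
code = re.findall(r"(?<!Severity )Code = (\S+)", formatted_text)
# The timestamp is two tokens (date and time), so capture both.
event_time = re.findall(r"Event Time = (\S+ \S+)", formatted_text)
severity = re.findall(r"Severity = (\S+)", formatted_text)
severity_code = re.findall(r"Severity Code = (\S+)", formatted_text)

info_dict = {
    "Id": ids,
    "Cause": cause,
    "Code": code,
    "Event Time": event_time,
    "Severity": severity,
    "Severity Code": severity_code,
}

df = pd.DataFrame.from_dict(info_dict)
print(df)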
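
One caveat with building each column from its own `findall` call: if any record is missing a field, the lists end up with different lengths and the DataFrame constructor raises a `ValueError`. A possible single-pass alternative keeps each value attached to its record. This is only a sketch, assuming the field names are known in advance and every record starts with `Id`; `FIELDS`, `FIELD_RE`, and `parse_records` are names invented for this illustration.

```python
import re

FIELDS = ["Id", "Cause", "Code", "Event Time", "Severity", "Severity Code"]

# Longest names first, so "Severity Code" is tried before "Severity" and "Code".
_names = "|".join(re.escape(f) for f in sorted(FIELDS, key=len, reverse=True))
# A field name, "=", then the value up to the next "name =" or the end of the text.
FIELD_RE = re.compile(rf"\b({_names})\s*=\s*(.*?)(?=\s+(?:{_names})\s*=|$)")


def parse_records(text):
    """Split whitespace-padded "key = value" text into one dict per record."""
    flat = " ".join(text.split())  # collapse newlines and padding
    records, current = [], {}
    for m in FIELD_RE.finditer(flat):
        key, value = m.group(1), m.group(2)
        # An "Id" field marks the start of the next record.
        if key == "Id" and current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    # Fill absent fields with None so every row has the same columns.
    return [{f: r.get(f) for f in FIELDS} for r in records]


sample = """Id                = 0005      Cause          = ERROR
  Code     = 307      Event Time              = 2020-11-09 10:16:48
  Severity      = WARNING
  Severity Code = 5      Id                = 0006      Cause          = FAILURE
  Code     = 517      Event Time              = 2020-11-09 10:19:47
  Severity      = MINOR      Severity Code = 4"""

rows = parse_records(sample)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)`; a record that lacks a field then shows up as a missing value in that row instead of shifting the values of later records.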