Convert a single row to multiple columns based on categorization in Python
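I have a txt file like the one below. The data follows this template, and I want to convert it into 6 columns with the headers Id, Cause, Code, Event Time, Severity, and Severity Code in Python: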
Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4
I would like to know whether the data above can be converted into the following:
Id    Cause    Code  Event Time           Severity  Severity Code
0005  ERROR    307   2020-11-09 10:16:48  WARNING   5
0006  FAILURE  517   2020-11-09 10:19:47  MINOR     4
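Try this:
import re
import pandas as pd

# Each field is "key = value". The pattern assumes that fields sharing a line
# are separated by two or more whitespace characters, or end the line.
pattern = re.compile(r"(.+?)=(.+?)(?:\s{2,}|$)")
data = []
item = {}
with open("data.txt") as fp:
    for line in fp:
        for m in pattern.finditer(line):
            key, value = [m.group(i).strip() for i in [1, 2]]
            if key == "Id":
                # An "Id" field starts a new record; store the previous one.
                if item:
                    data.append(item)
                item = {"Id": value}
            else:
                item[key] = value
# Store the last record, which has no following "Id" to trigger the append.
if item:
    data.append(item)
df = pd.DataFrame(data)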
That is one way to convert the data; I hope it helps.
import re
import pandas as pd

x = """Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4"""

# Collapse all line breaks and repeated spaces into single spaces.
formatted_text = ' '.join(x.split())

ids = re.findall(r"Id = (\S+)", formatted_text)
cause = re.findall(r"Cause = (\S+)", formatted_text)
# The lookbehind keeps "Severity Code = ..." from also matching as "Code".
code = re.findall(r"(?<!Severity )Code = (\S+)", formatted_text)
# An event time is two tokens: the date and the time of day.
event_time = re.findall(r"Event Time = (\S+ \S+)", formatted_text)
severity = re.findall(r"Severity = (\S+)", formatted_text)
severity_code = re.findall(r"Severity Code = (\S+)", formatted_text)

info_dict = {
    "Id": ids,
    "Cause": cause,
    "Code": code,
    "Event Time": event_time,
    "Severity": severity,
    "Severity Code": severity_code,
}
df = pd.DataFrame.from_dict(info_dict)
print(df)
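Both answers above depend on details of the layout: the first on how many spaces separate the fields, the second on every record containing every field. A possible variation, sketched here under the assumption that the six field names from the question are the only keys in the file, is to match any known key and let its value run up to the next key. The names KEYS, FIELD and parse_records are made up for this illustration and do not come from either answer.

```python
import re

# Field names taken from the sample data. Longer names come first so that
# "Severity Code" is tried before "Severity" and "Code".
KEYS = ["Severity Code", "Event Time", "Severity", "Cause", "Code", "Id"]
KEY_RE = "|".join(re.escape(k) for k in KEYS)

# A value runs lazily until the next "<key> = " or the end of the text.
FIELD = re.compile(rf"\b({KEY_RE}) = (.*?)(?=\s+(?:{KEY_RE}) = |\s*$)", re.S)


def parse_records(text):
    """Return one dict per record; a new record starts at every "Id"."""
    records = []
    # Normalize line breaks and repeated spaces to single spaces first.
    for key, value in FIELD.findall(" ".join(text.split())):
        if key == "Id":
            records.append({})
        if records:  # ignore any field that appears before the first Id
            records[-1][key] = value
    return records


SAMPLE = """Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4"""

records = parse_records(SAMPLE)
for record in records:
    print(record)
```

Passing the resulting list to pd.DataFrame(records) should give the same six columns, and a record that lacks a field would show up with a missing value in that column instead of shifting the other rows.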