Structure Android LogCat Text File to Structured Pandas DF
I want to convert a text file of LogCat lines into a structured Pandas DF. I can't seem to conceptualize how to go about this... here is my basic pseudocode:
dateTime = []
processID = []
threadID = []
priority = []
application = []
tag = []
text = []
logFile = "xxxxxx.log"
for line in logFile:
    split the string according to the basic structure
    dateTime = [0]
    processID = [1]
    threadID = [2]
    priority = [3]
    application = [4]
    tag = [5]
    text = [6]
    append each to the empty lists above
write the lists to a pandas DataFrame & add column names
The problem: I don't know how to properly define the delimiters for a line with this structure:
08-01 14:28:35.947 1320 1320 D wpa_xxxx: wlan1: skip--ssid
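import re
import pandas as pd

# One capture group per column; fields are assumed to be separated by single spaces.
ROW_PATTERN = re.compile(r"""(\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}\.\d+) (\d+) (\d+) ([A-Z]) (\S+) (\S+) (\S+)""")

# logFile is the path defined in the question, e.g. "xxxxxx.log".
with open(logFile) as f:
    s = pd.Series(f.readlines())

# String methods on a Series live under the .str accessor.
df = s.str.extract(ROW_PATTERN)
df.columns = ['dateTime', 'processID', 'threadID', 'priority', 'application', 'tag', 'text']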
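This reads each line of logFile into one row of a Series, which can then be expanded into a DataFrame with one column per group in the regular expression. It assumes that 08-01 14:28:35.947 is the first value on each line and that the subsequent values are separated by whitespace.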
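Real log files often pad the numeric fields with extra spaces, and the message usually contains spaces of its own. The following is only a sketch of how the same idea could be made more tolerant, under assumptions that are not in the original post: fields may be separated by any run of whitespace, the last column takes the rest of the line, and lines that do not match the layout are dropped. The names LOGCAT_PATTERN and parse_logcat_lines are hypothetical, chosen for illustration.

```python
import re
import pandas as pd

# Assumed layout: MM-DD HH:MM:SS.mmm PID TID PRIORITY APPLICATION TAG TEXT...
# Named groups become the DataFrame column names.
LOGCAT_PATTERN = (
    r"^(?P<dateTime>\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<processID>\d+)\s+"
    r"(?P<threadID>\d+)\s+"
    r"(?P<priority>[A-Z])\s+"
    r"(?P<application>\S+)\s+"
    r"(?P<tag>\S+)\s+"
    r"(?P<text>.*)$"  # assumption: everything left on the line is the message
)


def parse_logcat_lines(lines):
    """Turn an iterable of LogCat lines into a DataFrame (hypothetical helper)."""
    s = pd.Series(list(lines), dtype="object").str.rstrip("\r\n")
    df = s.str.extract(LOGCAT_PATTERN)
    # Lines that do not match yield NaN in every column; drop them.
    df = df.dropna(subset=["dateTime"]).reset_index(drop=True)
    df["processID"] = df["processID"].astype(int)
    df["threadID"] = df["threadID"].astype(int)
    return df


sample = [
    "08-01 14:28:35.947 1320 1320 D wpa_xxxx: wlan1: skip--ssid\n",
    "this line does not follow the layout\n",
    "08-01 14:28:36.001  1320  1321 I wpa_xxxx: wlan1: state changed to ready\n",
]
df = parse_logcat_lines(sample)
print(df)
```

To parse a file instead of the in-memory sample, the open file object can be passed directly, for example parse_logcat_lines(f) inside the with block. Whether dropping non-matching lines is acceptable depends on the log; multi-line messages would need different handling.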