Reading specific area of string from file
I am trying to figure out how to do something in Python.
I have a text file that contains strings, for example:
M, 1, 14/08/2019 11:39, 4, xxxx, name, “Initialization of the system, and loading
M, 1, 14/08/2019 11:40, 100, xxxx, name, “Open Connection”
M, 1, 14/08/2019 11:40, 100, xxxx, name, “Close Connection, and reboot”
S, 1, 14/08/2019 11:40, 6, xxxx, name, We created the user in the systems
S, 1, 14/08/2019 11:41, 3, xxxx, User logged in, User tal logged in
M, 1, 14/08/2019 11:39, 4, xxxx, name, “Initialization of the system”
S, 1, 14/08/2019 11:40, 6, New User, We created the user in the systems
S, 1, 14/08/2019 11:41, 3, User logged in, User tal logged in
S, 1, 14/08/2019 11:42, 3, User logged in, User tal logged in
M, 2, 14/08/2019 11:43, 100, yyy, yura, 12345, Message
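What I want to do is go through the file and, the first time a line starts with M, 1, print some text; the same applies to the first S, 1, M, 2 or S, 2 line.
I also have to print only selected lines from the file (I have not implemented that yet, but I will use a line counter).
I also need to print only selected columns. By columns I mean the fields separated by the "," delimiter. For example, if I want columns 3 and 4 of lines 1 and 2, I should print only 14/08/2019 11:39,4 and 14/08/2019 11:40,100.
I have figured out how to split the string with re.split, but I don't know how to continue.
Thanks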
import re
import string

filename = '11.txt'

def infile(filename):
    m1 = m2 = s1 = s2 = 0
    linecounter = 1
    lines = [1,2,3]
    colums = [2,4]
    i=0
    fin = open(filename, 'r')
    if fin.closed:
        print ('file is closed')
    lines = fin.readlines()
    for line in lines:
        if(line[0] == 'M' and line[3] == '1' and m1 == 0):
            print('---M, 1, Datetime, Error Level, DeviceId, UserId, Message---\n')
            m1 = 1
        elif (line[0] == 'M' and line[3] == '2' and m2 == 0):
            print('---M, 2, Datetime, Error Level, DeviceId, UserId, MobileId, Message---\n')
            m2 = 1
        elif (line[0] == 'S' and line[3] == '1' and s1 == 0):
            print('---S, 1, Datetime, Error Level, DeviceId, Action, Message---\n')
            s1 = 1
        elif (line[0] == 'S' and line[3] == '2' and s2 == 0):
            print('---S, 2, Datetime, Error Level, DeviceId, IP, Action, Message---\n')
            s2 = 1
        for p in re.split(",",line): # that's a check of the splitting, nothing else
            print("piece="+p)
        print(line)

infile(filename)
re.split(",", line) returns a list, so you can access the values you want by index, for example:
split_str = re.split(",", line)
split_str[2]  # the date and time
split_str[3]  # the number in the column after the date
To speed things up, you can also break out of the loop once m1, m2, s1 and s2 are all 1.
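As a rough sketch of that early-exit idea, assuming only the first occurrence of each record type matters: the function name `first_occurrences`, its `wanted` parameter, and the sample lines are made up for this illustration and are not part of the original code.

```python
def first_occurrences(lines, wanted=("M, 1", "M, 2", "S, 1", "S, 2")):
    """Return the first line seen for each wanted record type."""
    found = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = f"{fields[0]}, {fields[1]}"
        if key in wanted and key not in found:
            found[key] = line.rstrip("\n")
        if len(found) == len(wanted):
            break  # every wanted type has been seen, stop reading
    return found


# Hypothetical sample data in the same layout as the question's file
sample = [
    "M, 1, 14/08/2019 11:39, 4, xxxx, name, Initialization of the system\n",
    "S, 1, 14/08/2019 11:40, 6, xxxx, name, We created the user in the systems\n",
    "M, 1, 14/08/2019 11:40, 100, xxxx, name, Open Connection\n",
    "M, 2, 14/08/2019 11:43, 100, yyy, yura, 12345, Message\n",
]

result = first_occurrences(sample)
for key, line in result.items():
    print(key, "->", line)

# With a smaller set of wanted types the loop stops after the second line
partial = first_occurrences(sample, wanted=("M, 1", "S, 1"))
```

Note that breaking out early only makes sense if nothing else needs to be printed for the remaining lines.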
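Below I wrote a function, select_columns, that takes a list of ints (the column numbers), splits the row on the "," separator, and returns the selected values as one string.
Hope this helps.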
import re

filename = '11.txt'
column_list = [3, 4]  # 1-based column numbers, not 0-based indices


def select_columns(row, column_list):
    # Split the row on commas and strip the whitespace around each field
    column_split = [field.strip() for field in row.split(',')]
    # Join the requested columns (1-based) with commas
    return ','.join(column_split[column - 1] for column in column_list)


def infile(filename, column_list):
    m1 = m2 = s1 = s2 = 0
    with open(filename, 'r') as fin:
        lines = fin.readlines()
    for line in lines:
        # Print the header and the selected columns the first time each type appears
        if line[0] == 'M' and line[3] == '1' and m1 == 0:
            print('---M, 1, Datetime, Error Level, DeviceId, UserId, Message---\n')
            print(select_columns(row=line, column_list=column_list))
            m1 = 1
        elif line[0] == 'M' and line[3] == '2' and m2 == 0:
            print('---M, 2, Datetime, Error Level, DeviceId, UserId, MobileId, Message---\n')
            print(select_columns(row=line, column_list=column_list))
            m2 = 1
        elif line[0] == 'S' and line[3] == '1' and s1 == 0:
            print('---S, 1, Datetime, Error Level, DeviceId, Action, Message---\n')
            print(select_columns(row=line, column_list=column_list))
            s1 = 1
        elif line[0] == 'S' and line[3] == '2' and s2 == 0:
            print('---S, 2, Datetime, Error Level, DeviceId, IP, Action, Message---\n')
            print(select_columns(row=line, column_list=column_list))
            s2 = 1
        for p in re.split(",", line):  # just a check of the splitting, nothing else
            print("piece=" + p)
        print(line)


infile(filename, column_list)
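The question also asks for printing only selected lines, which the code above does not cover. One possible way to combine both selections is sketched below, assuming line and column numbers are 1-based as in the question; `pick_columns`, `select_rows`, and the sample lines are hypothetical names and data introduced only for this example.

```python
def pick_columns(row, column_list):
    """Return the requested 1-based columns of a comma-separated row."""
    fields = [field.strip() for field in row.split(",")]
    return ",".join(fields[column - 1] for column in column_list)


def select_rows(lines, line_numbers, column_list):
    """Return the chosen columns for the chosen 1-based line numbers only."""
    selected = []
    for number, line in enumerate(lines, start=1):
        if number in line_numbers:
            selected.append(pick_columns(line, column_list))
    return selected


# Hypothetical sample data in the same layout as the question's file
sample = [
    "M, 1, 14/08/2019 11:39, 4, xxxx, name, Initialization of the system, and loading\n",
    "M, 1, 14/08/2019 11:40, 100, xxxx, name, Open Connection\n",
    "M, 1, 14/08/2019 11:40, 100, xxxx, name, Close Connection, and reboot\n",
]

rows = select_rows(sample, line_numbers={1, 2}, column_list=[3, 4])
for row in rows:
    print(row)
```

Here `enumerate(lines, start=1)` plays the role of the line counter mentioned in the question, so no separate counter variable is needed.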
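A simpler approach is to load the file into a DataFrame and then filter the rows by column value.
--> Load as a DataFrame (for the file in the question the separator would be "," rather than " "):
import pandas as pd
data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["a", "b", "c", "etc."]
See: Load data from txt with pandas
Filter rows by column value:
pandas: filter rows of DataFrame with operator chaining
https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/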
You can replace the for loop with the following code, which splits the line and prints the fields at indices 2 and 3:
splittedLine = line.split(", ")
print(splittedLine[2], splittedLine[3])
This will print:
14/08/2019 11:39 4
and so on.
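You can use a dictionary to keep track of the first occurrence of each line prefix, and then use that dictionary to print the information accordingly.
In addition, maintaining a mapping from each type ("M, 1", "M, 2", and so on) to its header makes printing the final result much easier.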
from pprint import pprint
input_string = """M, 1, 14/08/2019 11:39, 4, xxxx, name, “Initialization of the system, and loading
M, 1, 14/08/2019 11:40, 100, xxxx, name, “Open Connection”
M, 1, 14/08/2019 11:40, 100, xxxx, name, “Close Connection, and reboot”
S, 1, 14/08/2019 11:40, 6, xxxx, name, We created the user in the systems
S, 1, 14/08/2019 11:41, 3, xxxx, User logged in, User tal logged in
M, 1, 14/08/2019 11:39, 4, xxxx, name, “Initialization of the system”
S, 1, 14/08/2019 11:40, 6, New User, We created the user in the systems
S, 1, 14/08/2019 11:41, 3, User logged in, User tal logged in
S, 1, 14/08/2019 11:42, 3, User logged in, User tal logged in
M, 2, 14/08/2019 11:43, 100, yyy, yura, 12345, Message"""
# Maintain mapping between the type of line, and the header corresponding to it
header_mapping = {"M, 1": ["Datetime", "Error Level", "DeviceId", "UserId", "Message"],
                  "M, 2": ["Datetime", "Error Level", "DeviceId", "UserId", "MobileId", "Message"],
                  "S, 1": ["Datetime", "Error Level", "DeviceId", "Action", "Message"],
                  "S, 2": ["Datetime", "Error Level", "DeviceId", "IP", "Action", "Message"]
                  }
mapping = dict()
# Split the string into lines
lines = input_string.splitlines()
for line in lines:
    split_line = line.split(", ")  # Split each line using ", "
    key = split_line[0] + ", " + split_line[1]  # First two elements of the split list form your key
    # Only store a key that is not in the mapping yet, so the dictionary keeps the first occurrence of each type.
    if key not in mapping:
        header = header_mapping[key]
        line_info = dict(zip(header, split_line[2:]))  # Create dictionary with header-value mapping
        mapping[key] = line_info  # Enter dictionary entry with type-values mapping
pprint(mapping)
"""
{'M, 1': {'Datetime': '14/08/2019 11:39',
          'DeviceId': 'xxxx',
          'Error Level': '4',
          'Message': '“Initialization of the system',
          'UserId': 'name'},
 'M, 2': {'Datetime': '14/08/2019 11:43',
          'DeviceId': 'yyy',
          'Error Level': '100',
          'Message': 'Message',
          'MobileId': '12345',
          'UserId': 'yura'},
 'S, 1': {'Action': 'name',
          'Datetime': '14/08/2019 11:40',
          'DeviceId': 'xxxx',
          'Error Level': '6',
          'Message': 'We created the user in the systems'}}
"""
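To connect the header mapping back to the original goal (print a header the first time a type appears, then only the chosen columns), something along these lines might work. It is only a sketch: the names `HEADERS`, `render`, and `wanted_columns` and the sample lines are assumptions made for this example, and columns are selected by header name rather than by position.

```python
# Header names per record type, as given in the question's print statements
HEADERS = {
    "M, 1": ["Datetime", "Error Level", "DeviceId", "UserId", "Message"],
    "M, 2": ["Datetime", "Error Level", "DeviceId", "UserId", "MobileId", "Message"],
    "S, 1": ["Datetime", "Error Level", "DeviceId", "Action", "Message"],
    "S, 2": ["Datetime", "Error Level", "DeviceId", "IP", "Action", "Message"],
}


def render(lines, wanted_columns):
    """Build output lines: a header on the first occurrence of each type,
    followed by the wanted columns of every recognised line."""
    seen = set()
    output = []
    for line in lines:
        fields = line.strip().split(", ")
        key = ", ".join(fields[:2])
        header = HEADERS.get(key)
        if header is None:
            continue  # unknown record type, skip it
        if key not in seen:
            seen.add(key)
            output.append("---" + ", ".join([key] + header) + "---")
        record = dict(zip(header, fields[2:]))
        # Missing fields (short lines) are rendered as empty strings
        output.append(",".join(record.get(name, "") for name in wanted_columns))
    return output


# Hypothetical sample data in the same layout as the question's file
sample = [
    "M, 1, 14/08/2019 11:39, 4, xxxx, name, Initialization of the system",
    "M, 1, 14/08/2019 11:40, 100, xxxx, name, Open Connection",
    "S, 1, 14/08/2019 11:40, 6, xxxx, name, We created the user in the systems",
]

report = render(sample, wanted_columns=["Datetime", "Error Level"])
print("\n".join(report))
```

As with the code above, a message that itself contains ", " is cut at the first comma, so this approach is only safe for the columns that come before the message.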