How to remove lines with a pattern if the next line matches the same pattern?
I have a DataFrame in which one column holds the log of a ticket in each row. Here is a sample log:
99645,
\Submitted',
'\Modifications made 2015/01/01',
'x_change0: --> info0',
'y_status1: --> info1',
'z_change2: --> info2',
'y_change3: --> info3',
'\Modifications made 2015/01/03',
'\Modifications made 2015/01/05',
'\Modifications made 2015/01/07',
'w_change0: --> info0',
'a_status1: --> info1',
'\Modifications made 2015/01/07',
.
.
.
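I want to delete every "Modifications made" line that is not followed by any changes. The following regular expression matches what I am looking for (RegEx101):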
pattern = '(?sm)Modifications\s*((?!Modifications\s*).)*'
re.findall(pattern, dataframe['log'])
Expected result for each cell in dataframe['log']:
Modifications made 2015/01/01',
'change0: --> info0',
'change1: --> info1',
'change2: --> info2',
'change3: --> info3',
'Modifications made 2015/01/07',
'change0: --> info0',
'change1: --> info1',
'
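How can I remove the unwanted lines within a cell? Or how can I replace the string inside the cell with the filtered string?
Use the pd.Series.shift and str.startswith functions for the complex filtering.
The initial DataFrame: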
In [87]: df
Out[87]:
log
0 '\Modifications made 2015/01/01',
1 'change0: --> info0',
2 'change1: --> info1',
3 'change2: --> info2',
4 'change3: --> info3',
5 '\Modifications made 2015/01/03',
6 '\Modifications made 2015/01/05',
7 '\Modifications made 2015/01/07',
8 'change0: --> info0',
9 'change1: --> info1',
10 '\Modifications made 2015/01/07',
Drop rows based on the condition (add the inplace=True parameter to modify the DataFrame in place):
In [88]: df.drop(df[(df.log.str.startswith("'\Modifications"))
    ...:            & ((df.log.shift(-1).str.startswith("'\Modifications"))
    ...:               | (~df.log.shift(-1).str.startswith("'change", na=False))
    ...:               | df.log.shift(-1).isna())].index)
Out[88]:
log
0 '\Modifications made 2015/01/01',
1 'change0: --> info0',
2 'change1: --> info1',
3 'change2: --> info2',
4 'change3: --> info3',
7 '\Modifications made 2015/01/07',
8 'change0: --> info0',
9 'change1: --> info1',
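The same idea can be written as a self-contained script. This is only a minimal sketch: the sample rows are simplified (no quotes or backslashes), and it assumes that every row which is not a "Modifications" header counts as a change, which may not hold for the real logs. The names `is_header`, `next_is_change` and `filtered` are illustrative.

```python
import pandas as pd

# Simplified sample rows modeled on the log above (illustrative values).
df = pd.DataFrame({"log": [
    "Modifications made 2015/01/01",
    "change0: --> info0",
    "change1: --> info1",
    "Modifications made 2015/01/03",
    "Modifications made 2015/01/05",
    "Modifications made 2015/01/07",
    "change0: --> info0",
    "Modifications made 2015/01/07",
]})

# True for header rows, False for everything else.
is_header = df["log"].str.startswith("Modifications")

# shift(-1) lines each row up with the row below it. The last row has no
# successor, so fill_value=False treats it as "no change follows".
next_is_change = (~is_header).shift(-1, fill_value=False)

# Keep all change rows, plus header rows directly followed by a change row.
filtered = df[~is_header | next_is_change]
print(filtered)
```

Building a boolean mask and indexing with it avoids the nested `df.drop(df[...].index)` call and returns a new DataFrame without modifying `df`.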
Solved using @Code Maniac's RegEx solution:
(?sm)Modifications[^,]+,(?:(?!^\s*'\Modifications).)*\b
Replacing the cell strings with the following loop:
import re

# \\M matches a literal backslash before "Modifications"; Python's re module
# rejects a bare \M as a bad escape.
pattern = re.compile(r"(?sm)Modifications[^,]+,(?:(?!^\s*'\\Modifications).)*\b")
df['tickethist'] = ""
for i in range(len(df['log'])):
    log = df.at[i, 'log']
    # Store the list of regex matches found in this cell.
    df.at[i, 'tickethist'] = pattern.findall(str(log))
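If the goal is to replace each cell with one filtered string instead of a list of matches, the loop can be swapped for `Series.apply`. The following is a rough sketch under assumptions: each cell is taken to hold a newline-separated log with no quotes or backslashes, and the pattern is a simplified line-based one, not the regex from the solution above. The sample `log` string is hypothetical.

```python
import re
import pandas as pd

# Hypothetical single-cell log; real cells may be formatted differently.
log = "\n".join([
    "Submitted,",
    "Modifications made 2015/01/01,",
    "change0: --> info0,",
    "change1: --> info1,",
    "Modifications made 2015/01/03,",
    "Modifications made 2015/01/05,",
    "change0: --> info0,",
    "Modifications made 2015/01/07,",
])
df = pd.DataFrame({"log": [log]})

# A header line followed by at least one line that is not a header.
# re.M makes ^ match at the start of every line.
pattern = re.compile(
    r"^Modifications[^\n]*\n(?:(?!Modifications)[^\n]+\n?)+",
    re.M,
)

# Join the matched blocks back into one string per cell.
df["tickethist"] = df["log"].astype(str).apply(
    lambda text: "".join(pattern.findall(text))
)
print(df.at[0, "tickethist"])
```

Because the pattern has no capturing groups, `findall` returns the full text of each block, so joining the results gives the filtered log directly.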