Python regex - updating file by adding character for all the matched pattern

Question

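I am trying to create a script that will:

Read all the text files in my folder
Find the words matching the pattern [r'\d\d\d\d'+"H"] (e.g. 1234H)
Replace them with the time format (e.g. 12:34:00)
Save the files

This is my code so far, and I don't know where it goes wrong. Any advice would be appreciated, thanks!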

import os
import re

path = r'C:\Users\CL\Desktop\regex'

for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'): #find all .txt files
            path = os.path.join(root, file)
            f = open(path,'a')
            pattern = r'\d\d\d\d'+"H" #pattern
            replacewords = re.findall(pattern, f) #find all words with this pattern
            
            ...... #replace matched words with eg. 12:23:00
            
            f.write() #save file
            f.close()

sample text content:

1111H, 1234H, 1115H

Answer 1

You can use

import os, re

path = r'C:\Users\CL\Desktop\regex'

for root, dirs, files in os.walk(path):
    for file in files:
        if file.lower().endswith('.txt'): #find all .txt / .TXT files
            path = os.path.join(root, file)
            pattern = r'(\d{2})(\d{2})H' # pattern
            with open(path, 'r+') as f:  # Read and update
                contents = re.sub(pattern, r'\1:\2:00', f.read())
                f.seek(0)
                f.truncate()
                f.write(contents)

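Notes:

if file.lower().endswith('.txt') makes the text-file check case-insensitive
The (\d{2})(\d{2})H pattern matches four digits followed by H, capturing the first two digits in Group 1 and the next two digits in Group 2
In the replacement, \1 refers to the Group 1 value and \2 refers to the Group 2 value
The file is opened in r+ mode so that it can be both read and updated
f.seek(0) and f.truncate() allow the file to be rewritten with the updated contents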
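
To check the substitution and the r+ / seek / truncate sequence without touching real files, here is a minimal sketch that runs them against a throwaway temporary file holding the sample text from the question. The helper names convert_times and rewrite_in_place are hypothetical and not part of the answer above; they only wrap the same calls.

```python
import os
import re
import tempfile

# Same pattern as in the answer: two digits, two digits, then a literal H.
PATTERN = re.compile(r'(\d{2})(\d{2})H')


def convert_times(text):
    """Replace every 'HHMMH' token with 'HH:MM:00' (hypothetical helper)."""
    return PATTERN.sub(r'\1:\2:00', text)


def rewrite_in_place(file_path):
    """Read a file, convert its contents, and overwrite it (hypothetical helper)."""
    with open(file_path, 'r+') as f:
        contents = convert_times(f.read())
        f.seek(0)      # move back to the start of the file
        f.truncate()   # discard the old contents
        f.write(contents)


# Try it on the sample text from the question, using a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w') as f:
        f.write('1111H, 1234H, 1115H')
    rewrite_in_place(sample_path)
    with open(sample_path) as f:
        result = f.read()

print(result)  # 11:11:00, 12:34:00, 11:15:00
```

As written, the pattern also matches inside longer digit runs, so 12345H would have its last four digits converted. If only standalone words such as 1234H should change, wrapping the pattern in word boundaries, r'\b(\d{2})(\d{2})H\b', is one possible tightening.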
