Text file batch word replacement using Python and Notepad++ Unicode format

Question

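I am having a problem with a Unicode text file, using Notepad++ with Plugins > Python Script. The code below works perfectly and replaces the words listed in wordlist.txt, but only for English: non-ASCII words are not found. I tried changing with open('C:\Users\Desktop\wordlist.txt') as f: to with io.open('C:\Users\Desktop\wordlist.txt', encoding='utf-8') as f:, but Notepad++ does not run the script on a text file containing Unicode text. How can I pass a Unicode string as the search term in the code below? Failing that, please help with Python code to "find and replace using the word list with delimiter from the B.Text file" in the A.text file.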

with open('C:\Users\Desktop\wordlist.txt') as f:
    for l in f:
        s = l.split()
        editor.rereplace(r'\b' + s[0] + r'\b', s[1])

Answer 1

Don't use the word boundary \b, which causes problems with UTF-8 characters. Use lookarounds instead:

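import re

# Raw string, so that '\t' in the path is not read as a tab character.
with open(r'D:\temp\wordlist.txt') as f:
    for l in f:
        # Each line holds: <word to find> <replacement>
        s = l.split()
        # Match s[0] only when no non-space character touches it on either side.
        editor.rereplace(r'(?<!\S)' + s[0] + r'(?!\S)', '\t' + s[1])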

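Where:

(?<!\S) is a negative lookbehind that ensures there is no non-space character before the word to modify
(?!\S) is a negative lookahead that ensures there is no non-space character after the word to modify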
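
To illustrate the difference, here is a minimal sketch that can be run outside Notepad++. It assumes Python 3 and the standard re module, which is not the same regex engine that editor.rereplace uses, so it only demonstrates the general idea. The sample text and the variable names are made up for the example.

```python
import re

# Hypothetical sample: the short word also appears at the end of a longer word.
text = "जयश्रीराम राम"
word = "राम"

# In Python's re, Devanagari vowel signs do not count as word characters,
# so \b can report a "boundary" in the middle of a word.
boundary_hits = [m.start() for m in
                 re.finditer(r"\b" + re.escape(word) + r"\b", text)]

# The lookarounds only accept whitespace (or the start/end of the text)
# next to the word, so only the standalone occurrence matches.
lookaround_hits = [m.start() for m in
                   re.finditer(r"(?<!\S)" + re.escape(word) + r"(?!\S)", text)]

print("with \\b:          ", boundary_hits)
print("with lookarounds: ", lookaround_hits)
```

With \b the short word is also found inside the longer one, while the lookaround pattern finds only the standalone word.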

With your 2 sample files, I get:

    मारुती
नामशिवाया 
    जयश्रीराम 
जयश्रीराम

Note: for readability I added a tab before each modified word; remove it for your own use.
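
For the second part of the question (doing the replacement with plain Python instead of the plugin), a rough sketch could look like the following. It assumes Python 3, a UTF-8 word list with one "search replacement" pair per line separated by whitespace, and a UTF-8 target file. The function names load_pairs, replace_words and replace_in_file are hypothetical, not part of any library.

```python
import io
import re


def load_pairs(wordlist_path):
    """Read (search, replacement) pairs, one pair per line, from a UTF-8 file."""
    pairs = []
    # 'utf-8-sig' also accepts files saved with a BOM.
    with io.open(wordlist_path, encoding="utf-8-sig") as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or incomplete lines
                pairs.append((fields[0], fields[1]))
    return pairs


def replace_words(text, pairs):
    """Replace each whole, whitespace-delimited word with its replacement."""
    for search, replacement in pairs:
        # re.escape keeps regex metacharacters in the search word literal.
        pattern = r"(?<!\S)" + re.escape(search) + r"(?!\S)"
        # A function as replacement keeps backslashes in the replacement literal.
        text = re.sub(pattern, lambda m, r=replacement: r, text)
    return text


def replace_in_file(wordlist_path, target_path):
    """Apply every pair from the word list to the target file, in place."""
    pairs = load_pairs(wordlist_path)
    with io.open(target_path, encoding="utf-8") as f:
        text = f.read()
    with io.open(target_path, "w", encoding="utf-8") as f:
        f.write(replace_words(text, pairs))
```

A call such as replace_in_file('wordlist.txt', 'A.txt') would then rewrite A.txt in place, so it is worth trying it on a copy first. The pairs are applied in order, so a later pair can change text produced by an earlier one.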


python

notepad++