多个字符替换为正则表达式

Question

需要用正则表达式替换大量字符：

  before -->  h e l l o
              | | | | |
  result -->  a b c c d

怎么样？ :)

实际上，需要将 html 中的所有字符 Unicode (UTF-8) 替换为 ASCII (Unicode Escaped) :) 这个问题只是一个简化的例子

更新

好的，老是忘记了可以通过正则表达式在文本中搜索但是不能替换，问题已解决，谢谢

Answer 1

这应该符合您的目的。我把它保存在 utf-to-ascii.py.

#!/usr/bin/env python

import sys

for c in sys.stdin.read().decode('UTF-8'):
    charcode = ord(c)
    if charcode > 127:
        sys.stdout.write('\u%04x'%(charcode))
    else:
        sys.stdout.write(c)

我用一个名为 textdoc.txt 的文件对其进行了测试，其中包含以下内容：

hello ד blah blah

我运行是这样的:

$ ./utf-to-ascii.py <textdoc.txt 
hello \u05d3 blah blah

要将该输出保存到文件中，您需要运行这样做：

$ ./utf-to-ascii.py < textdoc.txt > textdoc.transformed.txt

多个字符替换为正则表达式

multiple char replace with regex

regex

sublimetext

sublimetext2