可以在 Python 中制作自定义字符串文字前缀吗？

Question

假设我有一个从 str 派生的自定义 class implements/overrides 一些方法：

class mystr(str):
    # just an example for a custom method:
    def something(self):
        return "anything"

现在我必须通过在构造函数中传递一个字符串来手动创建 mystr 的实例：

ms1 = mystr("my string")

s = "another string"
ms2 = mystr(s)

这还不错，但它导致了这样的想法，即使用类似于 b'bytes string' 或 r'raw string' 或 u'unicode string' 的自定义字符串前缀会很酷。

是否有可能在 Python 到 create/register 中使用像 m 这样的自定义字符串文字前缀，以便文字 m'my string' 产生 [= 的新实例14=]?
或者这些前缀是硬编码到 Python 解释器中的吗？

Answer 1

这些前缀在解释器中是硬编码的，您不能注册更多的前缀。

但是，您可以做的是使用 自定义源编解码器 预处理您的 Python 文件。这是一个相当巧妙的 hack，需要您注册自定义编解码器，并理解和应用源代码转换。

Python 允许您指定源代码的编码，并在顶部添加特殊注释：

# coding: utf-8

会告诉 Python 源代码使用 UTF-8 编码，并在解析前相应地解码文件。 Python 在 codecs 模块注册表中查找编解码器。 您可以注册自己的编解码器。

pyxl project uses this trick to parse out HTML syntax from Python files to replace them with actual Python syntax to build that HTML, all in a 'decoding' step. See the codec package in that project, where the register module registers a custom codec search function that transforms source code before Python actually parses and compiles it. A custom .pth file is installed into your site-packages directory to load this registration step at Python startup time. Another project that does the same thing to parse out Ruby-style string formatting is interpy。

然后你所要做的就是构建这样一个编解码器，它将解析一个 Python 源文件（对其进行标记化，可能使用 tokenize module）并用你的自定义字符串文字替换以 mystr(<string literal>) 调用为前缀。您要解析的任何文件都用 # coding: yourcustomcodec.

标记

我将把那部分留作 reader 的练习。祝你好运！

注意，这个转换的结果然后被编译成字节码，被缓存；您的转换只需运行一次每个源代码修订，使用您的编解码器的模块的所有其他导入将加载缓存的字节码。

Answer 2

虽然上面提到的解决方法很好，但它们可能很危险。黑掉你的 python 真的不是个好主意。虽然你不能真正做一个前缀，否则，您可以执行以下操作：

class MyString(str):
    def something(self):
        return MyString("anything")

m = MyString

# The you can do:
m("hi")
# Rather than:
# m"hi"

这可能是您能找到的最安全的解决方案。两个括号实际上不需要输入那么多内容，代码的读者也不会感到困惑。

Answer 3

可以使用运算符重载将 str 隐式转换为自定义 class

class MyString(str):
    def __or__( self, a ):
        return MyString(self + a)

m = MyString('')
print( m, type(m) )
#('', <class 'MyString'>)
print m|'a', type(m|'a')
#('a', <class 'MyString'>)

这避免了使用括号有效地模拟带有一个额外字符的字符串前缀 ─ 我选择 | 但它也可以是 & 或其他二进制比较运算符。

可以在 Python 中制作自定义字符串文字前缀吗？

Possible to make custom string literal prefixes in Python?

python

string

prefix

string-literals