Transform ordinary string (which is supposed to be a binary string) back into a binary string

Question

So I'm currently using the library simple-crypt.

I have successfully converted a given input string into its binary string:

from simplecrypt import encrypt  # pip install simple-crypt

pw_data = input("Please type in your p!")  # enter password
pw_data_confirmed = input("Please confirm!")
_platform = input("Please tell me the platform!")  # belonging platform
if pw_data == pw_data_confirmed:  # check confirmed pw
    print("Received!")

    # key passed to simple-crypt: it acts as the password (simple-crypt generates its own salt)
    salt_data = "AbCdEfkhl"

    ciphertext = encrypt(salt_data, pw_data.encode("utf8"))  # encrypt pw with the key; returns bytes

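The binary string looks like this, for example: b'sc\x00\x02X\xd8\x8ez\xbfB\x03s\xc5\x8bm\xecp\x19\x8d\xd6lqW\xf1\xc3\xa4y\x8f\x1aW)\x9bX\xfc\x0e\xa4\xf2ngJj/]{\x80\x06-\x07\x8cQ\xeef\x0b\x02?\x86\x19\x98\x94eW\x08}\x1d8\xdb\xe57\xf7\x97\x81\xb6\xc7\x08\n^\xc9\xc0'

This binary string is then stored in a Word document.

The problem now: as soon as I read this particular binary string back from the document, it is no longer recognized as a binary string. Instead, it is now of type str.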

import docx  # python-docx
from simplecrypt import decrypt

p_loc = input("Which platform do you need?")
doc_existing = docx.Document(r"xxx")
text = []
for i in doc_existing.paragraphs:
    text.append(i.text)

for pos, i in enumerate(text):
    if i == p_loc:
        len_pos = len(text[pos+1])
        p_code = text[pos+1][2:len_pos-1]  # get the binary string, which is now an ordinary str (strip b' and ')
print(p_code.encode("utf8"))  # when I apply .encode, another \ is added, so my binary code has two \


salt_data = "AbCdEfkhl"

plain = decrypt(salt_data, p_code)

print(plain)

p_code without the .encode call (as a str, not as bytes!): sc\x00\x02X\xd8\x8ez\xbfB\x03s\xc5\x8bm\xecp\x19\x8d\xd6lqW\xf1\xc3\xa4y\x8f\x1aW)\x9bX\xfc\x0e\xa4\xf2ngJj/]{\x80\x06-\x07\x8cQ\xeef\x0b\x02?\x86\x19\x98\x94eW\x08}\x1d8\xdb\xe57\xf7\x97\x81\xb6\xc7\x08\n^\xc9\xc0

When I now print p_code.encode("utf8"), I get the following result: b'sc\\x00\\x02X\\xd8\\x8ez\\xbfB\\x03s\\xc5\\x8bm\\xecp\\x19\\x8d\\xd6lqW\\xf1\\xc3\\xa4y\\x8f\\x1aW)\\x9bX\\xfc\\x0e\\xa4\\xf2ngJj/]{\\x80\\x06-\\x07\\x8cQ\\xeef\\x0b\\x02?\\x86\\x19\\x98\\x94eW\\x08}\\x1d8\\xdb\\xe57\\xf7\\x97\\x81\\xb6\\xc7\\x08\\n^\\xc9\\xc0'

So the problem is that, compared with the original binary string, the second one has an extra \ added before every escape. Because of that, I cannot decrypt it, since it is no longer recognized as the original binary ciphertext.
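The doubled backslash is only how Python displays bytes: in a bytes repr, a single literal backslash byte is shown as \\. A minimal sketch with a short stand-in value (not the real ciphertext) shows that the text read back from the document contains real backslash characters, so encoding it yields different, longer bytes than the original:

```python
# Stand-in for p_code: text read back from the document.
# "\\x00" in source code is ONE backslash followed by "x00" (4 characters).
p_code = "sc\\x00"

print(len(p_code))              # 6 characters: s, c, \, x, 0, 0
encoded = p_code.encode("utf8")
print(encoded)                  # b'sc\\x00' (the repr shows the single backslash as \\)
print(len(encoded))             # 6 bytes

original = b"sc\x00"            # what the ciphertext actually starts with
print(len(original))            # 3 bytes: s, c, NUL
print(encoded == original)      # False: the text describes the bytes but is not them
```

Nothing was added by .encode(); the escape sequences themselves were saved as text.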

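So my question is: is there a simple way to convert a string that already looks like a binary string back into a binary string, so that it is identical to the original? Or is there a way to remove the second \ so that I get the original binary string back?

Thank you very much for your help!!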

Answer 1

OK. So when you do f"{ciphertext}", you are telling Python to store the string representation of those bytes as text in the document.

For example:

>>> b = b"\x00\x01\x65\x66"
>>> print(f"{b}")
b'\x00\x01ef'

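You (probably) don't really want to store b'\x00\x01ef' in your Word document. A good general way to store binary data as text is to use a different encoding. Base64 is a widely used encoding designed specifically for representing binary data as text.

For details, see https://docs.python.org/3/library/base64.html.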
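A self-contained sketch of the full round trip, assuming random bytes from os.urandom as a stand-in for the simple-crypt ciphertext so that it runs without the library or a Word file:

```python
import base64
import os

# Stand-in for the ciphertext returned by simplecrypt.encrypt (any bytes work).
ciphertext = b"sc\x00\x02" + os.urandom(32)

# Bytes -> base64 text that can safely be stored in a Word paragraph.
stored_text = base64.b64encode(ciphertext).decode("ascii")

# ... later, the same text is read back from the document ...
restored = base64.b64decode(stored_text.encode("ascii"))

print(stored_text)
print(restored == ciphertext)  # True
```

The stored text only uses letters, digits, +, / and =, so nothing in it can be mistaken for an escape sequence.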

In your case, you would do something like this:

import base64

cipher_b64_b = base64.b64encode(ciphertext)
cipher_b64 = cipher_b64_b.decode() # cipher_b64 is now a string.
# Now store this cipher_b64 string in your word document

...

# Now you fetch p_code (which is now a base64 string) from your word doc
cipher_b64_b = p_code.encode()
cipher = base64.b64decode(cipher_b64_b)

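This gives you back your original binary ciphertext. The Word document will contain a base64-encoded string such as "AAFlZg==" (the base64 form of b"\x00\x01ef" from the example above), which avoids problems with escape sequences and the like in the document.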
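If some documents already contain the b'...' text produced by f"{ciphertext}", that text is a valid Python bytes literal, so the standard library's ast.literal_eval can parse it back into the original bytes without executing arbitrary code. A minimal sketch; recover_bytes is a hypothetical helper, and it expects the whole paragraph text including the leading b' and trailing ' (text[pos+1] in the question, not the sliced p_code):

```python
import ast


def recover_bytes(paragraph_text: str) -> bytes:
    """Turn the repr text of a bytes object (e.g. "b'sc\\x00'") back into bytes."""
    value = ast.literal_eval(paragraph_text)  # parses literals only, never runs code
    if not isinstance(value, bytes):
        raise ValueError("paragraph does not contain a bytes literal")
    return value


# A stand-in value that ends in a quote, so its repr uses double quotes.
original = b"sc\x00\x02X\xd8\n^\xc9\xc0'"
stored = f"{original}"  # the text that ended up in the Word document
print(stored)
print(recover_bytes(stored) == original)  # True
```

ast.literal_eval is used rather than eval because it only accepts literals. Parsing the full literal also copes with the double-quote form that repr picks when the bytes contain a single quote, which the fixed slice [2:len_pos-1] followed by manual unescaping would have to handle separately.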
