Converting from a string to a number in base-64

So, I'm trying to write a program to decode 6-character base-64 numbers.

Here is the problem statement:

Return the 36-bit number represented as a base-64 number in reverse order by the 6-character string s where the order of the 64 numerals is: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+

decode('000000') → 0

decode('gR1iC9') → 9876543210

decode('++++++') → 68719476735
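To make the "reverse order" wording concrete, here is a small sketch of the arithmetic behind the second example. The digit values are read off the numeral list above by position; the names `digit_values` and `total` are only illustrative.

```python
# Digit values of 'gR1iC9', looked up by position in the numeral list
# 0-9, A-Z, a-z, -, +  (g=42, R=27, 1=1, i=44, C=12, 9=9).
digit_values = [42, 27, 1, 44, 12, 9]

# "Reverse order" means the first character is the least significant digit,
# so the digit at index i is weighted by 64**i.
total = sum(d * 64**i for i, d in enumerate(digit_values))
print(total)
```

Six digits of 6 bits each give the 36-bit range mentioned in the statement.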

I would like to do this WITHOUT strings.

The easiest way would be to create the inverse of the following functions:

def get_digit(d):
    ''' Convert a base 64 digit to the desired character '''
    if 0 <= d <= 9:
        # 0 - 9
        c = 48 + d
    elif 10 <= d <= 35:
        # A - Z
        c = 55 + d
    elif 36 <= d <= 61:
        # a - z
        c = 61 + d
    elif d == 62:
        # -
        c = 45
    elif d == 63:
        # +
        c = 43
    else:
        # We should never get here
        raise ValueError('Invalid digit for base 64: ' + str(d)) 
    return chr(c)

# Test `get_digit`
print(''.join([get_digit(d) for d in range(64)]))

def encode(n):
    ''' Convert integer n to base 64 '''
    out = []
    while n:
        n, r = n // 64, n % 64
        out.append(get_digit(r))
    while len(out) < 6:
        out.append('0')
    return ''.join(out)

# Test `encode`
for i in (0, 9876543210, 68719476735):
    print(i, encode(i))

Output

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+
0 000000
9876543210 gR1iC9
68719476735 ++++++

The code above is actually from PM 2Ring on the page.

How do I write the inverse of this program?

As a start:

The inverse of get_digit above is as follows:

def inv_get_digit(c):

    if 0 <= c <= 9:
        d = ord(c) - 48
    elif 'A' <= c <= 'Z':
        d = ord(c) - 55
    elif 'a' <= c <= 'z'
        d = ord(c) - 61
    elif c == '+':
        d = 63
    elif c == '-':
        d = 62
    else:
        raise ValueError('Invalid Input' + str(c))
    return d


def decode(n):

    out = []
    while n:
        n, r= n % 10, n ** (6-len(str))
        out.append(get_digit(r))
    while len(out) < 10:
        out.append('0')
    return ''.join(out)

Here is a program that combines existing and new code to perform the inverse operation.

Your inv_get_digit function has a syntax error: you left off the colon at the end of an elif line. And there's no need to do str(c), since c is already a string.
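There is one more problem in that function, which the working version below also fixes: the first test, `0 <= c <= 9`, compares a string with integers. Assuming the code runs under Python 3, that raises a TypeError instead of evaluating to False. A minimal check (the variable names are only for illustration):

```python
c = '5'

try:
    in_range = 0 <= c <= 9      # compares int with str
except TypeError as exc:
    in_range = None
    print('TypeError:', exc)

# Comparing against the characters '0' and '9' behaves as intended.
is_digit = '0' <= c <= '9'
print(is_digit)
```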

I'm afraid that your decode function doesn't make much sense. It should take a string as input and return an integer. Please see the working version below.

def get_digit(d):
    ''' Convert a base 64 digit to the desired character '''
    if 0 <= d <= 9:
        # 0 - 9
        c = 48 + d
    elif 10 <= d <= 35:
        # A - Z
        c = 55 + d
    elif 36 <= d <= 61:
        # a - z
        c = 61 + d
    elif d == 62:
        # -
        c = 45
    elif d == 63:
        # +
        c = 43
    else:
        # We should never get here
        raise ValueError('Invalid digit for base 64: ' + str(d)) 
    return chr(c)

print('Testing get_digit') 
digits = ''.join([get_digit(d) for d in range(64)])
print(digits)

def inv_get_digit(c):
    if '0' <= c <= '9':
        d = ord(c) - 48
    elif 'A' <= c <= 'Z':
        d = ord(c) - 55
    elif 'a' <= c <= 'z':
        d = ord(c) - 61
    elif c == '-':
        d = 62
    elif c == '+':
        d = 63
    else:
        raise ValueError('Invalid input: ' + c)
    return d

print('\nTesting inv_get_digit') 
nums = [inv_get_digit(c) for c in digits]
print(nums == list(range(64)))

def encode(n):
    ''' Convert integer n to base 64 '''
    out = []
    while n:
        n, r = n // 64, n % 64
        out.append(get_digit(r))
    while len(out) < 6:
        out.append('0')
    return ''.join(out)

print('\nTesting encode')
numdata = (0, 9876543210, 68719476735)
strdata = []
for i in numdata:
    s = encode(i)
    print(i, s)
    strdata.append(s)

def decode(s):
    ''' Convert reversed base 64 string s to an integer '''
    n = 0
    for c in reversed(s):
        d = inv_get_digit(c)
        n = 64 * n + d
    return n

print('\nTesting decode')
for s, oldn in zip(strdata, numdata):
    n = decode(s)
    print(s, n, n == oldn)

Output

Testing get_digit
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+

Testing inv_get_digit
True

Testing encode
0 000000
9876543210 gR1iC9
68719476735 ++++++

Testing decode
000000 0 True
gR1iC9 9876543210 True
++++++ 68719476735 True
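Because 64 is 2**6, each character carries exactly 6 bits, which is why six characters give a 36-bit number. As a sketch of an alternative to the loop in decode above (not part of the original code), each digit can be shifted into place by its position instead of iterating in reverse. The names `digit_value` and `decode_shift` are made up for this example.

```python
def digit_value(c):
    ''' Same mapping as inv_get_digit, repeated so this block runs alone '''
    if '0' <= c <= '9':
        return ord(c) - 48
    if 'A' <= c <= 'Z':
        return ord(c) - 55
    if 'a' <= c <= 'z':
        return ord(c) - 61
    if c == '-':
        return 62
    if c == '+':
        return 63
    raise ValueError('Invalid input: ' + c)

def decode_shift(s):
    ''' Decode a reversed base 64 string using 6-bit shifts '''
    n = 0
    for i, c in enumerate(s):
        # Character i is the i-th least significant digit: 6 bits per digit
        n |= digit_value(c) << (6 * i)
    return n

for s in ('000000', 'gR1iC9', '++++++'):
    print(s, decode_shift(s))
```

Both approaches give the same result; the shift form just makes the place value of each character explicit.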

I would like to do this WITHOUT strings.

First, you need to clarify what this means. The working encoder you provided uses these strings:

out.append('0')
return ''.join(out)

And the accepted solution adds these strings:

digits = ''.join([get_digit(d) for d in range(64)])
if '0' <= c <= '9':
elif 'A' <= c <= 'Z':
elif 'a' <= c <= 'z':
elif c == '-':
elif c == '+':

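Do you mean that single-character strings are OK but multi-character strings are not? Or do you mean that you don't want to use str as a data structure and want to minimize string operations?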
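If the second reading is the intended one, one possible sketch is to work on integer code points instead of str objects, for example by taking the input as bytes. This only illustrates that interpretation; `decode_codes` is a hypothetical name and the input is assumed to be ASCII.

```python
def decode_codes(codes):
    ''' Decode reversed base 64 from an iterable of ASCII code points '''
    n = 0
    shift = 0
    for code in codes:
        if 48 <= code <= 57:        # 0 - 9
            d = code - 48
        elif 65 <= code <= 90:      # A - Z
            d = code - 55
        elif 97 <= code <= 122:     # a - z
            d = code - 61
        elif code == 45:            # -
            d = 62
        elif code == 43:            # +
            d = 63
        else:
            raise ValueError(code)
        n += d << shift             # first code point is least significant
        shift += 6
    return n

# Iterating over a bytes object yields ints, so no str is handled here
print(decode_codes(b'gR1iC9'))
```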

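I feel that your solution, and the accepted solution built on it, do too much work at encode and decode time. I suggest doing some work up front to build data structures, so that less work is needed when processing the data: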

from string import digits, ascii_lowercase, ascii_uppercase

BASE10_TO_BASE64 = list(digits + ascii_uppercase + ascii_lowercase + '-' + '+')

BASE64_TO_BASE10 = {base64: base10 for base10, base64 in enumerate(BASE10_TO_BASE64)}

ZEROS = ['0'] * 6

def encode(number):
    ''' Convert base 10 int to reversed base 64 str '''

    characters = []

    while number:
        number, remainder = divmod(number, 64)
        characters.append(BASE10_TO_BASE64[remainder])

    return ''.join(characters + ZEROS[:max(len(ZEROS) - len(characters), 0)])

def decode(string):
    ''' Convert reversed base 64 str to base 10 int '''

    number = 0

    for character in string[::-1]:
        digit = BASE64_TO_BASE10[character]
        number = 64 * number + digit

    return number

if __name__ == "__main__":

    NUMBERS = (4096, 9876543210, 68719476735)
    strings = []

    print("Encode:")
    for number in NUMBERS:
        string = encode(number)
        print(number, string)
        strings.append(string)

    print("\nDecode:")
    for string in strings:
        number = decode(string)
        print(string, number)

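Encoding as a reversed base 64 number with zero padding complicates the program without adding anything. We would normally expect '100' to represent the square of the base, but here it does not.

Output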

> python3 test.py
Encode:
4096 001000
9876543210 gR1iC9
68719476735 ++++++

Decode:
001000 4096
gR1iC9 9876543210
++++++ 68719476735
>
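The remark about '100' can be checked directly. The sketch below rebuilds the lookup table so it runs on its own, and `decode_forward` is a hypothetical most-significant-digit-first variant added only for comparison.

```python
from string import digits, ascii_lowercase, ascii_uppercase

NUMERALS = digits + ascii_uppercase + ascii_lowercase + '-+'
VALUES = {numeral: value for value, numeral in enumerate(NUMERALS)}

def decode_reversed(string):
    ''' Least significant digit first, as in the problem statement '''
    number = 0
    for character in reversed(string):
        number = 64 * number + VALUES[character]
    return number

def decode_forward(string):
    ''' Most significant digit first, the usual positional convention '''
    number = 0
    for character in string:
        number = 64 * number + VALUES[character]
    return number

print(decode_reversed('100'), decode_forward('100'))
print(decode_reversed('001'), decode_forward('001'))
```

Under the reversed convention it is '001', not '100', that stands for 64 squared, which matches the 4096 line in the output above.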