A NumPy equivalent of pandas read_clipboard?

For example, suppose you come across a question or answer that posts an array like this:

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47]
 [48 49 50 51 52 53 54 55]
 [56 57 58 59 60 61 62 63]]

How would you load it into a variable in a REPL session without having to add commas everywhere?

For a one-off, I might do this:

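  • Copy the text containing the array to the clipboard.
  • In the IPython shell, type s = """ but do not press Return.
  • Paste the text from the clipboard.
  • Type the closing triple quote.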

That gives me:

In [16]: s = """[[ 0  1  2  3  4  5  6  7]
    ...:  [ 8  9 10 11 12 13 14 15]
    ...:  [16 17 18 19 20 21 22 23]
    ...:  [24 25 26 27 28 29 30 31]
    ...:  [32 33 34 35 36 37 38 39]
    ...:  [40 41 42 43 44 45 46 47]
    ...:  [48 49 50 51 52 53 54 55]
    ...:  [56 57 58 59 60 61 62 63]]"""

Then use np.loadtxt() on the stripped lines, as follows:

In [17]: a = np.loadtxt([line.lstrip(' [').rstrip(']') for line in s.splitlines()], dtype=int)

In [18]: a
Out[18]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])
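
The same steps can be wrapped in a small helper so the stripping logic does not have to be retyped each time. This is only a sketch: the name array_from_printed_text is made up for illustration, and it assumes the text is the str() form of a 1-D or 2-D array with exactly one row per line.

```python
import numpy as np

def array_from_printed_text(text, dtype=int):
    """Parse the str() form of a 1-D or 2-D array (hypothetical helper).

    Assumes every array row sits on a single line of text.
    """
    # Strip leading spaces/brackets and trailing brackets from each line,
    # leaving whitespace-separated numbers that np.loadtxt can read.
    rows = [line.lstrip(' [').rstrip(']') for line in text.strip().splitlines()]
    return np.loadtxt(rows, dtype=dtype)

s = """[[ 0  1  2]
 [ 3  4  5]]"""
a = array_from_printed_text(s)
print(a)
```

Passing dtype=float instead of the default would be the natural choice for arrays of floating-point values.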

If you have Pandas, pyperclip, or something else that can read from the clipboard, you could use something like this:

from pandas.io.clipboard import clipboard_get
# import pyperclip
import numpy as np
import re
import ast

def numpy_from_clipboard():
    inp = clipboard_get()
    # inp = pyperclip.paste()
    inp = inp.strip()
    # if it starts with "array(" we just need to remove the
    # leading "array(" and remove the optional ", dtype=xxx)"
    if inp.startswith('array('):
        inp = re.sub(r'^array\(', '', inp)
        dtype = re.search(r', dtype=(\w+)\)$', inp)
        if dtype:
            return np.array(ast.literal_eval(inp[:dtype.start()]), dtype=dtype.group(1))
        else:
            return np.array(ast.literal_eval(inp[:-1]))
    else:
        # In case it's the string representation it's a bit harder.
        # We need to remove all spaces between closing and opening brackets
        inp = re.sub(r'\]\s+\[', '],[', inp)
        # We need to remove all whitespaces following an opening bracket
        inp = re.sub(r'\[\s+', '[', inp)
        # and all leading whitespaces before closing brackets
        inp = re.sub(r'\s+\]', ']', inp)
        # replace all remaining whitespaces with ","
        inp = re.sub(r'\s+', ',', inp)
        return np.array(ast.literal_eval(inp))

Then read whatever you have copied to the clipboard:

>>> numpy_from_clipboard()
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

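This should be able to parse (most) arrays from your clipboard, in both their str and repr forms. It should even work for arrays whose rows wrap across several lines (where np.loadtxt fails):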

[[ 0.34866207  0.38494993  0.7053722   0.64586156  0.27607369  0.34850162
   0.20530567  0.46583039  0.52982216  0.92062115]
 [ 0.06973858  0.13249867  0.52419149  0.94707951  0.868956    0.72904737
   0.51666421  0.95239542  0.98487436  0.40597835]
 [ 0.66246734  0.85333546  0.072423    0.76936201  0.40067016  0.83163118
   0.45404714  0.0151064   0.14140024  0.12029861]
 [ 0.2189936   0.36662076  0.90078913  0.39249484  0.82844509  0.63609079
   0.18102383  0.05339892  0.3243505   0.64685352]
 [ 0.803504    0.57531309  0.0372428   0.8308381   0.89134864  0.39525473
   0.84138386  0.32848746  0.76247531  0.99299639]]

>>> numpy_from_clipboard()
array([[ 0.34866207,  0.38494993,  0.7053722 ,  0.64586156,  0.27607369,
         0.34850162,  0.20530567,  0.46583039,  0.52982216,  0.92062115],
       [ 0.06973858,  0.13249867,  0.52419149,  0.94707951,  0.868956  ,
         0.72904737,  0.51666421,  0.95239542,  0.98487436,  0.40597835],
       [ 0.66246734,  0.85333546,  0.072423  ,  0.76936201,  0.40067016,
         0.83163118,  0.45404714,  0.0151064 ,  0.14140024,  0.12029861],
       [ 0.2189936 ,  0.36662076,  0.90078913,  0.39249484,  0.82844509,
         0.63609079,  0.18102383,  0.05339892,  0.3243505 ,  0.64685352],
       [ 0.803504  ,  0.57531309,  0.0372428 ,  0.8308381 ,  0.89134864,
         0.39525473,  0.84138386,  0.32848746,  0.76247531,  0.99299639]])
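
To illustrate why the line-by-line np.loadtxt approach breaks on wrapped rows, here is a small sketch with a made-up 2x4 array whose rows each spill onto a second line. After stripping the brackets, the lines no longer have the same number of columns, so np.loadtxt is expected to raise a ValueError.

```python
import numpy as np

# A made-up array whose rows wrap, similar to how NumPy prints wide arrays.
wrapped = """[[ 0.1  0.2  0.3
   0.4]
 [ 0.5  0.6  0.7
   0.8]]"""

# Same stripping as in the one-off recipe: one text line per "row".
lines = [line.lstrip(' [').rstrip(']') for line in wrapped.splitlines()]

try:
    np.loadtxt(lines)
    failed = False
except ValueError as exc:
    # The lines contain 3, 1, 3 and 1 values, so the column count is inconsistent.
    failed = True
    print("np.loadtxt failed:", exc)
```

Note that if every wrapped piece happened to contain the same number of values, np.loadtxt would not raise at all and would instead return an array with the wrong shape.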

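However, I'm not very good with regular expressions, so this is probably not foolproof, and using ast.literal_eval feels a bit awkward (though it avoids having to write the parsing yourself).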
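
Since the regexes may not be foolproof, one way to gain some confidence is to separate the parsing from the clipboard access so it can be checked against known inputs. The sketch below applies the same substitutions as numpy_from_clipboard to a string argument; the name numpy_from_text and the sample inputs are assumptions made for illustration, not part of the answer above.

```python
import ast
import re

import numpy as np

def numpy_from_text(text):
    """Parse the str() or repr() form of an array from a string (hypothetical helper)."""
    inp = text.strip()
    if inp.startswith('array('):
        # repr form: drop the leading "array(" and the optional ", dtype=xxx)".
        inp = re.sub(r'^array\(', '', inp)
        dtype = re.search(r', dtype=(\w+)\)$', inp)
        if dtype:
            return np.array(ast.literal_eval(inp[:dtype.start()]),
                            dtype=dtype.group(1))
        return np.array(ast.literal_eval(inp[:-1]))
    # str form: turn the whitespace-separated layout into a list literal.
    inp = re.sub(r'\]\s+\[', '],[', inp)  # between rows
    inp = re.sub(r'\[\s+', '[', inp)      # after an opening bracket
    inp = re.sub(r'\s+\]', ']', inp)      # before a closing bracket
    inp = re.sub(r'\s+', ',', inp)        # remaining gaps separate values
    return np.array(ast.literal_eval(inp))

# A str-form array with wrapped rows, and a repr-form array with a dtype.
from_str = numpy_from_text("""[[ 0.1  0.2
   0.3]
 [ 0.4  0.5
   0.6]]""")
from_repr = numpy_from_text("array([1., 2.], dtype=float32)")
print(from_str.shape, from_repr.dtype)
```

With the parsing isolated like this, numpy_from_clipboard could simply pass the clipboard text to the parser, and further sample inputs can be added as checks when an array fails to parse.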

Feel free to suggest improvements.