pandas read_clipboard 的 NumPy 等价物?
A NumPy equivalent of pandas read_clipboard?
例如,如果您遇到 question/answer 发布这样的数组:
[[ 0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]
[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55]
[56 57 58 59 60 61 62 63]]
如何将它加载到 REPL 会话中的变量中而不必在各处添加逗号?
对于一次性的场合,我可能会这样做:
- 将包含数组的文本复制到剪贴板。
- 在ipythonshell中输入
s = """
,但不要打return。
- 粘贴剪贴板中的文本。
- 键入结束三引号。
这给了我:
In [16]: s = """[[ 0 1 2 3 4 5 6 7]
...: [ 8 9 10 11 12 13 14 15]
...: [16 17 18 19 20 21 22 23]
...: [24 25 26 27 28 29 30 31]
...: [32 33 34 35 36 37 38 39]
...: [40 41 42 43 44 45 46 47]
...: [48 49 50 51 52 53 54 55]
...: [56 57 58 59 60 61 62 63]]"""
然后使用np.loadtxt()
如下:
In [17]: a = np.loadtxt([line.lstrip(' [').rstrip(']') for line in s.splitlines()], dtype=int)
In [18]: a
Out[18]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])
如果你有 Pandas、pyperclip 或 something else to read from the clipboard,你可以使用这样的东西:
from pandas.io.clipboard import clipboard_get
# import pyperclip
import numpy as np
import re
import ast
def numpy_from_clipboard():
inp = clipboard_get()
# inp = pyperclip.paste()
inp = inp.strip()
# if it starts with "array(" we just need to remove the
# leading "array(" and remove the optional ", dtype=xxx)"
if inp.startswith('array('):
inp = re.sub(r'^array\(', '', inp)
dtype = re.search(r', dtype=(\w+)\)$', inp)
if dtype:
return np.array(ast.literal_eval(inp[:dtype.start()]), dtype=dtype.group(1))
else:
return np.array(ast.literal_eval(inp[:-1]))
else:
# In case it's the string representation it's a bit harder.
# We need to remove all spaces between closing and opening brackets
inp = re.sub(r'\]\s+\[', '],[', inp)
# We need to remove all whitespaces following an opening bracket
inp = re.sub(r'\[\s+', '[', inp)
# and all leading whitespaces before closing brackets
inp = re.sub(r'\s+\]', ']', inp)
# replace all remaining whitespaces with ","
inp = re.sub(r'\s+', ',', inp)
return np.array(ast.literal_eval(inp))
然后读取您保存在剪贴板中的内容:
>>> numpy_from_clipboard()
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])
这应该能够从您的剪贴板解析(大多数)数组(str
以及 repr
数组)。它甚至应该适用于多行数组(其中 np.loadtxt
失败):
[[ 0.34866207 0.38494993 0.7053722 0.64586156 0.27607369 0.34850162
0.20530567 0.46583039 0.52982216 0.92062115]
[ 0.06973858 0.13249867 0.52419149 0.94707951 0.868956 0.72904737
0.51666421 0.95239542 0.98487436 0.40597835]
[ 0.66246734 0.85333546 0.072423 0.76936201 0.40067016 0.83163118
0.45404714 0.0151064 0.14140024 0.12029861]
[ 0.2189936 0.36662076 0.90078913 0.39249484 0.82844509 0.63609079
0.18102383 0.05339892 0.3243505 0.64685352]
[ 0.803504 0.57531309 0.0372428 0.8308381 0.89134864 0.39525473
0.84138386 0.32848746 0.76247531 0.99299639]]
>>> numpy_from_clipboard()
array([[ 0.34866207, 0.38494993, 0.7053722 , 0.64586156, 0.27607369,
0.34850162, 0.20530567, 0.46583039, 0.52982216, 0.92062115],
[ 0.06973858, 0.13249867, 0.52419149, 0.94707951, 0.868956 ,
0.72904737, 0.51666421, 0.95239542, 0.98487436, 0.40597835],
[ 0.66246734, 0.85333546, 0.072423 , 0.76936201, 0.40067016,
0.83163118, 0.45404714, 0.0151064 , 0.14140024, 0.12029861],
[ 0.2189936 , 0.36662076, 0.90078913, 0.39249484, 0.82844509,
0.63609079, 0.18102383, 0.05339892, 0.3243505 , 0.64685352],
[ 0.803504 , 0.57531309, 0.0372428 , 0.8308381 , 0.89134864,
0.39525473, 0.84138386, 0.32848746, 0.76247531, 0.99299639]])
但是我不太擅长正则表达式,所以这可能不是万无一失的,使用 ast.literal_eval
感觉有点尴尬(但它避免了自己进行解析)。
随时提出改进建议。
例如,如果您遇到 question/answer 发布这样的数组:
[[ 0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]
[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55]
[56 57 58 59 60 61 62 63]]
如何将它加载到 REPL 会话中的变量中而不必在各处添加逗号?
对于一次性的场合,我可能会这样做:
- 将包含数组的文本复制到剪贴板。
- 在ipythonshell中输入
s = """
,但不要打return。 - 粘贴剪贴板中的文本。
- 键入结束三引号。
这给了我:
In [16]: s = """[[ 0 1 2 3 4 5 6 7]
...: [ 8 9 10 11 12 13 14 15]
...: [16 17 18 19 20 21 22 23]
...: [24 25 26 27 28 29 30 31]
...: [32 33 34 35 36 37 38 39]
...: [40 41 42 43 44 45 46 47]
...: [48 49 50 51 52 53 54 55]
...: [56 57 58 59 60 61 62 63]]"""
然后使用np.loadtxt()
如下:
In [17]: a = np.loadtxt([line.lstrip(' [').rstrip(']') for line in s.splitlines()], dtype=int)
In [18]: a
Out[18]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])
如果你有 Pandas、pyperclip 或 something else to read from the clipboard,你可以使用这样的东西:
from pandas.io.clipboard import clipboard_get
# import pyperclip
import numpy as np
import re
import ast
def numpy_from_clipboard():
inp = clipboard_get()
# inp = pyperclip.paste()
inp = inp.strip()
# if it starts with "array(" we just need to remove the
# leading "array(" and remove the optional ", dtype=xxx)"
if inp.startswith('array('):
inp = re.sub(r'^array\(', '', inp)
dtype = re.search(r', dtype=(\w+)\)$', inp)
if dtype:
return np.array(ast.literal_eval(inp[:dtype.start()]), dtype=dtype.group(1))
else:
return np.array(ast.literal_eval(inp[:-1]))
else:
# In case it's the string representation it's a bit harder.
# We need to remove all spaces between closing and opening brackets
inp = re.sub(r'\]\s+\[', '],[', inp)
# We need to remove all whitespaces following an opening bracket
inp = re.sub(r'\[\s+', '[', inp)
# and all leading whitespaces before closing brackets
inp = re.sub(r'\s+\]', ']', inp)
# replace all remaining whitespaces with ","
inp = re.sub(r'\s+', ',', inp)
return np.array(ast.literal_eval(inp))
然后读取您保存在剪贴板中的内容:
>>> numpy_from_clipboard()
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])
这应该能够从您的剪贴板解析(大多数)数组(str
以及 repr
数组)。它甚至应该适用于多行数组(其中 np.loadtxt
失败):
[[ 0.34866207 0.38494993 0.7053722 0.64586156 0.27607369 0.34850162
0.20530567 0.46583039 0.52982216 0.92062115]
[ 0.06973858 0.13249867 0.52419149 0.94707951 0.868956 0.72904737
0.51666421 0.95239542 0.98487436 0.40597835]
[ 0.66246734 0.85333546 0.072423 0.76936201 0.40067016 0.83163118
0.45404714 0.0151064 0.14140024 0.12029861]
[ 0.2189936 0.36662076 0.90078913 0.39249484 0.82844509 0.63609079
0.18102383 0.05339892 0.3243505 0.64685352]
[ 0.803504 0.57531309 0.0372428 0.8308381 0.89134864 0.39525473
0.84138386 0.32848746 0.76247531 0.99299639]]
>>> numpy_from_clipboard()
array([[ 0.34866207, 0.38494993, 0.7053722 , 0.64586156, 0.27607369,
0.34850162, 0.20530567, 0.46583039, 0.52982216, 0.92062115],
[ 0.06973858, 0.13249867, 0.52419149, 0.94707951, 0.868956 ,
0.72904737, 0.51666421, 0.95239542, 0.98487436, 0.40597835],
[ 0.66246734, 0.85333546, 0.072423 , 0.76936201, 0.40067016,
0.83163118, 0.45404714, 0.0151064 , 0.14140024, 0.12029861],
[ 0.2189936 , 0.36662076, 0.90078913, 0.39249484, 0.82844509,
0.63609079, 0.18102383, 0.05339892, 0.3243505 , 0.64685352],
[ 0.803504 , 0.57531309, 0.0372428 , 0.8308381 , 0.89134864,
0.39525473, 0.84138386, 0.32848746, 0.76247531, 0.99299639]])
但是我不太擅长正则表达式,所以这可能不是万无一失的,使用 ast.literal_eval
感觉有点尴尬(但它避免了自己进行解析)。
随时提出改进建议。