Is there a more Pythonic (shorter/more efficient) way of filling a structured array with the content of a structured string than this?
I need to put a formatted string into a structured array (the string is a JSON-formatted 2D table in which all columns are objects). Right now, I do this:
import json
import numpy
json_string = '{"SYM": ["this_string","this_string","this_string"],"DATE": ["NaN","NaN","NaN"],"YEST": ["NaN","NaN","NaN"],"other_DATE": ["NaN","NaN","NaN"],"SIZE": ["NaN","NaN","NaN"],"ACTIVITY": ["2019-09-27 14:18:28.000700 UTC","2019-09-27 14:18:28.000700 UTC","2019-09-27 14:18:28.000600 UTC"]}'
all_content = json.loads(json_string)
dtype = numpy.dtype(dict(names = list(all_content.keys()), formats = ['O'] * len(all_content.keys())))
this_bucket = numpy.empty(shape = [len(all_content[next(iter(all_content.keys()))]), ],
                          dtype = dtype)
for key in all_content.keys():
    this_bucket[key][:] = all_content[key]
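But this seems very verbose. Is there a more direct way?
There are basically two ways of setting the values of a structured array: field-by-field assignment (which is what you do), and using a list of tuples, which I'll demonstrate: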
In [180]: all_content
Out[180]:
{'SYM': ['this_string', 'this_string', 'this_string'],
'DATE': ['NaN', 'NaN', 'NaN'],
'YEST': ['NaN', 'NaN', 'NaN'],
'other_DATE': ['NaN', 'NaN', 'NaN'],
'SIZE': ['NaN', 'NaN', 'NaN'],
'ACTIVITY': ['2019-09-27 14:18:28.000700 UTC',
'2019-09-27 14:18:28.000700 UTC',
'2019-09-27 14:18:28.000600 UTC']}
Make an object dtype array, mainly for the convenience of 'column' indexing.
In [181]: arr = np.array(list(all_content.items()), dtype=object)
In [182]: arr
Out[182]:
array([['SYM', list(['this_string', 'this_string', 'this_string'])],
['DATE', list(['NaN', 'NaN', 'NaN'])],
['YEST', list(['NaN', 'NaN', 'NaN'])],
['other_DATE', list(['NaN', 'NaN', 'NaN'])],
['SIZE', list(['NaN', 'NaN', 'NaN'])],
['ACTIVITY',
list(['2019-09-27 14:18:28.000700 UTC', '2019-09-27 14:18:28.000700 UTC', '2019-09-27 14:18:28.000600 UTC'])]],
dtype=object)
Define the dtype, as you did, or with:
In [183]: dt = np.dtype(list(zip(arr[:,0],['O']*arr.shape[0])))
In [184]: dt
Out[184]: dtype([('SYM', 'O'), ('DATE', 'O'), ('YEST', 'O'), ('other_DATE', 'O'), ('SIZE', 'O'), ('ACTIVITY', 'O')])
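As a small aside, the dict form used in the question and the list-of-pairs form used here should describe the same dtype. The sketch below checks that on a hypothetical three-field subset of the names (the `names` list is an assumption made for brevity, not the full table):

```python
import numpy as np

# Hypothetical subset of the field names, chosen only to keep the example short.
names = ["SYM", "DATE", "ACTIVITY"]

# Form used in the question: a dict with parallel 'names' and 'formats' lists.
dt_from_dict = np.dtype({"names": names, "formats": ["O"] * len(names)})

# Form used in the answer: a list of (name, format) pairs.
dt_from_pairs = np.dtype([(name, "O") for name in names])

# Both spellings are expected to compare equal and expose the same field names.
print(dt_from_dict == dt_from_pairs)
print(dt_from_pairs.names)
```

Which spelling to use is mostly a matter of taste; the list of pairs tends to be shorter when every field shares one format.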
A list 'transpose' produces a list of tuples:
In [185]: list(zip(*arr[:,1]))
Out[185]:
[('this_string', 'NaN', 'NaN', 'NaN', 'NaN', '2019-09-27 14:18:28.000700 UTC'),
('this_string', 'NaN', 'NaN', 'NaN', 'NaN', '2019-09-27 14:18:28.000700 UTC'),
('this_string', 'NaN', 'NaN', 'NaN', 'NaN', '2019-09-27 14:18:28.000600 UTC')]
This list is suitable as data input:
In [186]: np.array(list(zip(*arr[:,1])),dtype=dt)
Out[186]:
array([('this_string', 'NaN', 'NaN', 'NaN', 'NaN', '2019-09-27 14:18:28.000700 UTC'),
('this_string', 'NaN', 'NaN', 'NaN', 'NaN', '2019-09-27 14:18:28.000700 UTC'),
('this_string', 'NaN', 'NaN', 'NaN', 'NaN', '2019-09-27 14:18:28.000600 UTC')],
dtype=[('SYM', 'O'), ('DATE', 'O'), ('YEST', 'O'), ('other_DATE', 'O'), ('SIZE', 'O'), ('ACTIVITY', 'O')])
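The intermediate object array is only a convenience. The same list-of-tuples idea can be sketched directly from the dict, as below. The shortened `json_string` is an assumption for illustration (two columns instead of six); the variable names follow the question.

```python
import json
import numpy as np

# Shortened sample input (assumption: two columns stand in for the full table).
json_string = '{"SYM": ["a", "b", "c"], "DATE": ["NaN", "NaN", "NaN"]}'
all_content = json.loads(json_string)

# One object-typed field per key, in the dict's insertion order.
dt = np.dtype([(name, "O") for name in all_content])

# zip(*columns) transposes the column lists into one tuple per record.
rows = list(zip(*all_content.values()))

# A list of tuples is the input form a structured array accepts.
this_bucket = np.array(rows, dtype=dt)

print(this_bucket["SYM"])
```

This relies on dicts preserving insertion order (Python 3.7+), so the field names and the tuple positions line up.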
You can simplify getting the number of keys/fields with:
In [187]: len(all_content)
Out[187]: 6
Another way to get the number of 'records' is
In [188]: first,*rest=all_content.values()
In [189]: first
Out[189]: ['this_string', 'this_string', 'this_string']
Your next(iter...) is probably just as good.
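One caveat with either way of counting records: both look at a single column only, and `zip` silently stops at the shortest column. A minimal sketch of a length check, assuming a small hypothetical `all_content` dict in place of the parsed JSON:

```python
# Hypothetical stand-in for the parsed JSON table.
all_content = {"SYM": ["a", "b", "c"], "DATE": ["NaN", "NaN", "NaN"]}

# Collect the distinct column lengths; a well-formed table has exactly one.
lengths = {len(column) for column in all_content.values()}
if len(lengths) != 1:
    raise ValueError(f"columns have differing lengths: {sorted(lengths)}")

# The single shared length is the number of records.
n_records = lengths.pop()
print(n_records)
```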