Can I "detect" a slicing expression in a Python class method?
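I am developing an application in which I have defined a "variable" object that holds data in the form of a numpy array. These variables are linked to (netcdf) data files, and I would like to load the variable values dynamically when they are needed, instead of loading all data from the sometimes huge files at the start.
The following snippet demonstrates the principle and works well, including access to portions of the data with slices. For example, you can write: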
a = var()  # empty variable
print(a.values[7])  # values have been automatically "loaded"
or even:
a = var()
a.values[7] = 0
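However, this code still forces me to load the entire variable data at once. Netcdf (with the netCDF4 library) would allow me to access data slices directly from the file. Example: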
import netCDF4
f = netCDF4.Dataset(filename, "r")
print(f.variables["a"][7])
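I cannot use the netcdf variable objects directly, because my application is tied to a web service which cannot remember the netcdf file handle, and also because the variable data do not always come from netcdf files but may originate from other sources such as OGC web services.
Is there a way to "capture" the slicing expression in the property or setter methods and use it? The idea would be to write something like: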
@property
def values(self):
    if self._values is None:
        self._values = np.arange(10.)[slice]  # load from file ...
    return self._values
instead of the code below.
Working demo:
import numpy as np

class var(object):

    def __init__(self, values=None, metadata=None):
        if values is None:
            self._values = None
        else:
            self._values = np.array(values)
        self.metadata = metadata  # just to demonstrate that var has more than just values

    @property
    def values(self):
        if self._values is None:
            self._values = np.arange(10.)  # load from file ...
        return self._values

    @values.setter
    def values(self, values):
        self._values = values
First idea: should I perhaps create values as a separate class and then use __getitem__? See "In python, how do I create two index slicing for my own matrix class?"
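No, you cannot detect what will be done to the object after it has been returned from .values. The result could be stored in a variable and only sliced (much) later, sliced in different places, used in its entirety, and so on.
You should indeed return a wrapper object and hook into object.__getitem__; it would let you detect slicing and load data as needed. When slicing, Python passes in a slice() object.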
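To make the last point concrete, here is a minimal sketch of what `__getitem__` actually receives for different index expressions. The class name `ShowIndex` is purely illustrative and not part of the question or the answer; it simply hands back whatever index object Python passes in.

```python
class ShowIndex(object):
    """Hypothetical helper: returns the index object it is given."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()
single = probe[7]           # a plain integer
simple = probe[0:3]         # a slice object with start and stop set
everything = probe[:]       # a slice object with all fields None
multi = probe[1:5:2, ...]   # a tuple holding a slice and Ellipsis

print(single, simple, everything, multi)
```

A wrapper returned from `.values` can inspect this index object and decide whether to read only part of the data or all of it.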
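Thanks to the guidance of Martijn Pieters and some more reading, I came up with the following code as a demonstration. Note that the Reader class uses a netcdf file and the netCDF4 library. If you want to try this code yourself, you will need a netcdf file with variables "a" and "b", or you can replace Reader with something else that returns a data array or a slice of a data array.
This solution defines three classes: Reader does the actual file I/O handling, Values manages the data access part and invokes a Reader instance if no data have been stored in memory yet, and var is the final "variable", which in real life will contain much more metadata. The code contains a couple of extra print statements for educational purposes.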
"""Implementation of a dynamic variable class which can read data from file when needed or
return the data values from memory if they were read already. This concept supports
slicing for both memory and file access."""
import numpy as np
import netCDF4 as nc

FILENAME = r"C:\Users\m.schultz\Downloads\data\tmp\MACC_20141224_0001.nc"
VARNAME = "a"


class Reader(object):
    """Implements the actual data access to variable values. Here reading a
    slice from a netcdf file.
    """

    def __init__(self, filename, varname):
        """Final implementation will also have to take groups into account...
        """
        self.filename = filename
        self.varname = varname

    def read(self, args=slice(None, None, None)):
        """Read a data slice. Args is a slice object or a tuple of slice
        objects (e.g. numpy.index_exp). The default corresponds to [:],
        i.e. all data will be read.
        """
        with nc.Dataset(self.filename, "r") as f:
            values = f.variables[self.varname][args]
        return values


class Values(object):

    def __init__(self, values=None, reader=None):
        """Initialize Values. You can either pass numerical (or other) values,
        preferably as numpy array, or a reader instance which will read the
        values on demand. The reader must have a read(args) method, where
        args is a tuple of slices. If no args are given, all data should be
        returned.
        """
        if values is not None:
            self._values = np.array(values)
        self.reader = reader

    def __getattr__(self, name):
        """This is only called if attribute name is not present.
        Here, the only attribute we care about is _values.
        Self.reader should always be defined.
        This method is necessary to allow access to variable.values without
        a slicing index. If only __getitem__ were defined, one would always
        have to write variable.values[:] in order to make sure that something
        is returned.
        """
        print(">>> in __getattr__, trying to access ", name)
        if name == "_values":
            print(">>> calling reader and reading all values...")
            self._values = self.reader.read()
            return self._values
        # any other missing attribute is a genuine error; returning
        # self._values here could recurse forever if reader were missing
        raise AttributeError(name)

    def __getitem__(self, args):
        print("in __getitem__")
        if "_values" not in self.__dict__:
            values = self.reader.read(args)
            print(">>> read from file. Shape = ", values.shape)
            if args == slice(None, None, None):
                self._values = values  # all data read, store in memory
            return values
        else:
            print(">>> read from memory. Shape = ", self._values[args].shape)
            return self._values[args]

    def __repr__(self):
        return self._values.__repr__()

    def __str__(self):
        return self._values.__str__()


class var(object):

    def __init__(self, name=VARNAME, filename=FILENAME, values=None):
        self.name = name
        self.values = Values(values, Reader(filename, name))


if __name__ == "__main__":
    # define a variable and access all data first
    # this will read the entire array and save it in memory, so that
    # subsequent access with or without index returns data from memory
    a = var("a", filename=FILENAME)
    print("1: a.values = ", a.values)
    print("2: a.values[-1] = ", a.values[-1])
    print("3: a.values = ", a.values)
    # define a second variable, where we access a data slice first
    # In this case the Reader only reads the slice and no data are stored
    # in memory. The second access indexes the complete array, so Reader
    # will read everything and the data will be stored in memory.
    # The last access will then use the data from memory.
    b = var("b", filename=FILENAME)
    print("4: b.values[0:3] = ", b.values[0:3])
    print("5: b.values[:] = ", b.values[:])
    print("6: b.values[5:8] = ", b.values[5:8])
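For readers without a netcdf file at hand, the following is a rough sketch of how the Reader could be swapped for an in-memory stand-in, as suggested above. The names `ArrayReader` and `LazyValues` are hypothetical and not part of the original code; `LazyValues` is a condensed version of the Values idea, and a plain list is used in place of a numpy array only to keep the sketch free of dependencies.

```python
class ArrayReader(object):
    """Hypothetical stand-in for Reader: serves slices of in-memory data
    and records every index it was asked for."""

    def __init__(self, data):
        self._data = list(data)
        self.requests = []

    def read(self, args=slice(None, None, None)):
        self.requests.append(args)
        return self._data[args]


class LazyValues(object):
    """Condensed variant of Values: read on demand, cache after a full read."""

    def __init__(self, reader):
        self.reader = reader
        self._cache = None

    def __getitem__(self, args):
        if self._cache is not None:
            return self._cache[args]        # served from memory
        values = self.reader.read(args)     # served by the reader
        if isinstance(args, slice) and args == slice(None, None, None):
            self._cache = values            # everything was read, keep it
        return values


reader = ArrayReader(range(10))
values = LazyValues(reader)

first = values[0:3]   # partial read, nothing is cached
full = values[:]      # full read, cached from here on
later = values[5:8]   # answered from the cache, the reader is not called

print(first, later, reader.requests)
```

The `requests` list shows that the reader is only consulted for the first two accesses; once the full data set has been read, later slices come from memory.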