Build a dictionary of selected features

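I have 20K objects and a set of features given in a list. I need to extract those features from each object and save them into a dictionary. Each object has nearly 100 features.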

For example:

# object1
Object1.Age = '20'
Object1.Gender = 'Female'
Object1.DOB = '03/05/1997'
Object1.Weight = '130lb'
Object1.Height = '5.5'

# object2
Object2.Age = '22'
Object2.Gender = 'Male'
Object2.DOB = '03/05/1995'
Object2.Weight = '145lb'
Object2.Height = '5.8'

# object3
Object3.Age = '22'
Object3.Gender = 'Male'
Object3.DOB = '03/05/1995'
Object3.Weight = '145lb'

#object4
...

And here is the list of features that I need to extract from each object (this list might change, so the code needs to handle that flexibly):

features = ['Gender', 
        'DOB', 
        'Height']

Currently, I am using this function to capture all the features I need from each object:

def get_features(obj, features):
    # Raises AttributeError if obj lacks any of the requested features
    return {f: getattr(obj, f) for f in features}

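This function works perfectly if every object has all the features I want. But some objects lack some of them. For example, object3 does not have a field named "Height". How can I put NaN as the value for the missing fields in my dictionary, so that I avoid the error?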

You can use obj.__dict__:

def get_features(obj, features):
  return {f:obj.__dict__.get(f, 'NaN') for f in features}
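One caveat with this approach: obj.__dict__ holds only the attributes stored on the instance itself, so class-level attributes (and properties) are not found there, while getattr with a default does find them. A minimal sketch of the difference, where the Person class and the two helper names are hypothetical and not from the question:

```python
class Person:
    Species = 'Human'          # class attribute, not stored in the instance __dict__

    def __init__(self, gender):
        self.Gender = gender   # instance attribute, stored in the instance __dict__


def features_from_dict(obj, features):
    # Looks only at attributes stored on the instance itself
    return {f: obj.__dict__.get(f, 'NaN') for f in features}


def features_from_getattr(obj, features):
    # Uses normal attribute lookup, which also checks the class
    return {f: getattr(obj, f, 'NaN') for f in features}


p = Person('Female')
via_dict = features_from_dict(p, ['Gender', 'Species'])
via_getattr = features_from_getattr(p, ['Gender', 'Species'])
print(via_dict)     # {'Gender': 'Female', 'Species': 'NaN'}
print(via_getattr)  # {'Gender': 'Female', 'Species': 'Human'}
```

If all the features are plain instance attributes, as in the question, both versions give the same result.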

Alternatively, if you still wish to use getattr, you can add a hasattr check:

def get_features(obj, features):
  return {f:'NaN' if not hasattr(obj, f) else getattr(obj, f) for f in features}

This should return NaN as the default value if the key does not exist: obj.__dict__.get(feature_name, float('NaN'))

For Python 3.5+, NaN is available as a constant in the math module, so this will also work: obj.__dict__.get(feature_name, math.nan)
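When inspecting the resulting dictionary later, keep in mind that a float NaN never compares equal to anything, including itself, so missing values are better detected with math.isnan than with ==. A minimal sketch, where the record dictionary is made-up illustration data and is_missing is a hypothetical helper:

```python
import math

# Made-up result for an object that has no Height
record = {'Gender': 'Male', 'DOB': '03/05/1995', 'Height': math.nan}


def is_missing(value):
    # Only floats can be NaN; strings such as 'Male' are never treated as missing
    return isinstance(value, float) and math.isnan(value)


missing = [name for name, value in record.items() if is_missing(value)]
print(missing)               # ['Height']
print(math.nan == math.nan)  # False
```

Note that this only applies to a real float NaN; the string 'NaN' used as a default in other answers is an ordinary string and compares equal to itself.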

Python getattr documentation:

getattr(object, name[, default]) Return the value of the named attribute of object. name must be a string. If the string is the name of one of the object’s attributes, the result is the value of that attribute. For example, getattr(x, 'foobar') is equivalent to x.foobar. If the named attribute does not exist, default is returned if provided, otherwise AttributeError is raised.

So you can do this:

def get_features(obj, features):
    return {f: getattr(obj, f, float('Nan')) for f in features}
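As a quick check of this version, here is a small end-to-end sketch. The objects are stand-ins built with types.SimpleNamespace (an assumption, since the question does not show the real class), and the rows name is only for illustration:

```python
import math
from types import SimpleNamespace


def get_features(obj, features):
    # The third argument to getattr is returned when the attribute is missing
    return {f: getattr(obj, f, float('nan')) for f in features}


# Stand-in objects; the second one has no Height, like object3 in the question
objects = [
    SimpleNamespace(Age='20', Gender='Female', DOB='03/05/1997',
                    Weight='130lb', Height='5.5'),
    SimpleNamespace(Age='22', Gender='Male', DOB='03/05/1995',
                    Weight='145lb'),
]

features = ['Gender', 'DOB', 'Height']
rows = [get_features(obj, features) for obj in objects]

for row in rows:
    print(row)
```

The first row contains all three values, and the second has a float NaN under 'Height' instead of raising AttributeError. Changing the features list requires no change to the function.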