"Labelled Faces in the Wild" 数据集（scikit 学习）中“数据”字段的性质是什么？

Question

I'm trying to train a simple HOG face detector using the data returned by sklearn.datasets.fetch_lfw_people. After fetching the dataset, I find the following keys:

In [1]:  lfw_people.keys()
Out[1]:  ['images', 'data', 'target_names', 'DESCR', 'target']

...but what on earth is `data`?

In my case it is a (13233 x 1850) numpy array of floats, i.e. one row of 1850 floats per image.

What is the nature of this `data` field?

Answer 1

`(lfw_people.images[0].ravel() == lfw_people.data[0]).all()` evaluates to True, so it looks like the `data` field is just the images flattened into vectors.
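To check this for every row rather than just the first, and to get 2-D images back from `data` (which a HOG descriptor needs), something like the sketch below should work. It uses a small random stand-in array instead of downloading the dataset. The 50 x 37 image shape is my assumption, chosen because 50 * 37 = 1850; with the real dataset, read the shape from `lfw_people.images.shape` instead. The helper names `is_flattened_copy` and `unflatten` are made up for illustration.

```python
import numpy as np


def is_flattened_copy(images, data):
    """Return True if each row of `data` is the row-major flattening of the matching image."""
    n_samples = images.shape[0]
    return np.array_equal(images.reshape(n_samples, -1), data)


def unflatten(data, height, width):
    """Reshape flat feature rows back into a stack of 2-D images."""
    return data.reshape(-1, height, width)


# Stand-in for lfw_people.images and lfw_people.data (assumed shape: 50 x 37 per image).
rng = np.random.default_rng(0)
images = rng.random((5, 50, 37), dtype=np.float32)
data = images.reshape(len(images), -1)

print(data.shape)                      # (5, 1850)
print(is_flattened_copy(images, data))  # True
print(unflatten(data, 50, 37).shape)   # (5, 50, 37)
```

With the real dataset you would pass `lfw_people.images` and `lfw_people.data` to `is_flattened_copy`. For HOG it is probably simpler to use `images` directly, since it already has the 2-D layout.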

Mystery solved, but this sort of thing really should be documented up front :/
