Python: Extract only some attributes from a csv file to a numpy array

Question

I am trying to develop a neural network that reads its input from a csv file. I am using this as a tutorial: https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/

I understand how the input is stored in X:

X = dataset[:,0:4].astype(float)

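The problem is that the dataset I am going to use has more than 100 attributes per entry (unlike the 4 here). I have already worked out which of them I want to use as input, but I cannot find a way to create an X with the same format as in the example. I tried numpy.vstack but did not get the desired result.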

Can anyone give me an example of how to create an X that contains only the specified attributes?

Answer 1

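A common approach is to use pandas, which allows a soft conversion of dtypes through pandas.DataFrame.infer_objects, followed by pandas.DataFrame.values, which returns a NumPy representation of the DataFrame. Alternatively, you can specify the columns to read with usecols, as suggested above.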

From the docs:

usecols : list-like or callable, default None

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s).
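As a small illustration of the forms of usecols described in that passage, here is a minimal sketch. The in-memory CSV text and its column names (a, b, c, d) are made up for the example; with a real file you would pass its path instead of the io.StringIO object.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "a,b,c,d\n7,3,0.5,x\n1,2,0.1,y\n5,1,0.9,z\n"

# Select columns by name (names are taken from the header row).
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["a", "b"])

# Select the same columns by 0-based position in the file.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])

# Select with a callable that is evaluated against each column name.
by_callable = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name in {"a", "b"},
)

# Convert the selected columns to a float NumPy array, like X in the tutorial.
X = by_name.to_numpy(dtype=float)
print(X.shape)
```

Selecting by name tends to be easier to maintain than selecting by position when a file has more than 100 columns, since the list documents itself.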

Here is what the full procedure looks like:

import pandas as pd

df = pd.read_csv(infile, usecols=['a', 'b'])           # Read only columns 'a' and 'b'
df_dtypes = df.infer_objects()                         # Soft conversion
x = df_dtypes.values                                   # Numpy array

df.info()                                              # Inspect -> Object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a    3 non-null object
b    3 non-null object
dtypes: object(2)
memory usage: 120.0+ bytes

df_dtypes.info()                                       # Inspect -> dtype change
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a    3 non-null int64
b    3 non-null int64
dtypes: int64(2)
memory usage: 120.0 bytes

print(x)                                               # Inspect -> numpy array
[[7 3]
 [1 2]
 [5 1]]
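If the data is entirely numeric and you would rather stay in NumPy, the same selection can be sketched with a list of column positions. This is only an illustration under assumptions: the CSV text and the positions in wanted are hypothetical, and the file is assumed to have a single header row and no missing values.

```python
import io

import numpy as np

# Hypothetical numeric CSV with a header row.
csv_text = "a,b,c,d,e\n7,3,0.5,9,1\n1,2,0.1,8,0\n5,1,0.9,7,1\n"

# Hypothetical positions (0-based) of the attributes to use as inputs.
wanted = [0, 2, 4]

# Option 1: read only the wanted columns.
X_direct = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=wanted
)

# Option 2: read everything, then index with a list of column positions.
# This generalizes the tutorial's dataset[:, 0:4] slice to arbitrary columns.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
X_indexed = dataset[:, wanted].astype(float)

print(X_indexed.shape)
```

Indexing with a list such as dataset[:, wanted] returns the chosen columns side by side, so no numpy.vstack call is needed.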
