Graphlab and numpy issue

I'm currently taking the Coursera Machine Learning course offered by the University of Washington, and I'm running into a few problems with numpy and graphlab.

The course requires a graphlab version higher than 1.7. Mine is higher, as shown below. However, when I run the script below, I get the following error:

    [INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started.

    def get_numpy_data(data_sframe, features, output):
        data_sframe['constant'] = 1
        features = ['constant'] + features # this is how you combine two lists
        # the following line will convert the features_SFrame into a numpy matrix:
        feature_matrix = features_sframe.to_numpy()
        # assign the column of data_sframe associated with the output to the SArray output_sarray

        # the following will convert the SArray into a numpy array by first converting it to a list
        output_array = output_sarray.to_numpy()
        return(feature_matrix, output_array)

    (example_features, example_output) = get_numpy_data(sales,['sqft_living'], 'price') # the [] around 'sqft_living' makes it a list
    print example_features[0,:] # this accesses the first row of the data the ':' indicates 'all columns'
    print example_output[0] # and the corresponding output

    ----> 8     feature_matrix = features_sframe.to_numpy()
    NameError: global name 'features_sframe' is not defined

The script above was written by the course authors, so I believe I'm doing something wrong.

Any help would be appreciated.

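You are supposed to complete the function get_numpy_data before running it; that is why you get the error. The original function, with its instructions, actually reads: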

def get_numpy_data(data_sframe, features, output):
    data_sframe['constant'] = 1 # this is how you add a constant column to an SFrame
    # add the column 'constant' to the front of the features list so that we can extract it along with the others:
    features = ['constant'] + features # this is how you combine two lists
    # select the columns of data_SFrame given by the features list into the SFrame features_sframe (now including constant):

    # the following line will convert the features_SFrame into a numpy matrix:
    feature_matrix = features_sframe.to_numpy()
    # assign the column of data_sframe associated with the output to the SArray output_sarray

    # the following will convert the SArray into a numpy array by first converting it to a list
    output_array = output_sarray.to_numpy()
    return(feature_matrix, output_array)
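
For reference, here is a sketch of one way the two blank lines could be filled in. It assumes standard SFrame indexing: `data_sframe[features]` (a list of column names) returns an SFrame with just those columns, and `data_sframe[output]` (a single name) returns an SArray. A pandas DataFrame supports the same indexing and also has `to_numpy()` (pandas 0.24 or later assumed), so the usage below runs the unchanged function on a small pandas DataFrame with hypothetical values instead of the course's `sales` SFrame.

```python
import pandas as pd


def get_numpy_data(data_sframe, features, output):
    data_sframe['constant'] = 1  # add a constant (intercept) column
    # put 'constant' first so it becomes column 0 of the feature matrix
    features = ['constant'] + features
    # Blank 1 (assumed completion): select the feature columns into a new frame
    features_sframe = data_sframe[features]
    feature_matrix = features_sframe.to_numpy()
    # Blank 2 (assumed completion): select the single output column
    output_sarray = data_sframe[output]
    output_array = output_sarray.to_numpy()
    return (feature_matrix, output_array)


# Stand-in data with hypothetical values; pandas is used here only because it
# supports the same column indexing and to_numpy() calls as an SFrame.
sales = pd.DataFrame({'sqft_living': [1180.0, 2570.0, 770.0],
                      'price': [221900.0, 538000.0, 180000.0]})
example_features, example_output = get_numpy_data(sales, ['sqft_living'], 'price')
print(example_features[0, :])  # first row: the constant and sqft_living
print(example_output[0])       # the corresponding price
```

Once `features_sframe` and `output_sarray` are actually assigned inside the function, the `NameError` goes away.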

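The graphlab assignment instructions have you convert from graphlab to pandas and then to numpy. You can skip the graphlab part and use pandas directly. (This is explicitly allowed in the assignment description.)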

First, read in the data files.

import pandas as pd

dtype_dict = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 'sqft_living15':float, 'grade':int, 'yr_renovated':int, 'price':float, 'bedrooms':float, 'zipcode':str, 'long':float, 'sqft_lot15':float, 'sqft_living':float, 'floors':str, 'condition':int, 'lat':float, 'date':str, 'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int, 'view':int}
sales = pd.read_csv('data/kc_house_data.csv', dtype=dtype_dict)
train_data = pd.read_csv('data/kc_house_train_data.csv', dtype=dtype_dict)
test_data = pd.read_csv('data/kc_house_test_data.csv', dtype=dtype_dict)
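
To see what `dtype_dict` does, here is a minimal sketch that reads a tiny in-memory CSV (hypothetical rows, via `io.StringIO`) with a subset of the same dictionary. Columns mapped to `str` keep their exact text, including leading zeros, and columns mapped to `float` are parsed as floating point even when the file contains integers.

```python
import io

import pandas as pd

# A subset of the dtype_dict above; the full dictionary works the same way.
dtype_dict = {'id': str, 'zipcode': str, 'sqft_living': float, 'price': float}

# Hypothetical rows standing in for data/kc_house_data.csv.
csv_text = """id,zipcode,sqft_living,price
0123456789,98178,1180,221900
6414100192,98125,2570,538000
"""

sales = pd.read_csv(io.StringIO(csv_text), dtype=dtype_dict)
print(sales.dtypes)  # id/zipcode are object (strings), the rest float64
```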

The convert-to-numpy function then becomes:

def get_numpy_data(df, features, output):
    df['constant'] = 1

    # add the column 'constant' to the front of the features list so that we can extract it along with the others
    features = ['constant'] + features

    # select the columns of df given by the features list into the DataFrame features_df
    features_df = pd.DataFrame(...)  # FILL IN THE BLANK (the ...) HERE WITH YOUR CODE

    # cast features_df into a numpy matrix (as_matrix() was removed in pandas 1.0; use to_numpy())
    feature_matrix = features_df.to_numpy()

    # etc. (the remaining lines follow the original function)
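
Note that `DataFrame.as_matrix()` no longer exists in current pandas (it was deprecated in 0.23 and removed in 1.0). A minimal sketch, assuming pandas 0.24 or later, of the equivalent calls on a small hypothetical frame whose constant column is already first:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the constant column first.
features_df = pd.DataFrame({'constant': [1, 1], 'sqft_living': [1180.0, 2570.0]})

# to_numpy() and the older .values attribute both return the underlying 2-D
# array; mixed int/float columns are upcast to a common float64 dtype.
feature_matrix = features_df.to_numpy()
legacy_matrix = features_df.values
print(feature_matrix[0, :])  # first row: the constant and sqft_living
```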

The rest of the code should be the same (since you only use the numpy versions for the remainder of the assignment).