How to generate predictions from new data using a trained TensorFlow network?
I want to train the Google VGGish network (Hershey et al. 2017) from scratch to predict classes specific to my own audio files.

To do this I am using the vggish_train_demo.py script, available in their GitHub repository, which uses TensorFlow. I have been able to modify the script to extract log-mel spectrogram features from my own audio by changing the _get_examples_batch() function, and to train the model on the output of that function. This runs to completion and prints the loss at each epoch.

However, I have not been able to figure out how to get this trained model to generate predictions from new data. Can this be done by changing the vggish_train_demo.py script?
For anyone who stumbles across this in the future, I wrote the following script, which does the job. You must save the log-mel spectrograms of the training and test data in arrays: X_train, y_train, X_test, y_test. X_train/X_test are arrays of features with shape (n, 96, 64), and y_train/y_test are arrays of shape (n, _NUM_CLASSES) for two classes, where n = the number of 0.96 s audio segments and _NUM_CLASSES = the number of classes used.
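If it helps, these arrays can be produced with the repository's own vggish_input module. A rough sketch for a two-class setup follows; the .wav paths and the class layout are placeholders for your own data:

import numpy as np
import vggish_input  # from the VGGish repository

# wavfile_to_examples() returns an array of shape (n, 96, 64):
# one 96x64 log-mel patch per 0.96 s of audio.
pos = vggish_input.wavfile_to_examples('class0_example.wav')  # placeholder path
neg = vggish_input.wavfile_to_examples('class1_example.wav')  # placeholder path

X_train = np.concatenate([pos, neg])
# Multi-hot labels of shape (n, _NUM_CLASSES): [1, 0] for class 0, [0, 1] for class 1.
y_train = np.concatenate([
    np.tile([1.0, 0.0], (pos.shape[0], 1)),
    np.tile([0.0, 1.0], (neg.shape[0], 1)),
])

X_test and y_test are built the same way from held-out files.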
See the function definitions for more information, and my original post on the vggish GitHub:
### Run the network and save the predictions and accuracy at each epoch
### Train NN, output results
r"""This uses the VGGish model definition within a larger model which adds two
layers on top, and then trains this larger model.

We input the log-mel spectrograms (X_train) calculated above with associated
labels (y_train), and feed the batches into the model. Once the model is
trained, it is then run on the test log-mel spectrograms (X_test), and the
accuracy is output, alongside a .csv file with the predictions for each 0.96 s
chunk and their true class."""

import numpy as np
import pandas as pd
import tensorflow as tf  # written against the TF1.x graph-mode API
import tf_slim as slim   # on older TF1.x installs: from tensorflow.contrib import slim

import vggish_params
import vggish_slim

# FLAGS (train_vggish, checkpoint), _NUM_CLASSES, num_epochs, batch_size,
# col_names and the X_train/y_train/X_test/y_test arrays are assumed to be
# defined earlier, as in vggish_train_demo.py.
def main(_):
  with tf.Graph().as_default(), tf.Session() as sess:
    # Define VGGish.
    embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)

    # Define a shallow classification model and associated training ops on top
    # of VGGish.
    with tf.variable_scope('mymodel'):
      # Add a fully connected layer with 100 units. Add an activation function
      # to the embeddings since they are pre-activation.
      num_units = 100
      fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)

      # Add a classifier layer at the end, consisting of parallel logistic
      # classifiers, one per class. This allows for multi-class tasks.
      logits = slim.fully_connected(
          fc, _NUM_CLASSES, activation_fn=None, scope='logits')
      # Named sigmoid op: per-class probabilities, fetchable at inference
      # time as 'mymodel/prediction:0'.
      prediction = tf.sigmoid(logits, name='prediction')
    # Add training ops.
    with tf.variable_scope('train'):
      global_step = tf.train.create_global_step()

      # Labels are assumed to be fed as a batch of multi-hot vectors, with
      # a 1 in the position of each positive class label, and 0 elsewhere.
      labels_input = tf.placeholder(
          tf.float32, shape=(None, _NUM_CLASSES), name='labels')

      # Cross-entropy label loss. Note this takes the raw logits, not the
      # sigmoid output: sigmoid_cross_entropy_with_logits applies the
      # sigmoid itself.
      xent = tf.nn.sigmoid_cross_entropy_with_logits(
          logits=logits, labels=labels_input, name='xent')
      loss = tf.reduce_mean(xent, name='loss_op')
      tf.summary.scalar('loss', loss)

      # We use the same optimizer and hyperparameters as used to train VGGish.
      optimizer = tf.train.AdamOptimizer(
          learning_rate=vggish_params.LEARNING_RATE,
          epsilon=vggish_params.ADAM_EPSILON)
      train_op = optimizer.minimize(loss, global_step=global_step)

    # Initialize all variables in the model, and then load the pre-trained
    # VGGish checkpoint.
    sess.run(tf.global_variables_initializer())
    vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)
    # The training loop.
    features_input = sess.graph.get_tensor_by_name(
        vggish_params.INPUT_TENSOR_NAME)

    # Define the accuracy ops once, outside the epoch loop, so the graph does
    # not grow with every epoch. argmax picks the highest-scoring class per
    # example; a prediction counts as correct when it matches the true class.
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    accuracy_scores = []
    for epoch in range(num_epochs):
      epoch_loss = 0
      i = 0
      while i < len(X_train):
        batch_x = np.array(X_train[i:i + batch_size])
        batch_y = np.array(y_train[i:i + batch_size])
        _, c = sess.run([train_op, loss],
                        feed_dict={features_input: batch_x,
                                   labels_input: batch_y})
        epoch_loss += c
        i += batch_size

      # Print the epoch number and loss.
      print('Epoch', epoch + 1, 'completed out of', num_epochs,
            ', loss:', epoch_loss)

      # Evaluating on the test data every epoch adds a small computational
      # cost; move this block after the loop to evaluate only once.
      accuracy1 = accuracy.eval({features_input: X_test,
                                 labels_input: y_test})
      accuracy_scores.append(accuracy1)
      print('Accuracy:', accuracy1)

      # Save the per-class sigmoid scores for the test data, together with
      # the true class, as a .csv file. DataFrame.to_csv keeps the column
      # headers, which np.savetxt would silently drop.
      predictions_sigm = prediction.eval(feed_dict={features_input: X_test})
      test_preds = pd.DataFrame(predictions_sigm, columns=col_names)
      test_preds['True class'] = np.argmax(y_test, axis=1)
      test_preds.to_csv('/content/drive/MyDrive/...' + 'Epoch_' + str(epoch + 1)
                        + '_Accuracy_' + str(accuracy1) + '.csv', index=False)


if __name__ == '__main__':
  tf.app.run()
# In a notebook, tf.app.run() exits via sys.exit() when main() returns, so the
# 'An exception has occurred, use %tb to see the full traceback.' message just
# means the script has finished.
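To score genuinely new, unlabelled audio, one straightforward route (a sketch, not part of the script above) is to save the trained variables at the end of main() and restore them in a fresh session, fetching the named prediction tensor. This assumes you add a tf.train.Saver to main() after the training loop; the save path below is a placeholder:

import tensorflow as tf
import vggish_params

# In main(), after the training loop (still inside the session), save the model:
#   saver = tf.train.Saver()
#   saver.save(sess, '/content/drive/MyDrive/my_vggish_model')  # placeholder path

def predict(new_examples, save_path):
  """Return per-class sigmoid scores for an (n, 96, 64) array of log-mel patches."""
  with tf.Graph().as_default(), tf.Session() as sess:
    # import_meta_graph rebuilds the trained graph from the .meta file saved
    # alongside the weights, so the model does not need to be redefined here.
    saver = tf.train.import_meta_graph(save_path + '.meta')
    saver.restore(sess, save_path)
    features_input = sess.graph.get_tensor_by_name(
        vggish_params.INPUT_TENSOR_NAME)
    prediction = sess.graph.get_tensor_by_name('mymodel/prediction:0')
    return sess.run(prediction, feed_dict={features_input: new_examples})

new_examples can come straight from vggish_input.wavfile_to_examples(), and np.argmax(scores, axis=1) on the returned scores gives the predicted class for each 0.96 s chunk.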