
How to view train_on_batch tensorboard log files generated by Google Colab?

I know how to view TensorBoard plots on my local machine while my neural network trains, using the code below in a local Jupyter Notebook. What do I need to do differently when I train the neural network in Google Colab? I can't find any tutorials/examples online that cover this case when using train_on_batch.

After defining my model (convnet)...

convnet.compile(loss='categorical_crossentropy',                                      
                optimizer=tf.keras.optimizers.Adam(0.001),
                metrics=['accuracy']
               )

# create tensorboard graph data for the model
tb = tf.keras.callbacks.TensorBoard(log_dir='Logs/Exp_15', 
                                    histogram_freq=0, 
                                    batch_size=batch_size, 
                                    write_graph=True, 
                                    write_grads=False)
tb.set_model(convnet)

num_epochs = 3
batches_processed_counter = 0

for epoch in range(num_epochs):

    for batch in range(int(train_img.samples/batch_size)): 
        batches_processed_counter += 1

        # get next batch of images & labels
        X_imgs, X_labels = next(train_img) 

        #train model, get cross entropy & accuracy for batch
        train_CE, train_acc = convnet.train_on_batch(X_imgs, X_labels) 

        # validation images - just predict
        X_imgs_val, X_labels_val = next(val_img)
        val_CE, val_acc = convnet.test_on_batch(X_imgs_val, X_labels_val) 

        # create tensorboard graph info for the cross entropy loss and training accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'train_loss': train_CE, 'train_acc': train_acc})

        # create tensorboard graph info for the cross entropy loss and VALIDATION accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'val_loss': val_CE, 'val_acc': val_acc})

        print('epoch', epoch, 'batch', batch, 'train_CE:', train_CE, 'train_acc:', train_acc)
        print('epoch', epoch, 'batch', batch, 'val_CE:', val_CE, 'val_acc:', val_acc)

tb.on_train_end(None)

I can see that the log files are generated successfully in the Google Colab runtime. How do I view them in TensorBoard? I've seen solutions that describe downloading the log files to a local machine and viewing them in a local TensorBoard, but that displays nothing. Is something missing from my code that would allow it to work in a local TensorBoard? And/or is there an alternative solution for viewing the log data in TensorBoard from within Google Colab itself?

In case it matters for the details of the solution, I'm on a Mac. Also, the tutorials I've seen online show how to use TensorBoard with Google Colab when using the fit method, but I can't see how to adapt my code, which doesn't use fit but train_on_batch.

!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

get_ipython().system_raw('tensorboard --logdir /content/trainingdata/objectdetection/ckpt_output/trainingImatges/ --host 0.0.0.0 --port 6006 &')

get_ipython().system_raw('./ngrok http 6006 &')

! curl -s http://localhost:4040/api/tunnels | python3 -c \
 "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

This gives you TensorBoard for the log files that were created. It creates a tunnel to the TensorBoard instance running on Colab and makes it accessible through a public URL provided by ngrok. The public URL is printed when you run the last command. It works with TF1.13; I suppose you can use the same approach for TF2 as well.
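The last command queries ngrok's local inspection API and extracts the tunnel's public URL from the JSON it returns. As a minimal sketch of that extraction step (using a made-up sample payload in the shape the API returns, since the real response only exists while ngrok is running):

```python
import json

# Hypothetical sample of the JSON served at http://localhost:4040/api/tunnels
# while ngrok is running; only the fields used here are shown.
sample = '''
{
  "tunnels": [
    {
      "name": "command_line",
      "public_url": "https://abc123.ngrok.io",
      "proto": "https"
    }
  ]
}
'''

# Same extraction as the curl | python3 one-liner above
data = json.loads(sample)
public_url = data['tunnels'][0]['public_url']
print(public_url)
```

Opening that printed URL in a browser then shows the TensorBoard instance running inside the Colab runtime.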

Thanks to Dr Ryan Cunningham of Manchester Metropolitan University for the solution to this problem, which was the following:

%load_ext tensorboard
%tensorboard --logdir './Logs'

...which allows me to view the TensorBoard plots in the Google Colab notebook itself, and watch the plots update while the NN trains.

So the full set of code to view the TensorBoard plots while the network is training (after defining the neural network, which I've called convnet) is:

# compile the neural net after defining the loss, optimisation and 
# performance metric
convnet.compile(loss='categorical_crossentropy',  # cross entropy is suited to 
                                                   # multi-class classification
                optimizer=tf.keras.optimizers.Adam(0.001),
                metrics=['accuracy']
               )

# create tensorboard graph data for the model
tb = tf.keras.callbacks.TensorBoard(log_dir='Logs/Exp_15', 
                                    histogram_freq=0, 
                                    batch_size=batch_size, 
                                    write_graph=True, 
                                    write_grads=False)
tb.set_model(convnet)

%load_ext tensorboard
%tensorboard --logdir './Logs'

# iterate through the training set for x epochs, 
# each time iterating through the batches,
# for each batch, train, calculate loss & optimise weights. 
# (mini-batch approach)
num_epochs = 1
batches_processed_counter = 0

for epoch in range(num_epochs):

    for batch in range(int(train_img.samples/batch_size)): 
        batches_processed_counter += 1

        # get next batch of images & labels
        X_imgs, X_labels = next(train_img) 

        #train model, get cross entropy & accuracy for batch
        train_CE, train_acc = convnet.train_on_batch(X_imgs, X_labels) 

        # validation images - just predict
        X_imgs_val, X_labels_val = next(val_img)
        val_CE, val_acc = convnet.test_on_batch(X_imgs_val, X_labels_val) 

        # create tensorboard graph info for the cross entropy loss and training accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'train_loss': train_CE, 'train_acc': train_acc})

        # create tensorboard graph info for the cross entropy loss and VALIDATION accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'val_loss': val_CE, 'val_acc': val_acc})

        print('epoch', epoch, 'batch', batch, 'train_CE:', train_CE, 'train_acc:', train_acc)
        print('epoch', epoch, 'batch', batch, 'val_CE:', val_CE, 'val_acc:', val_acc)

tb.on_train_end(None)


Note: it may take a few seconds after the cell has finished running before the cell output refreshes and shows the TensorBoard plots.
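The comments in the loop note that with 5 epochs and 10 batches per epoch there should be 50 logged points. That works because `batches_processed_counter` increments once per batch across all epochs, so it acts as a monotonically increasing global step for TensorBoard. A minimal sketch of just that bookkeeping (with made-up epoch/batch counts, independent of the training code):

```python
# Simulate the global-step bookkeeping used in the training loop above:
# one counter incremented per batch, shared across all epochs.
num_epochs = 5
batches_per_epoch = 10

batches_processed_counter = 0
logged_steps = []

for epoch in range(num_epochs):
    for batch in range(batches_per_epoch):
        batches_processed_counter += 1
        # each tb.on_epoch_end(step, logs) call records metrics at this step
        logged_steps.append(batches_processed_counter)

print(len(logged_steps))                  # 50 logged points, as the comments predict
print(logged_steps[0], logged_steps[-1])  # steps run from 1 to 50
```

If the counter were reset inside the epoch loop instead, each epoch would overwrite the previous one's steps in TensorBoard, which is why a single cross-epoch counter is used.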