TensorFlow - Importing data from a TensorBoard TFEvent file?

I've run several training sessions with different graphs in TensorFlow. The summaries I set up show interesting results in training and validation. Now I'd like to take the data saved in the summary logs, perform some statistical analysis, and generally plot and look at the summary data in different ways. Is there any existing way to easily access this data?

More specifically, is there any built-in way to read a TFEvent record back into Python?

If there is no simple way to do this, TensorFlow states that all its file formats are protobuf files. From my (limited) understanding of protobufs, I think I'd be able to extract this data if I had the TFEvent protocol specification. Is there an easy way to get hold of that? Thanks very much.

You can simply use:

tensorboard --inspect --event_file=myevents.out

or, if you want to filter a specific subset of events of the graph:

tensorboard --inspect --event_file=myevents.out --tag=loss

If you want to create something more custom, you can dig into

/tensorflow/python/summary/event_file_inspector.py

to understand how to parse the event files.

As Fabrizio says, TensorBoard is a great tool for visualizing the contents of your summary logs. However, if you want to perform a custom analysis, you can use the tf.train.summary_iterator() function to loop over all of the tf.Event and tf.Summary protocol buffers in the log:

for summary in tf.train.summary_iterator("/path/to/log/file"):
    # Perform custom processing in here.

TF2 update:

from tensorflow.python.summary.summary_iterator import summary_iterator

You need to import it explicitly; as of 2.0.0-rc2 that module is not imported by default.
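
A minimal usage sketch with that import (the events-file path below is a placeholder):

from tensorflow.python.summary.summary_iterator import summary_iterator

# Placeholder path; point this at one of your events.out.tfevents.* files.
for event in summary_iterator("/path/to/events.out.tfevents.123"):
    for value in event.summary.value:
        # simple_value is filled for scalar summaries written via the TF1-style summary ops.
        print(event.step, value.tag, value.simple_value)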

You could use the script serialize_tensorboard, which takes in a logdir and writes all the data out in json format.
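
A hypothetical invocation under that assumption (the flag names are guesses for this older TensorBoard script, so check its --help):

python serialize_tensorboard.py --logdir=/path/to/logs --target=/path/to/json_output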

You can also use an EventAccumulator for a convenient Python API (this is the same API that TensorBoard uses).
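
A small sketch of that API, assuming a modern TensorBoard install where EventAccumulator lives under tensorboard.backend.event_processing (the path and the 'loss' tag are placeholders):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point this at a run directory or a single events file.
acc = EventAccumulator("/path/to/log/dir")
acc.Reload()                  # read everything from disk
print(acc.Tags())             # available tags grouped by type (scalars, images, ...)
for scalar_event in acc.Scalars("loss"):   # assumes 'loss' is one of your scalar tags
    print(scalar_event.step, scalar_event.value)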

To read the TFEvents, you can get a Python iterator that yields Event protocol buffers.

# This example supposes that the events file contains summaries with a
# summary value tag 'loss'.  These could have been added by calling
# `add_summary()`, passing the output of a scalar summary op created
# with `tf.scalar_summary(['loss'], loss_tensor)`.
for e in tf.train.summary_iterator(path_to_events_file):
    for v in e.summary.value:
        if v.tag == 'loss' or v.tag == 'accuracy':
            print(v.simple_value)

More info: summary_iterator

Here is a complete example of getting values from a scalar. You can see the message spec for the Event protobuf message here.

import tensorflow as tf


for event in tf.train.summary_iterator('runs/easy_name/events.out.tfevents.1521590363.DESKTOP-43A62TM'):
    for value in event.summary.value:
        print(value.tag)
        if value.HasField('simple_value'):
            print(value.simple_value)

I've been using this. It assumes you only want to see tags that were logged more than once and whose values are floats, and it returns the results as a pd.DataFrame. Just call metrics_df = parse_events_file(path).

from collections import defaultdict
import pandas as pd
import tensorflow as tf

def is_interesting_tag(tag):
    if 'val' in tag or 'train' in tag:
        return True
    else:
        return False


def parse_events_file(path: str) -> pd.DataFrame:
    metrics = defaultdict(list)
    for e in tf.train.summary_iterator(path):
        for v in e.summary.value:

            if isinstance(v.simple_value, float) and is_interesting_tag(v.tag):
                metrics[v.tag].append(v.simple_value)
            if v.tag == 'loss' or v.tag == 'accuracy':
                print(v.simple_value)
    metrics_df = pd.DataFrame({k: v for k,v in metrics.items() if len(v) > 1})
    return metrics_df
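
Example call, with a placeholder events-file path:

# Placeholder path; any events.out.tfevents.* file will do.
metrics_df = parse_events_file('runs/my_run/events.out.tfevents.1234567890.HOSTNAME')
print(metrics_df.head())
print(metrics_df.describe())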

The following works as of TensorFlow version 2.0.0-beta1:

import os

import tensorflow as tf
from tensorflow.python.framework import tensor_util

summary_dir = 'tmp/summaries'
summary_writer = tf.summary.create_file_writer('tmp/summaries')

with summary_writer.as_default():
  tf.summary.scalar('loss', 0.1, step=42)
  tf.summary.scalar('loss', 0.2, step=43)
  tf.summary.scalar('loss', 0.3, step=44)
  tf.summary.scalar('loss', 0.4, step=45)


from tensorflow.core.util import event_pb2
from tensorflow.python.lib.io import tf_record

def my_summary_iterator(path):
    for r in tf_record.tf_record_iterator(path):
        yield event_pb2.Event.FromString(r)

for filename in os.listdir(summary_dir):
    path = os.path.join(summary_dir, filename)
    for event in my_summary_iterator(path):
        for value in event.summary.value:
            t = tensor_util.MakeNdarray(value.tensor)
            print(value.tag, event.step, t, type(t))

The code for my_summary_iterator is copied from tensorflow.python.summary.summary_iterator.py - it was not possible to import it at runtime.

Late-2020 versions of TensorFlow and TensorFlow Datasets recommend a different approach, using tf.data.TFRecordDataset and event_pb2:

from os import path, listdir
from operator import contains
from functools import partial
from itertools import chain
from json import loads

import numpy as np
import tensorflow as tf
from tensorflow.core.util import event_pb2

# From https://github.com/Suor/funcy/blob/0ee7ae8/funcy/funcs.py#L34-L36
def rpartial(func, *args):
    """Partially applies last arguments."""
    return lambda *a: func(*(a + args))


tensorboard_logdir = "/tmp"


# Or you could just glob… for *tfevents*:
list_dir = lambda p: map(partial(path.join, p), listdir(p))

for event in filter(rpartial(contains, "tfevents"),
                    chain.from_iterable(
                        map(list_dir,
                            chain.from_iterable(
                                map(list_dir,
                                    filter(rpartial(contains, "_epochs_"),
                                           list_dir(tensorboard_logdir))))))):
    print(event)
    for raw_record in tf.data.TFRecordDataset(event):
        for value in event_pb2.Event.FromString(raw_record.numpy()).summary.value:
            print("value: {!r} ;".format(value))
            if value.tensor.ByteSize():
                t = tf.make_ndarray(value.tensor)
                if hasattr(event, "step"):
                    print(value.tag, event.step, t, type(t))
                elif type(t).__module__ == np.__name__:
                    print("t: {!r} ;".format(np.vectorize(loads)(t)))
    print()
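
For reference, a more minimal sketch of the same TFRecordDataset idea, assuming you already know the path to a single events file (the path below is a placeholder):

import tensorflow as tf
from tensorflow.core.util import event_pb2

def read_tfevents(events_file):
    """Yield (tag, step, ndarray) for tensor-valued summaries in one events file."""
    for raw_record in tf.data.TFRecordDataset(events_file):
        event = event_pb2.Event.FromString(raw_record.numpy())
        for value in event.summary.value:
            if value.tensor.ByteSize():
                yield value.tag, event.step, tf.make_ndarray(value.tensor)

for tag, step, arr in read_tfevents("/tmp/summaries/events.out.tfevents.123"):
    print(tag, step, arr)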

There are 2 native ways to read the event files, as mentioned in this post:

  1. Event Accumulator

    >>> from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
    >>> event_acc = EventAccumulator(event_file)
    >>> event_acc.Reload() 
    <tensorboard.backend.event_processing.event_accumulator.EventAccumulator object at ...>
    >>> print(event_acc.Tags())
    {'images': [], 'audio': [], 'histograms': [], 'scalars': ['y=2x'], 'distributions': [], 'tensors': [], 'graph': False, 'meta_graph': False, 'run_metadata': []}
    >>> for e in event_acc.Scalars('y=2x'):
    ...   print(e.step, e.value)
    0 0.0
    1 2.0
    2 4.0
    3 6.0
    4 8.0
    
  2. Summary Iterator

    >>> import tensorflow as tf
    >>> from tensorflow.python.summary.summary_iterator import summary_iterator
    >>> for e in summary_iterator(event_file):
    ...   for v in e.summary.value:
    ...     if v.tag == 'y=2x':
    ...       print(e.step, v.simple_value)
    0 0.0
    1 2.0
    2 4.0
    3 6.0
    4 8.0
    

For multiple event files or other event types (e.g. histograms), you can use tbparse to parse the event logs into a pandas DataFrame and process it locally. You can open an issue if you encounter any problems during parsing. (I'm the author of tbparse.)
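
A minimal sketch using tbparse, assuming its SummaryReader API and a placeholder log directory:

from tbparse import SummaryReader

reader = SummaryReader("/path/to/log/dir")   # picks up event files under this directory
df = reader.scalars                          # pandas DataFrame with step, tag, value columns
print(df[df["tag"] == "loss"])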

Note: TensorBoard itself can only parse event logs into DataFrames if you upload them to TensorBoard.dev (source); it currently cannot be used offline/locally for this.