TensorFlow performance bottleneck on IteratorGetNext
While playing around with TensorFlow, I noticed that a relatively simple task (batching some of our 3D accelerometer data and taking the sum over each epoch) was performing relatively poorly. Here's the essence of what I had running, once I got the (very nifty!) Timeline functionality working:
import numpy as np
import tensorflow as tf
from tensorflow.python.client import timeline

# Some dummy functions to compute "features" from the data
def compute_features( data ):
    feature_functions = [
        lambda x: test_sum( x, axis = 0 ),
        lambda x: test_sum( x, axis = 1 ),
        lambda x: test_sum( x, axis = 2 ),
    ]
    return tf.convert_to_tensor( [ f( data ) for f in feature_functions ] )

def test_sum( data, axis = 0 ):
    t, v = data
    return tf.reduce_sum( v[:, axis] )

# Setup for using Timeline
sess = tf.Session()
run_options = tf.RunOptions( trace_level = tf.RunOptions.FULL_TRACE )
run_metadata = tf.RunMetadata()

# Some magic numbers for our dataset
test_sampling_rate = 5000.0
segment_size = int( 60 * test_sampling_rate )

# Load the dataset
with np.load( 'data.npz' ) as data:
    t_raw = data['t']
    v_raw = data['v']

# Build the iterator
full_dataset = tf.data.Dataset.from_tensor_slices( (t_raw, v_raw) ).batch( segment_size )
dataset_iterator = full_dataset.make_initializable_iterator()
next_datum = dataset_iterator.get_next()
sess.run( dataset_iterator.initializer )

i = 0
while True:
    try:
        print( sess.run( compute_features( next_datum ), options = run_options,
                         run_metadata = run_metadata ) )

        # Write Timeline data to a file for analysis later
        tl = timeline.Timeline( run_metadata.step_stats )
        ctf = tl.generate_chrome_trace_format()
        with open( 'timeline_{0}.json'.format( i ), 'w' ) as f:
            f.write( ctf )
        i += 1
    except tf.errors.OutOfRangeError:
        break
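(Aside, in case it's useful: each timeline_*.json file written above is in Chrome trace format, so it can be opened by navigating to chrome://tracing in Chrome and loading the file.)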
Pulling this up in Chrome, I observed that on each iteration, IteratorGetNext was eating up the vast majority of the time:

[Screenshot of Chrome displaying the timeline for one iteration]

As you can see, the "main" part of the computation is squashed into the tiny blips on the right-hand side, while the vast majority of this cycle's time is spent sitting in IteratorGetNext.

I'm wondering if there's anything obvious I've missed in how I've constructed the graph that would cause performance to degrade so badly on this step. I'm a bit puzzled as to why this setup performs so poorly.
If IteratorGetNext shows up as a large event in the timeline, your model is bottlenecked on input processing. In this case the pipeline is very simple, but copying 300,000 elements into a batch is the bottleneck. You can move this copy off the critical path by adding a Dataset.prefetch(1) transformation to the dataset definition:
full_dataset = (tf.data.Dataset.from_tensor_slices((t_raw, v_raw))
                .batch(segment_size)
                .prefetch(1))
For more performance suggestions, see the new Input Pipeline Performance Guide on tensorflow.org.
PS. Calling compute_features(next_datum) in a loop adds new nodes to the graph on every iteration, so over time your graph will grow and the loop will slow down. It is more efficient to rewrite it as follows:
next_computed_features = compute_features(next_datum)
while True:
    try:
        print(sess.run(next_computed_features, options=run_options,
                       run_metadata=run_metadata))
        # ...
    except tf.errors.OutOfRangeError:
        break
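Putting both suggestions together, a minimal sketch of the revised pipeline and driver loop might look like this (same variables and TF 1.x APIs as in the question; the data loading and the compute_features / test_sum definitions are unchanged):

# Build the pipeline once, with prefetch(1) so that the next batch is
# assembled in the background while the current step is computing.
full_dataset = (tf.data.Dataset.from_tensor_slices((t_raw, v_raw))
                .batch(segment_size)
                .prefetch(1))
dataset_iterator = full_dataset.make_initializable_iterator()
next_datum = dataset_iterator.get_next()

# Build the feature ops once, outside the loop, so the graph stays a
# fixed size across iterations.
next_computed_features = compute_features(next_datum)

sess.run(dataset_iterator.initializer)
i = 0
while True:
    try:
        print(sess.run(next_computed_features, options=run_options,
                       run_metadata=run_metadata))

        # Write Timeline data to a file for analysis later
        tl = timeline.Timeline(run_metadata.step_stats)
        ctf = tl.generate_chrome_trace_format()
        with open('timeline_{0}.json'.format(i), 'w') as f:
            f.write(ctf)
        i += 1
    except tf.errors.OutOfRangeError:
        break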