Reusing Parameters from a layer inside a subnetwork to a layer outside of that subnetwork

In my network architecture I have a layer of class "rec" named "output". Within the "unit" of that layer I have several layers, one of which is 'pivot_target_embed_raw'. The 'pivot_target_embed_raw' layer gets loaded from another checkpoint. I would now like to use the parameters of 'pivot_target_embed_raw' for my 'source_embed_raw' layer as well, which is not inside the unit of 'output' but a layer of my network at the same network depth as 'output'. In my config I have tried two things so far, which lead to different errors:

1. For the parameter 'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}} I get the following error (posting only part of it, since I think the simple problem here is the way 'pivot_target_embed_raw' is referenced, so attempt 2 is probably the more relevant one):

  File "/u/hilmes/returnn/TFNetworkLayer.py", line 448, in transform_config_dict
    line: for src_name in src_names
    locals:
      src_name = <not found>
      src_names = <local> ['source_embed_raw'], _[0]: {len = 16}
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 449, in <listcomp>
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <not found>
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6d90>
      src_name = <local> 'source_embed_raw', len = 16
      src_names = <not found>
  File "/u/hilmes/returnn/TFNetwork.py", line 607, in get_layer
    line: return self.construct_layer(net_dict=net_dict, name=src_name)  # set get_layer to wrap construct_layer
    locals:
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      net_dict = <local> {'dec_03_att_key0': {'from': ['encoder'], 'class': 'linear', 'with_bias': False, 'n_out': 512, 'activation': None, 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)"}, 'enc_06_self_att_lin': {'from': ['enc_06_self_att_att'], 'class': 'linear',..., len = 98
      name = <not found>
      src_name = <local> 'source_embed_raw', len = 16
  File "/u/hilmes/returnn/TFNetwork.py", line 652, in construct_layer
    line: layer_class.transform_config_dict(layer_desc, network=self, get_layer=get_layer)
    locals:
      layer_class = <local> <class 'TFNetworkLayer.LinearLayer'>
      layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'TFNetworkLayer.LinearLayer'>>
      layer_desc = <local> {'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}, 'with_bias': False, 'n_out': 512, 'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>], 'activation': None}
      network = <not found>
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6ea0>
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 456, in transform_config_dict
    line: d["reuse_params"] = ReuseParams.from_config_dict(d["reuse_params"], network=network, get_layer=get_layer)
    locals:
      d = <local> {'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}, 'with_bias': False, 'n_out': 512, 'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>], 'activation': None}
      ReuseParams = <global> <class 'TFNetworkLayer.ReuseParams'>
      ReuseParams.from_config_dict = <global> <bound method ReuseParams.from_config_dict of <class 'TFNetworkLayer.ReuseParams'>>
      network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6ea0>
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1386, in from_config_dict
    line: value["reuse_layer"] = optional_get_layer(value["reuse_layer"])
    locals:
      value = <local> {'reuse_layer': 'pivot_target_embed_raw'}
      optional_get_layer = <local> <function ReuseParams.from_config_dict.<locals>.optional_get_layer at 0x7f781e7a6f28>
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1362, in optional_get_layer
    line: return get_layer(layer_name)
    locals:
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7f781e7a6ea0>
      layer_name = <local> 'pivot_target_embed_raw', len = 22
  File "/u/hilmes/returnn/TFNetwork.py", line 607, in get_layer
    line: return self.construct_layer(net_dict=net_dict, name=src_name)  # set get_layer to wrap construct_layer
    locals:
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      net_dict = <local> {'dec_03_att_key0': {'from': ['encoder'], 'class': 'linear', 'with_bias': False, 'n_out': 512, 'activation': None, 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)"}, 'enc_06_self_att_lin': {'from': ['enc_06_self_att_att'], 'class': 'linear',..., len = 98
      name = <not found>
      src_name = <local> 'pivot_target_embed_raw', len = 22
  File "/u/hilmes/returnn/TFNetwork.py", line 643, in construct_layer
    line: raise LayerNotFound("layer %r not found in %r" % (name, self))
    locals:
      LayerNotFound = <global> <class 'TFNetwork.LayerNotFound'>
      name = <local> 'pivot_target_embed_raw', len = 22
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
LayerNotFound: layer 'pivot_target_embed_raw' not found in <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
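
My guess is that this first error is just a name-resolution problem: 'pivot_target_embed_raw' is only defined inside the unit of 'output', so a plain layer name cannot be resolved from the root network. Schematically (all other layer options elided):

network = {
    'output': {'class': 'rec', 'unit': {
        # only exists inside the rec unit:
        'pivot_target_embed_raw': {'class': 'linear', 'n_out': 512},
    }},
    'source_embed_raw': {'class': 'linear', 'n_out': 512,
                         # a plain name is resolved in the root network only -> LayerNotFound:
                         'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}},
}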

2. In the second attempt I changed this to 'reuse_params': {'map': {'W': {'reuse_layer': 'output/pivot_target_embed_raw'}, 'b': None}}. Here I again get a very long stack trace, starting with:

ReuseParams: layer 'output/pivot_target_embed_raw' does not exist yet and there is a dependency loop, thus creating it on dummy inputs now
Exception creating layer root/'source_embed_raw' of class LinearLayer with opts:
{'activation': None,
 'n_out': 512,
 'name': 'source_embed_raw',
 'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data(name='source_embed_raw_output', shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512]),
 'reuse_params': <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>,
 'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>],
 'with_bias': False}
EXCEPTION
layer root/'source_embed_raw' output: Data(name='source_embed_raw_output', shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512])
ReuseParams: layer 'output/pivot_target_embed_raw' does not exist yet and there is a dependency loop, thus creating it on dummy inputs now
Exception creating layer root/'source_embed_raw' of class LinearLayer with opts:
{'activation': None,
 'n_out': 512,
 'name': 'source_embed_raw',
 'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data(name='source_embed_raw_output', shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512]),
 'reuse_params': <TFNetworkLayer.ReuseParams object at 0x7fcb3e60e7f0>,
 'sources': [<SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=35356, batch_shape_meta=[B,T|'time:var:extern_data:data'])>],
 'with_bias': False}
Traceback (most recent call last):

and ending with:
  File "/u/hilmes/opt/returnn/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1220, in get_variable
    line: return var_store.get_variable(
              full_name,
              shape=shape,
              dtype=dtype,
              initializer=initializer,
              regularizer=regularizer,
              reuse=reuse,
              trainable=trainable,
              collections=collections,
              caching_device=caching_device,
              partitioner=partitioner,
              validate_shape=validate_shape,
              use_resource=use_resource,
              custom_getter=custom_getter,
              constraint=constraint,
              synchronization=synchronization,
              aggregation=aggregation)
    locals:
      var_store = <local> <tensorflow.python.ops.variable_scope._VariableStore object at 0x7fca58cac198>
      var_store.get_variable = <local> <bound method _VariableStore.get_variable of <tensorflow.python.ops.variable_scope._VariableStore object at 0x7fca58cac198>>
      full_name = <local> 'source_embed_raw/W', len = 18
      shape = <local> (35356, 512)
      dtype = <local> tf.float32
      initializer = <local> <tensorflow.python.ops.init_ops.GlorotUniform object at 0x7fcb3e96a7b8>
      regularizer = <local> None
      reuse = <local> <_ReuseMode.AUTO_REUSE: 1>
      trainable = <local> None
      collections = <local> None
      caching_device = <local> None
      partitioner = <local> None
      validate_shape = <local> True
      use_resource = <local> None
      custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x7fcb3e9616a8>
      constraint = <local> None
      synchronization = <local> <VariableSynchronization.AUTO: 0>
      aggregation = <local> <VariableAggregation.NONE: 0>
  File "/u/hilmes/opt/returnn/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 530, in get_variable
    line: return custom_getter(**custom_getter_kwargs)
    locals:
      custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x7fcb3e9616a8>
      custom_getter_kwargs = <local> {'use_resource': None, 'caching_device': None, 'collections': None, 'shape': (35356, 512), 'initializer': <tensorflow.python.ops.init_ops.GlorotUn
iform object at 0x7fcb3e96a7b8>, 'name': 'source_embed_raw/W', 'synchronization': <VariableSynchronization.AUTO: 0>, 'validate_shape': True, 'getter': ..., len = 16
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1537, in _variable_custom_getter
    line: return self.variable_custom_getter(base_layer=base_layer, **kwargs_)
    locals:
      self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>
      self.variable_custom_getter = <local> <bound method ReuseParams.variable_custom_getter of <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>>
      base_layer = <local> <LinearLayer 'source_embed_raw' out_type=Data(shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512])>
      kwargs_ = <local> {'aggregation': <VariableAggregation.NONE: 0>, 'partitioner': None, 'caching_device': None, 'use_resource': None, 'getter': <function _VariableStore.get_variable.<locals>._true_getter at 0x7fcb3e961730>, 'name': 'source_embed_raw/W', 'synchronization': <VariableSynchronization.AUTO: 0>, 'validate..., len = 16
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1575, in variable_custom_getter
    line: return self.param_map[param_name].variable_custom_getter(
            getter=getter, name=name, base_layer=base_layer, **kwargs)
    locals:
      self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959ac8>
      self.param_map = <local> {'W': <TFNetworkLayer.ReuseParams object at 0x7fcb3e959c18>, 'b': <TFNetworkLayer.ReuseParams object at 0x7fcb3e959dd8>}
      param_name = <local> 'W'
      variable_custom_getter = <not found>
      getter = <local> <function _VariableStore.get_variable.<locals>._true_getter at 0x7fcb3e961730>
      name = <local> 'source_embed_raw/W', len = 18
      base_layer = <local> <LinearLayer 'source_embed_raw' out_type=Data(shape=(None, 512), batch_shape_meta=[B,T|'time:var:extern_data:data',F|512])>
      kwargs = <local> {'partitioner': None, 'caching_device': None, 'use_resource': None, 'dtype': tf.float32, 'synchronization': <VariableSynchronization.AUTO: 0>, 'validate_shape': True, 'initializer': <tensorflow.python.ops.init_ops.GlorotUniform object at 0x7fcb3e96a7b8>, 'regularizer': None, 'constraint': None, '..., len = 14
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1576, in variable_custom_getter
    line: if self.reuse_layer:
    locals:
      self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959c18>
      self.reuse_layer = <local> !KeyError: 'output/pivot_target_embed_raw'
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1495, in reuse_layer
    line: self._reuse_layer = self._reuse_layer.get_layer()
    locals:
      self = <local> <TFNetworkLayer.ReuseParams object at 0x7fcb3e959c18>
      self._reuse_layer = <local> <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>
      self._reuse_layer.get_layer = <local> <bound method ReuseParams.LazyLayerResolver.get_layer of <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>>
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1424, in get_layer
    line: return self.create_dummy_layer(dep_loop_exception=exc)
    locals:
      self = <local> <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>
      self.create_dummy_layer = <local> <bound method ReuseParams.LazyLayerResolver.create_dummy_layer of <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>>
      dep_loop_exception = <not found>
      exc = <not found>
  File "/u/hilmes/returnn/TFNetworkLayer.py", line 1467, in create_dummy_layer
    line: layer_desc = dep_loop_exception.net_dict[self.layer_name].copy()
    locals:
      layer_desc = <not found>
      dep_loop_exception = <local> NetworkConstructionDependencyLoopException("Error: There is a dependency loop on layer 'output'.\nConstruction stack (most recent first):\n  source_embed_weighted\n  source_embed_with_pos\n  source_embed\n  enc_01_self_att_out\n  enc_01_ff_out\n  enc_01\n  enc_02_self_att_out\n  enc_02_ff_out\n  ...
      dep_loop_exception.net_dict = <local> {'enc_06_self_att_laynorm': {'class': 'layer_norm', 'from': ['enc_05']}, 'source_embed_weighted': {'class': 'eval', 'from': ['source_embed_raw'], 'eval': 'source(0) * 22.627417'}, 'enc_01_ff_drop': {'dropout': 0.1, 'class': 'dropout', 'from': ['enc_01_ff_conv2']}, 'enc_05_ff_drop': {'dropout': 0...., len = 98
      self = <local> <TFNetworkLayer.ReuseParams.LazyLayerResolver object at 0x7fcb3e959b38>
      self.layer_name = <local> 'output/pivot_target_embed_raw', len = 29
      copy = <not found>
KeyError: 'output/pivot_target_embed_raw'
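
Judging from the locals at the very end of the trace, create_dummy_layer looks the sublayer path up directly in the root net dict, which can never succeed for a name of the form 'subnet/sublayer':

# quoted from TFNetworkLayer.py line 1467 in the trace above:
layer_desc = dep_loop_exception.net_dict[self.layer_name].copy()
# self.layer_name is 'output/pivot_target_embed_raw', but net_dict only
# contains the root-level layer names, hence the KeyError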

Is it possible that the function create_dummy_layer cannot handle layers which are part of a subnetwork, or am I using reuse_params wrong here?

Edit: a reduced version of the config:

network = { 'dec_01_att_key': {'axis': 'F', 'class': 'split_dims', 'dims': (8, 64), 'from': ['dec_01_att_key0']},
  'dec_01_att_key0': { 'activation': None,
                       'class': 'linear',
                       'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                       'from': ['encoder'],
                       'n_out': 512,
                       'with_bias': False},
  'dec_01_att_value': {'axis': 'F', 'class': 'split_dims', 'dims': (8, 64), 'from': ['dec_01_att_value0']},
  'dec_01_att_value0': { 'activation': None,
                         'class': 'linear',
                         'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                         'from': ['encoder'],
                         'n_out': 512,
                         'with_bias': False},
  'decision': {'class': 'decide', 'from': ['output'], 'loss': 'edit_distance', 'loss_opts': {}, 'target': 'classes'},
  'enc_01': {'class': 'copy', 'from': ['enc_01_ff_out']},
  'enc_01_ff_conv1': { 'activation': 'relu',
                       'class': 'linear',
                       'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                       'from': ['enc_01_ff_laynorm'],
                       'n_out': 2048,
                       'with_bias': True},
  'enc_01_ff_conv2': { 'activation': None,
                       'class': 'linear',
                       'dropout': 0.1,
                       'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                       'from': ['enc_01_ff_conv1'],
                       'n_out': 512,
                       'with_bias': True},
  'enc_01_ff_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['enc_01_ff_conv2']},
  'enc_01_ff_laynorm': {'class': 'layer_norm', 'from': ['enc_01_self_att_out']},
  'enc_01_ff_out': {'class': 'combine', 'from': ['enc_01_self_att_out', 'enc_01_ff_drop'], 'kind': 'add', 'n_out': 512},
  'enc_01_self_att_att': { 'attention_dropout': 0.1,
                           'attention_left_only': False,
                           'class': 'self_attention',
                           'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                           'from': ['enc_01_self_att_laynorm'],
                           'n_out': 512,
                           'num_heads': 8,
                           'total_key_dim': 512},
  'enc_01_self_att_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['enc_01_self_att_lin']},
  'enc_01_self_att_laynorm': {'class': 'layer_norm', 'from': ['source_embed']},
  'enc_01_self_att_lin': { 'activation': None,
                           'class': 'linear',
                           'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                           'from': ['enc_01_self_att_att'],
                           'n_out': 512,
                           'with_bias': False},
  'enc_01_self_att_out': {'class': 'combine', 'from': ['source_embed', 'enc_01_self_att_drop'], 'kind': 'add', 'n_out': 512},
  'encoder': {'class': 'layer_norm', 'from': ['enc_01']},
  'output': { 'class': 'rec',
              'from': [],
              'max_seq_len': "max_len_from('base:encoder') * 3",
              'target': 'classes',
              'unit': { 'dec_01': {'class': 'copy', 'from': ['dec_01_ff_out']},
                        'dec_01_att0': {'base': 'base:dec_01_att_value', 'class': 'generic_attention', 'weights': 'dec_01_att_weights_drop'},
                        'dec_01_att_att': {'axes': 'static', 'class': 'merge_dims', 'from': ['dec_01_att0']},
                        'dec_01_att_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['dec_01_att_lin']},
                        'dec_01_att_energy': { 'class': 'dot',
                                               'from': ['base:dec_01_att_key', 'dec_01_att_query'],
                                               'red1': -1,
                                               'red2': -1,
                                               'var1': 'T',
                                               'var2': 'T?'},
                        'dec_01_att_laynorm': {'class': 'layer_norm', 'from': ['dec_01_self_att_out']},
                        'dec_01_att_lin': { 'activation': None,
                                            'class': 'linear',
                                            'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                                            'from': ['dec_01_att_att'],
                                            'n_out': 512,
                                            'with_bias': False},
                        'dec_01_att_out': {'class': 'combine', 'from': ['dec_01_self_att_out', 'dec_01_att_drop'], 'kind': 'add', 'n_out': 512},
                        'dec_01_att_query': {'axis': 'F', 'class': 'split_dims', 'dims': (8, 64), 'from': ['dec_01_att_query0']},
                        'dec_01_att_query0': { 'activation': None,
                                               'class': 'linear',
                                               'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', "
                                                                       'scale=1.0)',
                                               'from': ['dec_01_att_laynorm'],
                                               'n_out': 512,
                                               'with_bias': False},
                        'dec_01_att_weights': {'class': 'softmax_over_spatial', 'energy_factor': 0.125, 'from': ['dec_01_att_energy']},
                        'dec_01_att_weights_drop': { 'class': 'dropout',
                                                     'dropout': 0.1,
                                                     'dropout_noise_shape': {'*': None},
                                                     'from': ['dec_01_att_weights']},
                        'dec_01_ff_conv1': { 'activation': 'relu',
                                             'class': 'linear',
                                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                                             'from': ['dec_01_ff_laynorm'],
                                             'n_out': 2048,
                                             'with_bias': True},
                        'dec_01_ff_conv2': { 'activation': None,
                                             'class': 'linear',
                                             'dropout': 0.1,
                                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                                             'from': ['dec_01_ff_conv1'],
                                             'n_out': 512,
                                             'with_bias': True},
                        'dec_01_ff_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['dec_01_ff_conv2']},
                        'dec_01_ff_laynorm': {'class': 'layer_norm', 'from': ['dec_01_att_out']},
                        'dec_01_ff_out': {'class': 'combine', 'from': ['dec_01_att_out', 'dec_01_ff_drop'], 'kind': 'add', 'n_out': 512},
                        'dec_01_self_att_att': { 'attention_dropout': 0.1,
                                                 'attention_left_only': True,
                                                 'class': 'self_attention',
                                                 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', "
                                                                         'scale=1.0)',
                                                 'from': ['dec_01_self_att_laynorm'],
                                                 'n_out': 512,
                                                 'num_heads': 8,
                                                 'total_key_dim': 512},
                        'dec_01_self_att_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['dec_01_self_att_lin']},
                        'dec_01_self_att_laynorm': {'class': 'layer_norm', 'from': ['target_embed']},
                        'dec_01_self_att_lin': { 'activation': None,
                                                 'class': 'linear',
                                                 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', "
                                                                         'scale=1.0)',
                                                 'from': ['dec_01_self_att_att'],
                                                 'n_out': 512,
                                                 'with_bias': False},
                        'dec_01_self_att_out': {'class': 'combine', 'from': ['target_embed', 'dec_01_self_att_drop'], 'kind': 'add', 'n_out': 512},
                        'decoder': {'class': 'layer_norm', 'from': ['dec_01']},
                        'end': {'class': 'compare', 'from': ['output'], 'value': 0},
                        'output': {'beam_size': 12, 'class': 'choice', 'from': ['output_prob'], 'initial_output': 0, 'target': 'classes'},
                        'output_prob': { 'class': 'softmax',
                                         'dropout': 0.0,
                                         'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                                         'from': ['decoder'],
                                         'loss': 'ce',
                                         'loss_opts': {'use_normalized_loss': True},
                                         'reuse_params': {'map': {'W': {'custom': None, 'reuse_layer': 'target_embed_raw'}, 'b': None}},
                                         'target': 'classes',
                                         'with_bias': True},
                        'target_embed': {'class': 'dropout', 'dropout': 0.1, 'from': ['target_embed_with_pos']},
                        'target_embed_raw': { 'activation': None,
                                              'class': 'linear',
                                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', "
                                                                      'scale=1.0)',
                                              'from': ['prev:output'],
                                              'n_out': 512,
                                              'with_bias': False},
                        'target_embed_weighted': {'class': 'eval', 'eval': 'source(0) * 22.627417', 'from': ['target_embed_raw'], 'trainable': False},
                        'target_embed_with_pos': { 'add_to_input': True,
                                                   'class': 'positional_encoding',
                                                   'from': ['target_embed_weighted']},
                        'pivot_target_embed_raw': { 'activation': None,
                                                    'class': 'linear',
                                                    'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', "
                                                                            'scale=1.0)',
                                                    #'from': ['prev:output'],
                                                    'n_out': 512,
                                                    'trainable': False,
                                                    'with_bias': False}},
                        },
  'source_embed': {'class': 'dropout', 'dropout': 0.1, 'from': ['source_embed_with_pos']},
  'source_embed_raw': { 'activation': None,
                              'class': 'linear',
                              #'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=1.0)",
                              'from': ['data:data'],
                              'n_out': 512,
                              'with_bias': False,
                              #'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_source_embed_raw'}, 'b': None}},
                              'reuse_params': {'map': {'W': {'reuse_layer': 'output/pivot_target_embed_raw'}, 'b': None}}
                      },
  'source_embed_weighted': {'class': 'eval', 'eval': 'source(0) * 22.627417', 'from': ['source_embed_raw']},
  'source_embed_with_pos': {'add_to_input': True, 'class': 'positional_encoding', 'from': ['source_embed_weighted']}}

pivot_file = [Pathplaceholder]
pivot_prefix = 'pivot_'
preload_from_files = {}
if not task == "search":
    # Load parameters for all layers whose name starts with pivot_prefix from
    # pivot_file; the prefix is stripped for the checkpoint lookup, so e.g.
    # 'pivot_target_embed_raw' should be initialized from 'target_embed_raw'.
    preload_from_files = {
        "pivot": {"filename": pivot_file, "prefix": pivot_prefix, "init_for_train": True},
    }

The exception happens because you have a circular dependency:

source_embed_raw depends on output (via reuse_params).

output depends on dec_01_att_value, which depends on encoder, which depends on source_embed_raw.

That is a circular dependency, which is not allowed.
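
One way to break the loop, as a rough sketch and assuming the pivot embedding does not actually need to live inside the recurrent unit: define 'pivot_target_embed_raw' at the root level. Then 'source_embed_raw' can reference it by its plain name without constructing 'output' first, and layers inside the unit could still reach it as 'base:pivot_target_embed_raw' if needed. The prefix matching of preload_from_files goes by layer name, so the checkpoint loading should still apply (worth verifying in the loading log).

network = {
    # ... all other layers as before ...
    'pivot_target_embed_raw': { 'activation': None,
                                'class': 'linear',
                                # the input only determines n_in here; it must match the
                                # source vocab so W gets the shape source_embed_raw needs:
                                'from': ['data:data'],
                                'n_out': 512,
                                'trainable': False,
                                'with_bias': False},
    'source_embed_raw': { 'activation': None,
                          'class': 'linear',
                          'from': ['data:data'],
                          'n_out': 512,
                          'with_bias': False,
                          # plain-name lookup now works; no dependency on 'output':
                          'reuse_params': {'map': {'W': {'reuse_layer': 'pivot_target_embed_raw'}, 'b': None}}},
}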