TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))
My model uses preprocessed data to predict whether a customer is a private or a non-private customer. The preprocessing uses steps such as feature_column.bucketized_column(…) and feature_column.embedding_column(…).
After training, I try to save the model, but I get the following error:
File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)
I tried the following to solve my problem:
- I tried to exclude the optimizer, as mentioned here: https://github.com/tensorflow/tensorflow/issues/27688 (see the sketch after this list).
- I tried different TensorFlow versions, such as 2.2 and 2.3.
- I tried to reinstall h5py as described here: RuntimeError: Unable to create link (name already exists) when I append hdf5 file?.
None of it worked!
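For reference, the optimizer exclusion from the first bullet boiled down to passing include_optimizer=False to model.save — a minimal sketch of that attempt:

# Attempt 1 (per tensorflow/tensorflow#27688): save without optimizer state.
# This did not resolve the error in my case.
model.save(filepath, save_format='h5', include_optimizer=False)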
The relevant model code is as follows:
(feature_columns, train_ds, val_ds, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.zip, args.batchSize, bucketSizeGEO)

feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)

model = tf.keras.models.Sequential([
    feature_layer,
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])

paramString = "Arg-e{}-b{}-z{}".format(args.epoch, args.batchSize, bucketSizeGEO)

...

model.fit(train_ds,
          validation_data=val_ds,
          epochs=args.epoch,
          callbacks=[tensorboard_callback])

model.summary()

loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)

paramString = paramString + "-a{:.4f}".format(accuracy)

outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramString

if args.saveModel:
    filepath = "./saved_models/" + outputName + ".h5"
    model.save(filepath, save_format='h5')
The function called in the preprocessing module:
def getPreProcessedDatasets(filepath, zippath, batch_size, bucketSizeGEO):
    print("start preprocessing...")

    path = filepath
    data = pd.read_csv(path, dtype={
        "NAME1": np.str_,
        "NAME2": np.str_,
        "EMAIL1": np.str_,
        "ZIP": np.str_,
        "STREET": np.str_,
        "LONGITUDE": np.floating,
        "LATITUDE": np.floating,
        "RECEIVERTYPE": np.int64})

    feature_columns = []

    data = data.fillna("NaN")

    data = __preProcessName(data)
    data = __preProcessStreet(data)

    train, test = train_test_split(data, test_size=0.2, random_state=0)
    train, val = train_test_split(train, test_size=0.2, random_state=0)

    train_ds = __df_to_dataset(train, batch_size=batch_size)
    val_ds = __df_to_dataset(val, shuffle=False, batch_size=batch_size)
    test_ds = __df_to_dataset(test, shuffle=False, batch_size=batch_size)

    __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, True)

    print("preprocessing completed")

    return (feature_columns, train_ds, val_ds, test_ds)
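For context, __df_to_dataset follows the usual structured-data pattern of wrapping a DataFrame in a batched tf.data.Dataset. A minimal sketch, assuming RECEIVERTYPE is the label column:

def __df_to_dataset(dataframe, shuffle=True, batch_size=32):
    # Sketch: turn a DataFrame into a batched tf.data.Dataset of
    # (feature dict, label) pairs; RECEIVERTYPE as label is an assumption.
    dataframe = dataframe.copy()
    labels = dataframe.pop('RECEIVERTYPE')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    return ds.batch(batch_size)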
The different preprocessing functions called for the features:
def __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, addCrossedFeatures):
    feature_columns.append(__getFutureColumnLon(bucketSizeGEO))
    feature_columns.append(__getFutureColumnLat(bucketSizeGEO))

    (namew1_one_hot, namew2_one_hot) = __getFutureColumnsName(__getNumberOfWords(data, 'NAME1PRO'))
    feature_columns.append(namew1_one_hot)
    feature_columns.append(namew2_one_hot)

    feature_columns.append(__getFutureColumnStreet(__getNumberOfWords(data, 'STREETPRO')))

    feature_columns.append(__getFutureColumnZIP(2223, zippath))

    if addCrossedFeatures:
        feature_columns.append(__getFutureColumnCrossedNames(100))
        feature_columns.append(__getFutureColumnCrossedZIPStreet(100, 2223, zippath))
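__getFutureColumnLon and __getFutureColumnLat bucketize the raw coordinates. A sketch of the longitude version, with illustrative boundaries (the real ones are derived from the data and bucketSizeGEO):

def __getFutureColumnLon(bucketSizeGEO):
    # Sketch: split LONGITUDE into bucketSizeGEO ranges; the linspace
    # boundaries here are illustrative, not the ones in my actual code.
    lon = tf.feature_column.numeric_column('LONGITUDE')
    boundaries = np.linspace(-180.0, 180.0, bucketSizeGEO).tolist()
    return tf.feature_column.bucketized_column(lon, boundaries=boundaries)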
The functions related to the embeddings:
def __getFutureColumnsName(name_num_words):
    vocabulary_list = np.arange(0, name_num_words + 1, 1).tolist()

    namew1_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W1', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
    namew2_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W2', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(name_num_words)

    namew1_embedding = feature_column.embedding_column(namew1_voc, dimension=dim)
    namew2_embedding = feature_column.embedding_column(namew2_voc, dimension=dim)

    return (namew1_embedding, namew2_embedding)

def __getFutureColumnStreet(street_num_words):
    vocabulary_list = np.arange(0, street_num_words + 1, 1).tolist()

    street_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='STREETW', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(street_num_words)

    street_embedding = feature_column.embedding_column(street_voc, dimension=dim)

    return street_embedding

def __getFutureColumnZIP(zip_num_words, zippath):
    zip_voc = feature_column.categorical_column_with_vocabulary_file(
        key='ZIP', vocabulary_file=zippath, vocabulary_size=zip_num_words,
        default_value=0)

    dim = __getNumberOfDimensions(zip_num_words)

    zip_embedding = feature_column.embedding_column(zip_voc, dimension=dim)

    return zip_embedding
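__getNumberOfDimensions simply derives an embedding size from the vocabulary size. A sketch using the common fourth-root heuristic (my exact rule may differ):

def __getNumberOfDimensions(num_words):
    # Common rule of thumb: embedding dimension ~ vocabulary_size ** 0.25
    return max(1, int(round(num_words ** 0.25)))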
The error OSError: Unable to create link (name already exists) when saving the model in h5 format is caused by duplicated variable names. Inspecting them with

for i, w in enumerate(model.weights): print(i, w.name)

showed that the duplicates were the embedding_weights names.

Normally, when building a feature_column, the distinct key passed to each feature column is used to build a distinct variable name. This worked correctly in TF 2.1, but broke in TF 2.2 and 2.3; it is supposedly fixed in the TF 2.4 nightly.
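To make the duplicates easier to spot, the same check can be condensed into a quick count (a small diagnostic sketch):

from collections import Counter

# Count every weight name; any name with a count > 1 is what trips
# the h5py link creation during model.save.
name_counts = Counter(w.name for w in model.weights)
print([name for name, count in name_counts.items() if count > 1])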
My workaround for TF 2.3 is based on @SajanGohil's comment, but in my case the problem was the weight names (not the layer names):
for i in range(len(model.weights)):
    model.weights[i]._handle_name = model.weights[i].name + "_" + str(i)
The same caveat applies: this approach manipulates TF internals, so it is not future-proof.
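Applied in context, the renaming runs after training, right before the save call:

# Workaround in context: de-duplicate the weight names, then save as before.
# _handle_name is a private attribute, so this may break in other TF versions.
for i, w in enumerate(model.weights):
    w._handle_name = w.name + "_" + str(i)
model.save(filepath, save_format='h5')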
I found that this also happens when I load a model from a model checkpoint, model.compile it with the same optimizer, metrics, and loss function, and then train it. If I avoid compiling it again with the same parameters, the error message no longer appears.
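In other words, something like this avoids the error for me (a sketch; the checkpoint path is hypothetical):

model = tf.keras.models.load_model("./saved_models/checkpoint.h5")  # hypothetical path
# No second model.compile(...) here: the compiled optimizer, loss and metrics
# are restored with the checkpoint, and skipping the recompile avoids the error.
model.fit(train_ds, validation_data=val_ds, epochs=args.epoch)
model.save("./saved_models/retrained.h5", save_format='h5')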