How to use CPU only for Embedding?
I need to avoid this error:

tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized

It is related to the amount of memory on my 3060. To avoid it I have to do the Embedding layer computation on the CPU, but how do I do that? I tried running the complete model on the CPU and it works, but it is very slow. For example, if I reduce the number of neurons in all layers to 128, I can train on 8000 sentences (data_list[:8000] instead of the 6000 sentences below), but I have ~20000 of them.

My model:
# these import paths depend on the Keras/TF version; with TF 2.8 and standalone keras:
from keras.utils import tf_utils
from tensorflow.python.framework import ops

class CPUEmbedding(Embedding):
    @tf_utils.shape_type_conversion
    def build(self, input_shape):
        # create the embedding table on the CPU instead of the GPU
        with ops.device('cpu:0'):
            self.embeddings = self.add_weight(
                shape=(self.input_dim, self.output_dim),
                initializer=self.embeddings_initializer,
                name='embeddings',
                regularizer=self.embeddings_regularizer,
                constraint=self.embeddings_constraint)
        self.built = True
        print('Embedding starts on cpu')

model = Sequential()
model.add(CPUEmbedding(19260, 256, input_length=163))
model.add(LSTM(256, return_sequences=True))  # the output will be a sequence of the same length
model.add(Dropout(0.2))
model.add(LSTM(512))
model.add(Dropout(0.2))
model.add(Dense(self.total_words, activation='softmax'))

adam = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
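For comparison, the same placement can also be attempted without a custom layer class by building the model with the functional API and creating the Embedding layer inside a tf.device scope, so that its weight is created on the CPU. This is only a minimal sketch of that idea (the functional-API wiring and the '/CPU:0' scope are assumptions, not part of the code above), and whether the lookup really stays on the CPU during training may depend on the TF version:

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(163,), dtype='int32')
# build the embedding inside a CPU device scope so its weight is created on the CPU
with tf.device('/CPU:0'):
    x = layers.Embedding(19260, 256)(inputs)
x = layers.LSTM(256, return_sequences=True)(x)
x = layers.Dropout(0.2)(x)
x = layers.LSTM(512)(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(19260, activation='softmax')(x)

model = Model(inputs, outputs)
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['acc'])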
Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
cpu_embedding (CPUEmbedding (None, 163, 256) 4930560
)
lstm (LSTM) (None, 163, 256) 525312
dropout (Dropout) (None, 163, 256) 0
lstm_1 (LSTM) (None, 512) 1574912
dropout_1 (Dropout) (None, 512) 0
dense (Dense) (None, 19260) 9880380
=================================================================
Total params: 16,911,164
Trainable params: 16,911,164
Non-trainable params: 0
A model you can run yourself, but first you need to download some large book as the text:
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
import numpy as np

tokenizer = Tokenizer()

# Book with len > 1 000 000 words
with open('text.txt', encoding='utf-8') as f:
    data = f.read().replace('\ufeff', '')
data_list = data.lower().split("\n")

tokenizer.fit_on_texts(data_list)
total_words = len(tokenizer.word_index) + 1
print('Words number:', total_words)

# build n-gram prefixes of every line: [w1 w2], [w1 w2 w3], ...
input_sequences = []
for line in data_list:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)

max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# the last token of each padded sequence is the label, the rest is the input
X, labels = input_sequences[:, :-1], input_sequences[:, -1]
Y = to_categorical(labels, num_classes=total_words)

model = Sequential()
model.add(Embedding(total_words, 256, input_length=max_sequence_len - 1))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(512))
model.add(Dropout(0.1))
model.add(Dense(total_words, activation='softmax'))

adam = Adam(learning_rate=0.001)  # 'lr' is deprecated in recent TF versions
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])

history = model.fit(x=X, y=Y, batch_size=128, epochs=1000)
Versions:
1)
- OS: Windows 10
- CUDA: 11.6 (latest, from the NVIDIA website)
- Python: 3.9
- TensorFlow: 2.8
- Launched from: cmd
- GPU: 3060
2)
- OS: Windows 11
- CUDA: installed by conda
- Python: 3.8
- TensorFlow: 2.6
- Launched from: conda
- GPU: 1060Ti
It is probably a memory problem. You may not have enough RAM to copy the embeddings from the CPU to the GPU. Monitor your RAM and GPU usage (a small sketch for doing this follows). If the data takes up too much memory, try a custom data generator: instead of storing all 20,000 sentences in one variable, you generate the data as it is needed. That can save a lot of space, so try a custom data generator and let me know if it works.
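A minimal sketch of how that monitoring could look from inside the training script (psutil is an extra dependency here, and tf.config.experimental.get_memory_info needs a reasonably recent TF 2.x release; both are assumptions, adjust as needed):

import psutil
import tensorflow as tf

def print_memory_usage(tag=''):
    # host RAM usage
    ram = psutil.virtual_memory()
    print(f'{tag} RAM: {ram.used / 1e9:.1f} / {ram.total / 1e9:.1f} GB')
    # GPU memory currently allocated by TensorFlow
    try:
        gpu = tf.config.experimental.get_memory_info('GPU:0')
        print(f"{tag} GPU: {gpu['current'] / 1e9:.2f} GB (peak {gpu['peak'] / 1e9:.2f} GB)")
    except Exception as err:
        print(f'{tag} no GPU memory info: {err}')

print_memory_usage('before fit')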
Points to consider:
Try changing your hyperparameters, for example by reducing the number of neurons. 19260 neurons is huge. If it is a classification task, just use as many neurons as you have classes: if you have 5 classes, use 5 neurons.
Reducing the batch size may also help.
Try to find out which memory runs out during training (the monitoring sketch above can help). If it is RAM, a custom data generator will help, but if it is GPU memory, you have to reduce the parameter count. My guess is that with 16,911,164 parameters you need at least 16 GB on the GPU, so you should try to bring that number down.
Example of a custom data generator
If RAM is the problem, this may help. Assume you have preprocessed data saved in a text file or in CSV format.
This is based on a custom image data generator, but you will get the overall idea. I will add example code below; it is not a working example, I just want to give you an idea of what a custom generator looks like.
import csv
import numpy as np
import tensorflow as tf

def custom_gen(batch_size, file_path):
    sentences = []
    labels = []
    with open(file_path) as file:
        csvreader = csv.reader(file)
        # the first row is the header, so skip it
        _ = next(csvreader)
        # considering you have only two columns [sentence, label]
        for data in csvreader:
            sentences.append(data[0])
            labels.append(data[1])
            if len(sentences) == batch_size:
                # always check that the batch has the desired shape and datatype
                batch = np.array(sentences), np.array(labels)
                yield batch
                sentences.clear()
                labels.clear()

# finally wrap the generator function in a tf.data.Dataset
# (the batch size and file name below are only illustrative;
#  adjust the TensorSpec shapes/dtypes to your real arrays)
dataset = tf.data.Dataset.from_generator(
    lambda: custom_gen(128, 'data.csv'),
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.string),
        tf.TensorSpec(shape=(None,), dtype=tf.string)))

# prefetch helps you to manage memory; repeat so the data can be
# generated for as many epochs as you want
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE).repeat(1000)

# you can then fit the model with your custom data generator;
# no separate values for x and y are needed
model.fit(dataset, epochs=1000)
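If the padded sequences and labels from the question already fit in RAM, a simpler variant of the same idea is to let tf.data slice the existing arrays into batches instead of handing model.fit one huge X/Y pair; keeping the labels as integers with sparse_categorical_crossentropy (an assumption on my part, not in the original code) also avoids building the giant one-hot matrix. A rough sketch:

import tensorflow as tf

# X and labels are the arrays built in the question's script
dataset = (tf.data.Dataset.from_tensor_slices((X, labels))
           .shuffle(10000)
           .batch(128)
           .prefetch(tf.data.AUTOTUNE))

# integer labels + sparse loss avoid the (n_sequences, total_words) one-hot matrix
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['acc'])
history = model.fit(dataset, epochs=1000)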