How to use CPU only for Embedding?
I need to avoid this error:

tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized

It is related to the amount of memory on my 3060. To avoid it I have to do the Embedding layer computation on the CPU, but how do I do that? I tried running the complete model on the CPU and it works, but it is very slow. For example, if I reduce the number of neurons in all layers to 128, I can train on 8000 sentences (data_list[:8000] instead of the 6000 sentences below), but I have ~20000 of them.

My model:
# these import paths depend on the Keras/TF version; with TF 2.8 and standalone keras:
from keras.utils import tf_utils
from tensorflow.python.framework import ops

class CPUEmbedding(Embedding):
    @tf_utils.shape_type_conversion
    def build(self, input_shape):
        # create the embedding table on the CPU instead of the GPU
        with ops.device('cpu:0'):
            self.embeddings = self.add_weight(
                shape=(self.input_dim, self.output_dim),
                initializer=self.embeddings_initializer,
                name='embeddings',
                regularizer=self.embeddings_regularizer,
                constraint=self.embeddings_constraint)
        self.built = True
        print('Embedding starts on cpu')

model = Sequential()
model.add(CPUEmbedding(19260, 256, input_length=163))
model.add(LSTM(256, return_sequences=True))  # the output will be a sequence of the same length
model.add(Dropout(0.2))
model.add(LSTM(512))
model.add(Dropout(0.2))
model.add(Dense(self.total_words, activation='softmax'))

adam = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
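For comparison, the same placement can also be attempted without a custom layer class by building the model with the functional API and creating the Embedding layer inside a tf.device scope, so that its weight is created on the CPU. This is only a minimal sketch of that idea (the functional-API wiring and the '/CPU:0' scope are assumptions, not part of the code above), and whether the lookup really stays on the CPU during training may depend on the TF version:

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(163,), dtype='int32')
# build the embedding inside a CPU device scope so its weight is created on the CPU
with tf.device('/CPU:0'):
    x = layers.Embedding(19260, 256)(inputs)
x = layers.LSTM(256, return_sequences=True)(x)
x = layers.Dropout(0.2)(x)
x = layers.LSTM(512)(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(19260, activation='softmax')(x)

model = Model(inputs, outputs)
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['acc'])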
Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
cpu_embedding (CPUEmbedding (None, 163, 256) 4930560
)
lstm (LSTM) (None, 163, 256) 525312
dropout (Dropout) (None, 163, 256) 0
lstm_1 (LSTM) (None, 512) 1574912
dropout_1 (Dropout) (None, 512) 0
dense (Dense) (None, 19260) 9880380
=================================================================
Total params: 16,911,164
Trainable params: 16,911,164
Non-trainable params: 0
A model you can run yourself, but first you need to download some large book as the text:
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
import numpy as np

tokenizer = Tokenizer()

# Book with len > 1 000 000 words
with open('text.txt', encoding='utf-8') as f:
    data = f.read().replace('\ufeff', '')
data_list = data.lower().split("\n")

tokenizer.fit_on_texts(data_list)
total_words = len(tokenizer.word_index) + 1
print('Words number:', total_words)

# build n-gram prefixes of every line: [w1 w2], [w1 w2 w3], ...
input_sequences = []
for line in data_list:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)

max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# the last token of each padded sequence is the label, the rest is the input
X, labels = input_sequences[:, :-1], input_sequences[:, -1]
Y = to_categorical(labels, num_classes=total_words)

model = Sequential()
model.add(Embedding(total_words, 256, input_length=max_sequence_len - 1))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(512))
model.add(Dropout(0.1))
model.add(Dense(total_words, activation='softmax'))

adam = Adam(learning_rate=0.001)  # 'lr' is deprecated in recent TF versions
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])

history = model.fit(x=X, y=Y, batch_size=128, epochs=1000)
Versions:
1)
- OS: Windows 10
- CUDA: 11.6 (latest, from the NVIDIA website)
- Python: 3.9
- TensorFlow: 2.8
- Launched from: cmd
- GPU: 3060
2)
- OS: Windows 11
- CUDA: installed by conda
- Python: 3.8
- TensorFlow: 2.6
- Launched from: conda
- GPU: 1060Ti
It is probably a memory problem. You may not have enough RAM to copy the embeddings from the CPU to the GPU. Monitor your RAM and GPU usage (a small sketch for doing this follows). If the data takes up too much memory, try a custom data generator: instead of storing all 20,000 sentences in one variable, you generate the data as it is needed. That can save a lot of space, so try a custom data generator and let me know if it works.
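A minimal sketch of how that monitoring could look from inside the training script (psutil is an extra dependency here, and tf.config.experimental.get_memory_info needs a reasonably recent TF 2.x release; both are assumptions, adjust as needed):

import psutil
import tensorflow as tf

def print_memory_usage(tag=''):
    # host RAM usage
    ram = psutil.virtual_memory()
    print(f'{tag} RAM: {ram.used / 1e9:.1f} / {ram.total / 1e9:.1f} GB')
    # GPU memory currently allocated by TensorFlow
    try:
        gpu = tf.config.experimental.get_memory_info('GPU:0')
        print(f"{tag} GPU: {gpu['current'] / 1e9:.2f} GB (peak {gpu['peak'] / 1e9:.2f} GB)")
    except Exception as err:
        print(f'{tag} no GPU memory info: {err}')

print_memory_usage('before fit')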
Points to consider:
Try changing your hyperparameters, for example by reducing the number of neurons. 19260 neurons is huge. If it is a classification task, just use as many neurons as you have classes: if you have 5 classes, use 5 neurons.
Reducing the batch size may also help.
Try to find out which memory runs out during training (the monitoring sketch above can help). If it is RAM, a custom data generator will help, but if it is GPU memory, you have to reduce the parameter count. My guess is that with 16,911,164 parameters you need at least 16 GB on the GPU, so you should try to bring that number down.
Example of a custom data generator
If RAM is the problem, this may help. Assume you have preprocessed data saved in a text file or in CSV format.
This is based on a custom image data generator, but you will get the overall idea. I will add example code below; it is not a working example, I just want to give you an idea of what a custom generator looks like.
import csv
import numpy as np
import tensorflow as tf

def custom_gen(batch_size, file_path):
    sentences = []
    labels = []
    with open(file_path) as file:
        csvreader = csv.reader(file)
        # the first row is the header, so skip it
        _ = next(csvreader)
        # considering you have only two columns [sentence, label]
        for data in csvreader:
            sentences.append(data[0])
            labels.append(data[1])
            if len(sentences) == batch_size:
                # always check that the batch has the desired shape and datatype
                batch = np.array(sentences), np.array(labels)
                yield batch
                sentences.clear()
                labels.clear()

# finally wrap the generator function in a tf.data.Dataset
# (the batch size and file name below are only illustrative;
#  adjust the TensorSpec shapes/dtypes to your real arrays)
dataset = tf.data.Dataset.from_generator(
    lambda: custom_gen(128, 'data.csv'),
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.string),
        tf.TensorSpec(shape=(None,), dtype=tf.string)))

# prefetch helps you to manage memory; repeat so the data can be
# generated for as many epochs as you want
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE).repeat(1000)

# you can then fit the model with your custom data generator;
# no separate values for x and y are needed
model.fit(dataset, epochs=1000)
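If the padded sequences and labels from the question already fit in RAM, a simpler variant of the same idea is to let tf.data slice the existing arrays into batches instead of handing model.fit one huge X/Y pair; keeping the labels as integers with sparse_categorical_crossentropy (an assumption on my part, not in the original code) also avoids building the giant one-hot matrix. A rough sketch:

import tensorflow as tf

# X and labels are the arrays built in the question's script
dataset = (tf.data.Dataset.from_tensor_slices((X, labels))
           .shuffle(10000)
           .batch(128)
           .prefetch(tf.data.AUTOTUNE))

# integer labels + sparse loss avoid the (n_sequences, total_words) one-hot matrix
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['acc'])
history = model.fit(dataset, epochs=1000)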