Apply transformer model to each row in a pandas column
I want to apply a translation model to each row of one of my DataFrame's columns.
The translation code I'm using:
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
# Loop here for all rows in the German_Text column
text = "Wie geht es dir heute"  # one row's text; `input` would shadow the Python builtin
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
I want to apply this model to the following column and create a new column with the translations:
German_Text English_Text
Wie geht es dir heute
mir geht es gut
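The English_Text column should hold the model's translation of each sentence, so I want to run the model on every row of German_Text and store the result in the corresponding row of English_Text.
All you need to do is wrap the steps in a function and pass it to pandas' apply method on the column (a Series):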
import pandas as pd
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

# Load the German-to-English model and its tokenizer once, outside the function
mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

df = pd.DataFrame(['Wie geht es dir heute', 'mir geht es gut'], columns=['German_Text'])

def translationPipeline(text):
    # Encode one sentence, generate its translation, and decode it back to a string
    input_ids = tokenizer.encode(text, return_tensors="pt")
    outputs = model.generate(input_ids)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded

# Series.apply calls translationPipeline once per row
df['English_Text'] = df['German_Text'].apply(translationPipeline)
print(df)
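apply runs the model once per row, which is simple but can be slow on a long column because every generate call handles a single sentence. A minimal sketch of batching instead, assuming a callable that translates a list of strings at once. The names translate_in_batches and make_fsmt_batch_translator, and the default batch_size=16, are hypothetical choices for illustration, not part of the original answer or of pandas/transformers.

```python
import pandas as pd
from typing import Callable, List


def translate_in_batches(
    texts: pd.Series,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,  # hypothetical default; tune to your memory budget
) -> pd.Series:
    """Translate a Series of strings chunk by chunk, keeping the original index."""
    values = texts.tolist()
    results: List[str] = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        results.extend(translate_batch(chunk))
    return pd.Series(results, index=texts.index)


def make_fsmt_batch_translator(mname: str = "allenai/wmt19-de-en-6-6-big"):
    """Hypothetical helper: wraps the FSMT model so it translates a list of strings."""
    # Imported lazily so translate_in_batches works without transformers installed
    import torch
    from transformers import FSMTForConditionalGeneration, FSMTTokenizer

    tokenizer = FSMTTokenizer.from_pretrained(mname)
    model = FSMTForConditionalGeneration.from_pretrained(mname)
    model.eval()

    def translate_batch(batch: List[str]) -> List[str]:
        # Pad sentences to a common length; the attention mask makes the model ignore padding
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        with torch.no_grad():
            outputs = model.generate(**inputs)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate_batch
```

Usage, under the same assumptions: df['English_Text'] = translate_in_batches(df['German_Text'], make_fsmt_batch_translator()). Keeping the batching logic separate from the model wrapper means the chunking can be checked with any stand-in function.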
Output:
German_Text English_Text
0 Wie geht es dir heute How are you doing today
1 mir geht es gut I'm fine
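If the column contains many repeated sentences, apply still sends every duplicate through the model. A minimal sketch, assuming any string-to-string callable such as translationPipeline above: translate each distinct value once, then map the results back with Series.map. The helper name translate_unique is hypothetical.

```python
import pandas as pd
from typing import Callable, Dict


def translate_unique(texts: pd.Series, translate: Callable[[str], str]) -> pd.Series:
    """Call `translate` once per distinct non-null string, then map results to every row."""
    # unique() keeps first-appearance order; dropna() avoids passing NaN to the model
    cache: Dict[str, str] = {text: translate(text) for text in texts.dropna().unique()}
    # Rows whose value is not in the dict (e.g. NaN) come back as NaN
    return texts.map(cache)
```

Usage: df['English_Text'] = translate_unique(df['German_Text'], translationPipeline).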