How do I get translations for a batch of sentences after batch_encode_plus?
I want to use a pretrained model to translate a batch of sentences.
model = AutoModelWithLMHead.from_pretrained("Helsinki-NLP/opus-mt-es-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
batch_input_str = (("Mary spends $20 on pizza"), ("She likes eating it"), ("The pizza was great"))
encoded = tokenizer.batch_encode_plus(batch_input_str, pad_to_max_length=True)
which gives something like:
encoded = {'input_ids': [[4963, 10154, 5021, 9, 25, 1326, 2255, 35, 17462, 0], [552, 3996, 2274, 129, 75, 2223, 1370], [42, 12378, 5807, 1949, 65000, 65000]], 'attention_mask': [[1, 1, 1], [1, 0]]}
Should I then pass encoded to the model and call
output = model.generate(a)
?
Thanks!
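(For intuition, what a padded batch looks like can be mimicked in plain Python. This is a toy sketch, not the real tokenizer; the `pad_batch` helper and the pad id are made up for illustration:)

```python
# Toy illustration of batch padding: shorter token sequences are padded
# to the longest one, and attention_mask marks real tokens with 1 and
# padding with 0 (pad_id here is arbitrary, not a real model's value).
def pad_batch(seqs, pad_id=0):
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[4963, 10154, 5021], [552, 3996]])
print(batch["input_ids"])       # [[4963, 10154, 5021], [552, 3996, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```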
Solution
The model Helsinki-NLP/opus-mt-es-en translates from Spanish to English. See the example below:
# use AutoModelForSeq2SeqLM because AutoModelWithLMHead is deprecated
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
batch_input_str = (("Mary gasta $ 20 en pizza"), ("A ella le gusta comerlo"), ("La pizza estuvo genial"))
encoded = tokenizer.prepare_seq2seq_batch(batch_input_str)
translated = model.generate(**encoded)
tokenizer.batch_decode(translated, skip_special_tokens=True)
Output:
['Mary spends $20 on pizza', 'She likes to eat it.', 'The pizza was great.']
If you are looking for a model that translates English into Spanish, you can use Helsinki-NLP/opus-mt-en-ROMANCE. The uppercase part of the name indicates that the model supports multiple target languages. You can retrieve the list of supported languages from the tokenizer:
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ROMANCE")
tokenizer.supported_language_codes
Output:
['>>fr<<','>>es<<','>>it<<','>>pt<<','>>pt_br<<','>>ro<<','>>ca<<','>>gl<<','>>pt_BR<<','>>la<<','>>wa<<','>>fur<<','>>oc<<','>>fr_CA<<','>>sc<<','>>es_ES<<','>>es_MX<<','>>es_AR<<','>>es_PR<<','>>es_UY<<','>>es_CL<<','>>es_CO<<','>>es_CR<<','>>es_GT<<','>>es_HN<<','>>es_NI<<','>>es_PA<<','>>es_PE<<','>>es_VE<<','>>es_DO<<','>>es_EC<<','>>es_SV<<','>>an<<','>>pt_PT<<','>>frp<<','>>lad<<','>>vec<<','>>fr_FR<<','>>co<<','>>it_IT<<','>>lld<<','>>lij<<','>>lmo<<','>>nap<<','>>rm<<','>>scn<<','>>mwl<<']
You can use these language codes to set the target language and get the desired translation:
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-ROMANCE")
batch_input_str = (("Mary spends $20 on pizza"), ("She likes eating it"), ("The pizza was great"))
# we define Spanish as the target language
batch_input_str = ['>>es<< ' + x for x in batch_input_str]
encoded = tokenizer.prepare_seq2seq_batch(batch_input_str)
translated = model.generate(**encoded)
tokenizer.batch_decode(translated, skip_special_tokens=True)
Output:
['Mary gasta $20 en pizza', 'A ella le gusta comerlo.', 'La pizza fue genial.']
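A side note: in more recent versions of transformers, prepare_seq2seq_batch is deprecated in favor of calling the tokenizer directly, which also returns a padded batch of input_ids and attention_mask. A minimal sketch under that assumption:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")

batch = ["Mary gasta $ 20 en pizza", "A ella le gusta comerlo"]
# calling the tokenizer pads the batch and returns PyTorch tensors
encoded = tokenizer(batch, padding=True, return_tensors="pt")
translated = model.generate(**encoded)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```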