How do I get translations for a batch of sentences after batch_encode_plus?
I want to use a pretrained model to translate a batch of sentences.
model = AutoModelWithLMHead.from_pretrained("Helsinki-NLP/opus-mt-es-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
batch_input_str = (("Mary spends $20 on pizza"), ("She likes eating it"), ("The pizza was great"))
encoded = tokenizer.batch_encode_plus(batch_input_str, pad_to_max_length=True)
which gives something like:
encoded = {'input_ids': [[4963, 10154, 5021, 9, 25, 1326, 2255, 35, 17462, 0], [552, 3996, 2274, 129, 75, 2223, 1370], [42, 12378, 5807, 1949, 65000, 65000]], 'attention_mask': [[1, 1, 1], [1, 0]]}
Should I then pass encoded to the model and call
output = model.generate(a)
?
Thanks!
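(For intuition, what a padded batch looks like can be mimicked in plain Python. This is a toy sketch, not the real tokenizer; the `pad_batch` helper and the pad id are made up for illustration:)

```python
# Toy illustration of batch padding: shorter token sequences are padded
# to the longest one, and attention_mask marks real tokens with 1 and
# padding with 0 (pad_id here is arbitrary, not a real model's value).
def pad_batch(seqs, pad_id=0):
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[4963, 10154, 5021], [552, 3996]])
print(batch["input_ids"])       # [[4963, 10154, 5021], [552, 3996, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```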
Solution
The model Helsinki-NLP/opus-mt-es-en translates from Spanish to English. See the example below:
# use AutoModelForSeq2SeqLM because AutoModelWithLMHead is deprecated
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
batch_input_str = (("Mary gasta $ 20 en pizza"), ("A ella le gusta comerlo"), ("La pizza estuvo genial"))
encoded = tokenizer.prepare_seq2seq_batch(batch_input_str)
translated = model.generate(**encoded)
tokenizer.batch_decode(translated, skip_special_tokens=True)
Output:
['Mary spends $20 on pizza', 'She likes to eat it.', 'The pizza was great.']
If you are looking for a model that translates English into Spanish, you can use Helsinki-NLP/opus-mt-en-ROMANCE. The uppercase part of the name indicates that the model supports multiple target languages. You can retrieve the list of supported languages from the tokenizer:
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ROMANCE")
tokenizer.supported_language_codes
Output:
['>>fr<<','>>es<<','>>it<<','>>pt<<','>>pt_br<<','>>ro<<','>>ca<<','>>gl<<','>>pt_BR<<','>>la<<','>>wa<<','>>fur<<','>>oc<<','>>fr_CA<<','>>sc<<','>>es_ES<<','>>es_MX<<','>>es_AR<<','>>es_PR<<','>>es_UY<<','>>es_CL<<','>>es_CO<<','>>es_CR<<','>>es_GT<<','>>es_HN<<','>>es_NI<<','>>es_PA<<','>>es_PE<<','>>es_VE<<','>>es_DO<<','>>es_EC<<','>>es_SV<<','>>an<<','>>pt_PT<<','>>frp<<','>>lad<<','>>vec<<','>>fr_FR<<','>>co<<','>>it_IT<<','>>lld<<','>>lij<<','>>lmo<<','>>nap<<','>>rm<<','>>scn<<','>>mwl<<']
You can use these language codes to set the target language and get the desired translation:
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-ROMANCE")
batch_input_str = (("Mary spends $20 on pizza"), ("She likes eating it"), ("The pizza was great"))
# we define Spanish as the target language
batch_input_str = ['>>es<< ' + x for x in batch_input_str]
encoded = tokenizer.prepare_seq2seq_batch(batch_input_str)
translated = model.generate(**encoded)
tokenizer.batch_decode(translated, skip_special_tokens=True)
Output:
['Mary gasta $20 en pizza', 'A ella le gusta comerlo.', 'La pizza fue genial.']
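A side note: in more recent versions of transformers, prepare_seq2seq_batch is deprecated in favor of calling the tokenizer directly, which also returns a padded batch of input_ids and attention_mask. A minimal sketch under that assumption:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")

batch = ["Mary gasta $ 20 en pizza", "A ella le gusta comerlo"]
# calling the tokenizer pads the batch and returns PyTorch tensors
encoded = tokenizer(batch, padding=True, return_tensors="pt")
translated = model.generate(**encoded)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```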