Why doesn't BertForMaskedLM generate the correct masked token, and how do I fix it?
I am testing this code:
from transformers import BertTokenizer, BertModel, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = BertForMaskedLM.from_pretrained("hfl/chinese-roberta-wwm-ext")
from transformers import pipeline

def check_model(model, tokenizer):
    fill_mask = pipeline(
        "fill-mask", model=model, tokenizer=tokenizer
    )
    print('Fill blank: ')
    fill_mask("我喜欢 {nlp.tokenizer.mask_token}.")
    print('Fill blank: ')
    fill_mask("这个品牌的面膜 {nlp.tokenizer.mask_token}.")

print('Check model ...')
check_model(model, tokenizer)
But it prints this error message:
Traceback (most recent call last):
  File "/Users/congminmin/nlp/embedding/transformer/bert_roberta_wwm_test.py", line 21, in <module>
    check_model(model, tokenizer)
  File "/Users/congminmin/nlp/embedding/transformer/bert_roberta_wwm_test.py", line 15, in check_model
    fill_mask("我喜欢 {nlp.tokenizer.mask_token}.")
  File "/Users/congminmin/.venv/wbkg/lib/python3.7/site-packages/transformers/pipelines/fill_mask.py", line 162, in __call__
    self.ensure_exactly_one_mask_token(masked_index.numpy())
  File "/Users/congminmin/.venv/wbkg/lib/python3.7/site-packages/transformers/pipelines/fill_mask.py", line 90, in ensure_exactly_one_mask_token
    f"No mask_token ({self.tokenizer.mask_token}) found on the input"
transformers.pipelines.base.PipelineException: No mask_token ([MASK]) found on the input
Solution
This is a string formatting problem. Currently, when you call:
"这个品牌的面膜 {nlp.tokenizer.mask_token}."
the string you create is literally:
'这个品牌的面膜 {nlp.tokenizer.mask_token}.'
The braces are never evaluated, so the pipeline never sees a [MASK] token in the input. What you actually want is an f-string (formatted string literal):
f"我喜欢 {fill_mask.tokenizer.mask_token}."
Output:
'我喜欢 [MASK].'
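To see the difference in isolation, here is a minimal sketch (no model needed); the `mask_token` variable below stands in for what `tokenizer.mask_token` returns for BERT-style models:

```python
# Without the f prefix the braces are kept as literal text;
# with it, the expression inside the braces is evaluated.
mask_token = "[MASK]"  # stand-in for tokenizer.mask_token

plain = "我喜欢 {mask_token}."   # braces stay verbatim
fstr = f"我喜欢 {mask_token}."  # braces are substituted

print(plain)  # 我喜欢 {mask_token}.
print(fstr)   # 我喜欢 [MASK].
```

The pipeline only accepts the second form, because it scans the input for the literal [MASK] string produced by the substitution.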
Full example:
from transformers import BertTokenizer, BertModel, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = BertForMaskedLM.from_pretrained("hfl/chinese-roberta-wwm-ext")
from transformers import pipeline

def check_model(model, tokenizer):
    fill_mask = pipeline(
        "fill-mask", model=model, tokenizer=tokenizer
    )
    print('Fill blank: ')
    print(fill_mask(f"我喜欢 {fill_mask.tokenizer.mask_token}."))
    print('Fill blank: ')
    print(fill_mask(f"这个品牌的面膜 {fill_mask.tokenizer.mask_token}."))

print('Check model ...')
check_model(model, tokenizer)
Output:
Some weights of the model checkpoint at hfl/chinese-roberta-wwm-ext were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias','cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Check model ...
Fill blank:
[{'sequence': '我 喜 欢 他.', 'score': 0.20969171822071075, 'token': 800, 'token_str': '他'},
 {'sequence': '我 喜 欢 你.', 'score': 0.2071659415960312, 'token': 872, 'token_str': '你'},
 {'sequence': '我 喜 欢 她.', 'score': 0.13876770436763763, 'token': 1961, 'token_str': '她'},
 {'sequence': '我 喜 欢 的.', 'score': 0.07542475312948227, 'token': 4638, 'token_str': '的'},
 {'sequence': '我 喜 欢 它.', 'score': 0.05587303638458252, 'token': 2124, 'token_str': '它'}]
Fill blank:
[{'sequence': '这 个 品 牌 的 面 膜 好.', 'score': 0.15848451852798462, 'token': 1962, 'token_str': '好'},
 {'sequence': '这 个 品 牌 的 面 膜..', 'score': 0.12413082271814346, 'token': 119, 'token_str': '.'},
 {'sequence': '这 个 品 牌 的 面 膜 呢.', 'score': 0.09926403313875198, 'token': 1450, 'token_str': '呢'},
 {'sequence': '这 个 品 牌 的 面 膜 啊.', 'score': 0.06865812838077545, 'token': 1557, 'token_str': '啊'},
 {'sequence': '这 个 品 牌 的 面 膜 1.', 'score': 0.061997584998607635, 'token': 122, 'token_str': '1'}]
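If you only need the single best prediction, note that each fill-mask call returns a list of candidate dicts with a 'score' field. A sketch of post-processing such a result, using sample data that mirrors the first output above rather than a live model call:

```python
# Each fill-mask result is a list of candidate dicts; pick the one
# with the highest score and read its predicted token string.
results = [
    {'sequence': '我 喜 欢 他.', 'score': 0.2097, 'token': 800, 'token_str': '他'},
    {'sequence': '我 喜 欢 你.', 'score': 0.2072, 'token': 872, 'token_str': '你'},
    {'sequence': '我 喜 欢 她.', 'score': 0.1388, 'token': 1961, 'token_str': '她'},
]

best = max(results, key=lambda r: r['score'])  # top-scoring candidate
print(best['token_str'])  # 他
```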